Hive 2 on EMR with QLI over S3
Hive on EMR differs from other Hive configurations in that its query logs are typically stored in Amazon S3.
Configuring EMR Hive in Alation
Follow the instructions for setting up a Hive connection in Alation for Metadata Extraction and Profiling/Sampling.
Query Log Ingestion Setup for EMR Hive
Choose AWS S3 as the connection type. This displays the fields relevant to S3 connection setup. The Access ID and Secret Key fields are mandatory. If Region is not specified, it defaults to us-east-1. The region name must be entered exactly as listed in the Amazon API Gateway table.
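The field rules above can be captured in a small settings object. This is a minimal sketch using a hypothetical S3QliSettings class of our own; it is not an Alation or AWS API, and it does not check the region name against the API Gateway table.

```python
from dataclasses import dataclass
from typing import List, Optional

# Documented default used when the Region field is left empty.
DEFAULT_REGION = "us-east-1"


@dataclass
class S3QliSettings:
    """Hypothetical model of the S3 QLI connection fields (not an Alation API)."""

    access_id: str
    secret_key: str
    region: Optional[str] = None

    def missing_fields(self) -> List[str]:
        # Access ID and Secret Key are mandatory; report any that are blank.
        required = (("Access ID", self.access_id), ("Secret Key", self.secret_key))
        return [name for name, value in required if not value.strip()]

    def resolved_region(self) -> str:
        # Fall back to us-east-1 when Region is unset or blank.
        if self.region and self.region.strip():
            return self.region.strip()
        return DEFAULT_REGION
```

Keeping the default in one named constant makes the fallback behavior explicit instead of scattering the literal region string through the code.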
For S3, the log path format is /bucketname/path/to/logdirectory/ or /bucketname/path/to/logfile.gzip.
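A minimal sketch of how such a path splits into a bucket name and an object key, using a hypothetical parse_log_path helper and assuming that a trailing slash (or a bare bucket) marks a directory while anything else names a single file. Alation's own parsing may differ.

```python
from typing import NamedTuple


class LogPath(NamedTuple):
    bucket: str
    key: str  # object key or key prefix, without the leading slash
    is_directory: bool


def parse_log_path(path: str) -> LogPath:
    """Split '/bucketname/path/...' into bucket and key (hypothetical helper)."""
    if not path.startswith("/"):
        raise ValueError("log path must start with '/': " + path)
    bucket, _, key = path[1:].partition("/")
    if not bucket:
        raise ValueError("log path has no bucket name: " + path)
    # Assumption: a trailing slash or an empty key denotes a directory.
    return LogPath(bucket, key, is_directory=(key == "" or key.endswith("/")))
```

For example, /bucketname/path/to/logdirectory/ yields bucket "bucketname" with the directory prefix "path/to/logdirectory/", while /bucketname/path/to/logfile.gzip yields a single-file key.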
EMR archives the query logs and stores them in S3, so Alation assumes that the log files are archived. For ephemeral (transient) clusters, we recommend specifying the master log path. Alation treats the master log path as the root of a directory tree, traverses that tree, and finds the logs within it.
If the actual log paths are:
You can specify the master log path as: /path/
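The traversal from a master log path can be sketched in Python. S3 has no real directories, so "traversing the tree" under a master log path amounts to selecting every object key that begins with that prefix. This sketch assumes the keys have already been listed from a single bucket, that archived logs end in .gz or .gzip, and uses hypothetical helper names (find_archived_logs, read_archived_log); it is not Alation's implementation.

```python
import gzip
from typing import Iterable, List

# Assumption: archived log files carry a gzip extension.
ARCHIVE_SUFFIXES = (".gz", ".gzip")


def find_archived_logs(keys: Iterable[str], master_prefix: str) -> List[str]:
    """Return the keys under master_prefix that look like archived log files."""
    prefix = master_prefix.lstrip("/")
    # Normalize to a directory-style prefix so "path" does not match "pathology/".
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return sorted(k for k in keys if k.startswith(prefix) and k.endswith(ARCHIVE_SUFFIXES))


def read_archived_log(data: bytes) -> List[str]:
    """Decompress one archived log file and split it into lines."""
    return gzip.decompress(data).decode("utf-8").splitlines()
```

With the master log path /path/ and hypothetical keys path/cluster-1/hive.log.gz, path/cluster-2/sub/hive.log.gzip, and other/hive.log.gz, only the first two are selected, regardless of how deeply they are nested under the prefix.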