Impala on CDH OCF Connector: Overview

Applies to: Alation Cloud Service instances and customer-managed instances of Alation

The OCF connector for Impala on CDH was developed by Alation and is available on demand as a ZIP file that can be uploaded and installed in Manage Connectors.

To download the Impala on CDH OCF connector package, go to Customer Portal > Connectors > Alation Connector Hub. Only Alation users with access to the Customer Portal can access the Alation Connector Hub. If you don’t have access to the Customer Portal, contact Alation Support.

Use this connector to catalog Impala on CDH as a data source in Alation. The connector catalogs Impala objects such as schemas, tables, views, and columns, and lets users search for and find Impala metadata in the Alation user interface.

Team

The following administrators are required to install this connector:

  • Alation administrator:

    • Installs the connector.

    • Creates and configures an OCF Impala data source in the catalog.

  • Impala administrator:

    • Creates a service account for Alation and grants it the required privileges.

    • Provides the authentication information and Kerberos configuration files (krb5.conf and keytab) to the Alation Server Admin.

    • Provides the SSL certificate.

    • Provides the JDBC URI.

    • Provides access to the metastore server to extract metadata.

    • Assists in configuring query log ingestion (QLI).

  • HDFS administrator:

    • Provides access to the query log history directory on HDFS.
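The handoff from the Impala administrator to the Alation administrator can be tracked as a simple checklist. The sketch below is illustrative only: the `HandoffChecklist` class, its field names, and the rule for which items each authentication method requires are assumptions made for this example and are not part of the connector.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HandoffChecklist:
    """Items the Impala administrator hands over (hypothetical structure)."""

    auth_method: str  # assumed values: "basic", "ldap", or "kerberos"
    jdbc_uri: Optional[str] = None
    username: Optional[str] = None
    password: Optional[str] = None
    krb5_conf_path: Optional[str] = None
    keytab_path: Optional[str] = None
    ssl_certificate_path: Optional[str] = None
    use_ssl: bool = False

    def missing_items(self) -> List[str]:
        """Return the names of required items that have not been provided."""
        required = ["jdbc_uri"]
        if self.auth_method in ("basic", "ldap"):
            required += ["username", "password"]
        elif self.auth_method == "kerberos":
            # Assumption: Kerberos setups need both krb5.conf and a keytab.
            required += ["krb5_conf_path", "keytab_path"]
        else:
            raise ValueError(f"Unknown auth method: {self.auth_method!r}")
        if self.use_ssl:
            required.append("ssl_certificate_path")
        return [name for name in required if not getattr(self, name)]
```

For example, `HandoffChecklist(auth_method="kerberos", jdbc_uri="...", use_ssl=True).missing_items()` lists the Kerberos files and the SSL certificate as still outstanding.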

Scope

The table below describes which metadata objects are extracted by the connector and which catalog functionality is supported.

| Feature | Scope | Availability |
| --- | --- | --- |
| Authentication | | |
| Basic authentication | Authentication with a service account created on the database that uses a username and password | Yes |
| LDAP | Authentication with a database service account that is an LDAP account in an organization’s network | Yes |
| SSL | Database connection over SSL | Yes |
| Kerberos | Support for Kerberos authentication | Yes |
| Keytab | Support for Kerberos with keytabs | Yes |
| SSO | SSO authentication with an identity provider application | No |
| Metadata Extraction (MDE) | | |
| Default MDE | Extraction of metadata based on the CDH API | Yes |
| Custom query-based MDE | Extraction of metadata based on extraction queries provided by a user | No |
| Extracted metadata objects | | |
| Data source | Data source object in Alation that is parent to extracted metadata | Yes |
| Schemas | List of schemas | Yes |
| Tables | List of tables. Extraction of Kudu tables is supported from connector version 1.0.4.4751 (requires Alation version 2023.1 or newer). | Yes |
| Columns | List of columns | Yes |
| Column data types | Column data types | Yes |
| Views | List of views | Yes |
| Source comments | Source comments | Yes |
| Primary keys | Primary key information for extracted tables | No |
| Foreign keys | Foreign key information for extracted tables | No |
| Functions | Function metadata | No |
| Function definitions | Function definition metadata | No |
| Sampling and Profiling | | |
| Table sampling | Retrieval of data samples from extracted tables | Yes |
| Column sampling | Retrieval of data samples from extracted columns | Yes |
| Deep column profiling | Profiling of columns with the calculation of value distribution stats | Yes |
| Dynamic profiling | Ability for individual users to connect with their own database accounts to retrieve table and column samples and profiles | Yes |
| Custom query-based table sampling | Ability to use custom queries for sampling specific tables | Yes |
| Custom query-based column profiling | Ability to use custom queries for profiling specific columns | Yes |
| Query Log Ingestion (QLI) | | |
| API-based QLI | Ingestion of query history based on a table or view that contains query history data | Yes |
| Query-based QLI | Ingestion of query history based on a custom query history extraction query | No |
| JOINs and filters | Calculation of JOIN and filter information based on ingested query history | Yes |
| Predicates | Ability to parse predicates in ingested queries | Yes |
| Lineage | | |
| Automatic lineage generation | Auto-calculation of lineage based on query history ingested from QLI, MDE, and Compose queries | Yes |
| Column-level lineage | Calculation of lineage at column level | No |
| Compose | | |
| Customer-managed (on-prem) Alation instances | Compose on on-prem Alation instances | Yes |
| Alation Cloud Service instances, connection without Agent | Compose for data sources connected without Alation Agent | No |
| Alation Cloud Service instances, connection via Agent | Compose for data sources connected through Alation Agent | No |
| Basic authentication in Compose | Authentication in Compose with username and password | Yes |
| Kerberos authentication in Compose | Authentication in Compose through Kerberos | No |
| SSL authentication in Compose | Authentication in Compose through SSL | No |
| SSO authentication in Compose | Authentication in Compose with SSO credentials | No |

Limitations

  • Compose is supported with basic authentication only. Kerberos authentication in Compose is not supported.

  • SSL authentication is not supported in Compose.

  • Sampling and profiling are not available for complex data types.

  • Data upload into columns with complex data types is not available yet.
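To see which columns the complex-type limitations affect, it can help to scan the column types of a table. The sketch below assumes "complex data types" means Impala's nested types (ARRAY, MAP, and STRUCT) and that column types are available as strings such as `array<string>`. The function names are hypothetical, and how the connector itself detects these types is not documented here.

```python
import re
from typing import Dict, List

# Impala's complex (nested) types; matched case-insensitively at the start
# of the type string, for example "ARRAY<STRING>" or "struct<city:string>".
_COMPLEX_TYPE_PATTERN = re.compile(r"^(array|map|struct)\b")


def is_complex_type(column_type: str) -> bool:
    """Return True if the type string names an ARRAY, MAP, or STRUCT."""
    return _COMPLEX_TYPE_PATTERN.match(column_type.strip().lower()) is not None


def columns_without_sampling(column_types: Dict[str, str]) -> List[str]:
    """List columns whose types are complex, in their original order."""
    return [name for name, ctype in column_types.items() if is_complex_type(ctype)]
```

Passing a mapping of column names to type strings returns only the columns for which samples and profiles should not be expected.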

Driver Information

The connector is packaged with the Hive 2 driver for Impala: com.alation.drivers.hive-driver-2.1.1-java-9-patched, version 2.1.1. The driver is bundled with the HiveShims and HadoopShims APIs. It is used only for Compose connections and for executing queries; MDE and QLI use the Impala APIs.
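Because the bundled driver is a Hive 2 JDBC driver, connection strings follow the general Hive JDBC pattern `jdbc:hive2://<host>:<port>/<database>`, optionally followed by `;key=value` session parameters. The sketch below only assembles such a string. The helper name and its parameters are hypothetical, port 21050 is assumed to be the Impala port that serves the HiveServer2 protocol on your cluster, and the exact URI format expected in the Alation data source settings may differ, so confirm the final value with your Impala administrator.

```python
from typing import Dict, Optional


def build_hive2_jdbc_url(
    host: str,
    port: int = 21050,  # assumption: Impala's HiveServer2-protocol port
    database: str = "default",
    session_params: Optional[Dict[str, str]] = None,
) -> str:
    """Assemble a generic Hive 2 style JDBC URL (illustrative helper)."""
    if not host:
        raise ValueError("host must not be empty")
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if session_params:
        # Session parameters are appended as ;key=value pairs.
        url += "".join(f";{key}={value}" for key, value in session_params.items())
    return url


print(build_hive2_jdbc_url("impala.example.com"))
# prints jdbc:hive2://impala.example.com:21050/default
```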