Azure Databricks OCF Connector: Overview

Applies to: Alation Cloud Service and customer-managed instances of Alation

The OCF connector for Azure Databricks was developed by Alation and is available as a ZIP file that you upload and install in Alation. The required database driver is bundled with the connector, so you do not need to obtain or install the driver separately.

To download the Azure Databricks OCF connector package, open the Alation Connector Hub in the Customer Portal (Customer Portal > Connectors > Alation Connector Hub). Only Alation users with access to the Customer Portal can access the Alation Connector Hub. If you don’t have access to the Customer Portal, contact Alation Support.

Use this connector to catalog Azure Databricks, including Azure Databricks on Azure Government Cloud, as a data source on customer-managed (on-premise) and Alation Cloud Service instances of Alation. The connector extracts and catalogs database objects such as schemas, tables, views, and columns. After extraction, the metadata is represented in the catalog as a hierarchy of catalog pages under the parent data source. Alation users can then use the full catalog functionality to search for the extracted metadata, curate the corresponding catalog pages, document the data source, and exchange information about it.

Team

The following administrators are required to install this connector:

  • Alation administrator

    • Installs the connector.

    • Creates and configures the Azure Databricks data source in the catalog.

  • Azure Databricks administrator

    • Creates a service account for Alation.

    • Provides the JDBC URI to access metadata.

    • Provides access to schemas and tables to extract metadata.

    • Assists with configuring Query Log Ingestion (QLI).

    • Assists with configuring OAuth authentication for Compose.
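The Azure Databricks administrator provides the JDBC URI. As a rough illustration of how such a URI is commonly assembled from a workspace's server hostname and HTTP path, the sketch below builds a Spark-style (Simba driver) URI. It assumes the connector expects the URI without the leading `jdbc:` prefix and with token authentication (`AuthMech=3`), with the personal access token entered separately as a credential. The function name and the example hostname and HTTP path are hypothetical; confirm the exact format with your Databricks administrator and the connector's configuration instructions.

```python
from urllib.parse import urlparse


def build_databricks_jdbc_uri(server_hostname: str, http_path: str, port: int = 443) -> str:
    """Hypothetical helper: assemble a Spark-style JDBC URI for Azure Databricks.

    Assumptions (verify against the connector documentation):
      * the URI is given without the leading "jdbc:" prefix;
      * token authentication is used (AuthMech=3), and the personal access
        token is supplied separately as a credential, not embedded in the URI.
    """
    host = server_hostname.strip()
    # Accept a pasted workspace URL such as "https://adb-....azuredatabricks.net/".
    if "://" in host:
        host = urlparse(host).hostname or ""
    path = http_path.strip()
    if not host or not path:
        raise ValueError("server_hostname and http_path are both required")

    # Host, port, and default schema come first, then semicolon-separated driver properties.
    return ";".join([
        f"spark://{host}:{port}/default",
        "transportMode=http",
        "ssl=1",
        f"httpPath={path}",
        "AuthMech=3",
    ])


if __name__ == "__main__":
    # Hypothetical workspace hostname and cluster HTTP path.
    print(build_databricks_jdbc_uri(
        "adb-1234567890123456.7.azuredatabricks.net",
        "sql/protocolv1/o/1234567890123456/0123-456789-abcdefgh",
    ))
```

Keeping the driver properties in a list makes it easy to add or drop one if your administrator supplies a URI with different settings, without re-editing a long string by hand.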

Scope

The table below shows which metadata objects are extracted by this connector and which features are supported.

| Feature | Description | Availability |
| --- | --- | --- |
| Authentication | | |
| Token-based authentication | Authentication using Databricks personal access tokens | Yes |
| Metadata extraction (MDE) | | |
| Default MDE | Extracts metadata based on default extraction queries in the connector code | Yes |
| Custom query-based MDE | Extracts metadata based on custom extraction queries provided by the user | No |
| Popularity | Indicator of the popularity (intensity of use) of a data object, such as a table or a column | Yes |
| Extracted metadata objects | | |
| Schemas | List of schemas | Yes |
| Tables | List of tables | Yes |
| Columns | List of columns | Yes |
| Column comments | Column comments | Yes |
| Column data types | Column data types | No |
| Views | List of views | Yes |
| Source comments | Source comments | Yes |
| Primary keys | Primary key information for extracted tables | No |
| Foreign keys | Foreign key information for extracted tables | No |
| Functions | Function metadata | No |
| Function definitions | Function definition metadata | No |
| Sampling and profiling | | |
| Table sampling | Extracts data samples from extracted tables | Yes |
| Column sampling | Extracts data samples from extracted columns | Yes |
| Deep column profiling | Profiling of specific columns with the calculation of value distribution statistics | Yes |
| Dynamic profiling | Table and column profiling by individual users who use their own database accounts to retrieve the profiles | Yes |
| Custom query-based table sampling | Ability to use custom queries for sampling specific tables | Yes |
| Custom query-based column sampling | Ability to use custom queries for sampling specific columns | Yes |
| Query Log Ingestion (QLI) | | |
| File-based QLI | Ingestion of query history from log files that contain query history data | Yes |
| Table-based QLI | Ingestion of query history from a table that contains query history data | Yes |
| Query-based QLI | Ingestion of query history using a custom query | No |
| JOINs and filters | Calculation of JOIN and filter information based on ingested query history | Yes |
| Predicates | Ability to parse predicates in ingested queries | Yes |
| Lineage | | |
| Automatic lineage generation | Automatic calculation of lineage based on query history ingested through QLI, MDE, and Compose queries | Yes |
| Compose | | |
| Customer-managed (on-premise) instances | Compose on customer-managed (on-premise) Alation instances | Yes |
| Alation Cloud Service instances | Compose on Alation Cloud Service instances. Depending on your network configuration, you may be using Alation Agent to connect to your data source. Compose via Agent is supported from connector version 1.1.0.4607. | Yes |
| Personal access token (PAT) authentication in Compose | Authentication in Compose with a personal access token supplied through the username and password credential fields | Yes |
| SSO through OAuth in Compose | Authentication in Compose with OAuth via Azure Active Directory. OAuth authentication is supported from connector version 1.0.1.2340. | Yes |
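Two rows in the Scope table tie support to a minimum connector version: Compose via Alation Agent (1.1.0.4607) and OAuth authentication in Compose (1.0.1.2340). The minimal sketch below checks an installed connector version against those minimums. It assumes versions are dot-separated integers; the feature keys and function names are hypothetical labels, not part of the connector.

```python
from typing import Dict, List

# Minimum connector versions taken from the Scope table (keys are hypothetical labels).
MIN_CONNECTOR_VERSION: Dict[str, str] = {
    "compose_via_agent": "1.1.0.4607",
    "oauth_in_compose": "1.0.1.2340",
}


def _parse(version: str) -> List[int]:
    # Assumes purely numeric, dot-separated components, e.g. "1.1.0.4607".
    return [int(part) for part in version.strip().split(".")]


def version_at_least(installed: str, minimum: str) -> bool:
    a, b = _parse(installed), _parse(minimum)
    # Pad the shorter version with zeros so "1.1" compares equal to "1.1.0.0".
    width = max(len(a), len(b))
    a += [0] * (width - len(a))
    b += [0] * (width - len(b))
    # Lists compare element by element, which gives numeric (not string) ordering.
    return a >= b


def supports(feature: str, installed: str) -> bool:
    return version_at_least(installed, MIN_CONNECTOR_VERSION[feature])


if __name__ == "__main__":
    for feature in MIN_CONNECTOR_VERSION:
        print(feature, supports(feature, "1.0.1.2340"))
```

Comparing integer lists rather than raw strings matters here: as strings, "1.10.0.1" would sort before "1.9.0.1".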

Metastore Support

We have certified the Azure Databricks connector with Hive as the metastore. Note that we do not certify external metastores, such as AWS Glue or Derby.