Databricks Unity Catalog OCF Connector: Overview

The OCF Connector for Databricks Unity Catalog was developed by Alation and is available as a Zip file that can be uploaded and installed in the Alation application.

The latest Databricks Unity Catalog OCF connector package can be downloaded from the Connector Hub on the Alation Customer Portal. Ask an Alation admin with access to the Customer Portal to download the connector from the Connectors section (Customer Portal > Connectors).

The connector should be used to catalog Databricks workspaces that have Unity Catalog enabled. It supports both interactive clusters and SQL endpoints for metadata extraction. The connector can catalog metadata objects from multiple workspaces using a single data source connection. Extracted schemas will be referenced with multipart names (catalog.schema).
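
As an illustration of the multipart naming described above, the following is a minimal Python sketch of how a catalog.schema name could be split into its parts or extended into a table reference. The helper names (split_schema_name, qualified_table_name) and the sample names are hypothetical and are not part of the connector. The sketch also assumes that catalog and schema names contain no dots themselves.

```python
from typing import Tuple


def split_schema_name(multipart_name: str) -> Tuple[str, str]:
    """Split a 'catalog.schema' name into (catalog, schema).

    Assumption: neither part contains a dot of its own.
    """
    catalog, sep, schema = multipart_name.partition(".")
    if not sep or not catalog or not schema or "." in schema:
        raise ValueError(f"Expected 'catalog.schema', got {multipart_name!r}")
    return catalog, schema


def qualified_table_name(multipart_schema: str, table: str) -> str:
    """Build a three-part 'catalog.schema.table' reference."""
    catalog, schema = split_schema_name(multipart_schema)
    return f"{catalog}.{schema}.{table}"


if __name__ == "__main__":
    # 'main.sales' and 'orders' are made-up sample names.
    print(split_schema_name("main.sales"))
    print(qualified_table_name("main.sales", "orders"))
```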

The connector supports Databricks on AWS and Azure Databricks.

Connector Version Compatibility

Newer versions of the connector offer more features and may require newer Alation releases.

| Connector version | Compatible with Alation from version | New features in this version |
| --- | --- | --- |
| 2.0.3.6564 | 2023.1.7.1 | Disabled automatic lineage generation from queries ingested during query log ingestion (QLI) for this connector. The Data Source admins no longer need to manually disable automatic lineage on the data source settings page. Lineage is calculated based on direct lineage extraction from system tables. |
| 2.0.2.6259 | 2023.1.7.1 | Implemented query log ingestion (beta). |
| 1.2.1.5335 | 2023.1.4 | Implemented Query Service. |
| 1.1.0.4393 | 2023.1.4 | Added support for incremental extraction of direct lineage (beta). |
| 1.0.3.4144 | 2023.1.2 | Implemented direct lineage extraction (beta). |

Note

Direct lineage extraction and query log ingestion are considered beta as they rely on the Unity Catalog system tables that are currently in Public Preview in Databricks and require separate access enablement. You may need assistance from your Databricks account admin to enable access to these features in Databricks.
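
The compatibility information above can also be read as a simple lookup: given an installed Alation version, find the newest connector version whose minimum Alation release is satisfied. The following is a minimal sketch of that lookup. The version pairs are copied from this page, while the function names are hypothetical and the sketch assumes version strings are purely numeric and dot-separated.

```python
from typing import Optional, Tuple

# (connector version, minimum compatible Alation version), as listed on this page.
COMPATIBILITY = [
    ("2.0.3.6564", "2023.1.7.1"),
    ("2.0.2.6259", "2023.1.7.1"),
    ("1.2.1.5335", "2023.1.4"),
    ("1.1.0.4393", "2023.1.4"),
    ("1.0.3.4144", "2023.1.2"),
]


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2023.1.7.1' into (2023, 1, 7, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def newest_compatible_connector(alation_version: str) -> Optional[str]:
    """Return the newest connector version usable with the given Alation version."""
    installed = parse_version(alation_version)
    candidates = [
        connector
        for connector, minimum in COMPATIBILITY
        if parse_version(minimum) <= installed
    ]
    # None means the Alation release predates every listed connector version.
    return max(candidates, key=parse_version, default=None)


if __name__ == "__main__":
    print(newest_compatible_connector("2023.1.4"))
```

Comparing parsed tuples rather than raw strings avoids ordering mistakes such as treating "2023.1.10" as older than "2023.1.7".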

Team

You may require assistance from your Databricks account admin when configuring this connector in Alation.

  • Databricks administrator:

    • Creates a service account for Alation and grants it the required permissions to access metadata

    • Generates a personal access token

    • Provides the JDBC URI to access metadata

    • Assists in enabling the Public Preview features (system lineage and audit tables)

  • Alation Server Admin:

    • Installs the connector

    • Creates and configures a Databricks Unity Catalog OCF data source in Alation
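
The JDBC URI provided by the Databricks administrator typically combines the workspace hostname with the HTTP path of the cluster or SQL endpoint. The sketch below assembles a URI in the general format used by the Databricks JDBC driver. The function name, example hostname, and HTTP path are hypothetical, and the exact format Alation expects (for example, whether the jdbc: prefix is included) is an assumption to verify against the connector's configuration guide. The personal access token is deliberately left out of the URI, on the assumption that it is entered separately in the data source settings.

```python
def build_jdbc_uri(hostname: str, http_path: str, port: int = 443) -> str:
    """Assemble a Databricks-style JDBC URI (assumed format, token excluded)."""
    if not hostname or not http_path:
        raise ValueError("Both hostname and http_path are required")
    # AuthMech=3 selects username/password-style authentication, which is
    # the mode the Databricks driver uses for personal access tokens.
    settings = {
        "transportMode": "http",
        "ssl": "1",
        "httpPath": http_path,
        "AuthMech": "3",
    }
    joined = ";".join(f"{key}={value}" for key, value in settings.items())
    return f"jdbc:databricks://{hostname}:{port}/default;{joined};"


if __name__ == "__main__":
    # Hypothetical workspace hostname and HTTP path, for illustration only.
    print(build_jdbc_uri("adb-1234567890.1.azuredatabricks.net",
                         "/sql/1.0/warehouses/abcdef0123456789"))
```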

Scope

The table below shows which metadata objects are extracted by this connector and which operations are supported.

| Feature | Scope | Availability |
| --- | --- | --- |
| Authentication | | |
| Token-based | Authentication with a personal access token | Yes |
| SSO authentication | SSO authentication with an identity provider application | No |
| Metadata extraction (MDE) | | |
| Default MDE | Extraction of metadata based on default extraction queries in the connector code | Yes |
| Custom query-based MDE | Extraction of metadata based on extraction queries provided by a user | No |
| Extracted metadata objects | | |
| Data source | Data source object in Alation that is parent to extracted metadata | Yes |
| Schemas | List of schemas, with multipart schema names catalog.schema | Yes |
| Tables | List of tables | Yes |
| Columns | List of columns | Yes |
| Column data types | Column data types | Yes |
| Views | List of views | Yes |
| Source comments | Source comments | Yes |
| Primary keys | Primary key information for extracted tables | No |
| Foreign keys | Foreign key information for extracted tables | No |
| Functions | Extraction of function metadata | No |
| Sampling and profiling | | |
| Table sampling | Extracts data samples from all extracted tables | Yes |
| Column sampling | Extracts data samples from all extracted columns | Yes |
| Deep column profiling | On-demand profiling of specific columns with the calculation of value distribution stats | Yes |
| Dynamic profiling | On-demand table and column profiling by individual users who use their own database accounts to retrieve the profiles | Yes |
| Custom query-based table sampling | Ability to use custom queries for sampling specific tables | Yes |
| Custom query-based column sampling | Ability to use custom queries for profiling specific columns | Yes |
| Query log ingestion (QLI) (beta) | | |
| Extraction and ingestion of query history (Available from connector version 2.0.3.6564 and Alation version 2023.1.7.1) | Extraction of query history from the system audit table and ingestion of query history metadata into the catalog | Yes |
| Query history, filters, expressions, joins, and popularity | Query history, filters, joins, and popularity information is calculated from the query history metadata extracted and ingested with QLI | Yes |
| Lineage extraction (beta) | | |
| Extraction of lineage information (Available from connector version 1.0.3.4144 and Alation version 2023.1.2) | Lineage information is calculated during metadata extraction (direct lineage extraction). Additionally, lineage is generated based on DDL queries run in Compose. Users can also create lineage manually or add it using the public API. | Yes |
| Compose | | Yes |
| Data upload | | Yes |