Databricks Unity Catalog OCF Connector: Overview
The OCF Connector for Databricks Unity Catalog was developed by Alation and is available as a Zip file that can be uploaded and installed in the Alation application.
The latest Databricks Unity Catalog OCF connector package can be downloaded from the Connector Hub on the Alation Customer Portal. Ask an Alation admin with access to the Customer Portal to download the connector from the Connectors section (Customer Portal > Connectors).
Use this connector to catalog Databricks workspaces that have Unity Catalog enabled. It supports both interactive clusters and SQL endpoints for metadata extraction. The connector can catalog metadata objects from multiple workspaces using a single data source connection. Extracted schemas are referenced with multipart names (catalog.schema).
The connector supports Databricks on AWS and Azure Databricks.
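Because extracted schemas carry the catalog name as a prefix, scripts that post-process schema names from the catalog may need to split them into their two parts. The following is a minimal sketch, assuming simple unquoted identifiers that contain no dots of their own. The function name `split_multipart_schema` is hypothetical and is not part of the connector.

```python
def split_multipart_schema(name: str) -> tuple[str, str]:
    """Split a 'catalog.schema' name into (catalog, schema).

    Assumption: identifiers are unquoted and contain no embedded dots.
    """
    catalog, sep, schema = name.partition(".")
    if not sep or not catalog or not schema or "." in schema:
        raise ValueError(f"expected 'catalog.schema', got {name!r}")
    return catalog, schema


print(split_multipart_schema("main.sales"))  # ('main', 'sales')
```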
Connector Version Compatibility
Newer versions of the connector offer more features and may require newer Alation releases.
Connector version | Compatible with Alation from version | New features in this version |
---|---|---|
2.0.3.6564 | 2023.1.7.1 | Disabled automatic lineage generation from queries ingested during query log ingestion (QLI) for this connector. The Data Source admins no longer need to manually disable automatic lineage on the data source settings page. Lineage is calculated based on direct lineage extraction from system tables. |
2.0.2.6259 | 2023.1.7.1 | Implemented query log ingestion (beta). |
1.2.1.5335 | 2023.1.4 | Implemented Query Service. |
1.1.0.4393 | 2023.1.4 | Added support for incremental extraction of direct lineage (beta). |
1.0.3.4144 | 2023.1.2 | Implemented direct lineage extraction (beta). |
Note
Direct lineage extraction and query log ingestion are considered beta as they rely on the Unity Catalog system tables that are currently in Public Preview in Databricks and require separate access enablement. You may need assistance from your Databricks account admin to enable access to these features in Databricks.
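The compatibility table above can also be read as a lookup: given an installed Alation version, which connector versions can be used? The sketch below is illustrative only. It copies the version pairs from the table and assumes that version strings are purely numeric, dot-separated, and comparable component by component. The names `MIN_ALATION`, `parse`, and `compatible_connectors` are hypothetical.

```python
# Minimum Alation release for each connector version, copied from the table.
MIN_ALATION = {
    "2.0.3.6564": "2023.1.7.1",
    "2.0.2.6259": "2023.1.7.1",
    "1.2.1.5335": "2023.1.4",
    "1.1.0.4393": "2023.1.4",
    "1.0.3.4144": "2023.1.2",
}


def parse(version: str) -> tuple[int, ...]:
    """Turn '2023.1.7.1' into (2023, 1, 7, 1) for component-wise comparison."""
    return tuple(int(part) for part in version.split("."))


def compatible_connectors(alation_version: str) -> list[str]:
    """Return connector versions usable with the given Alation version, newest first."""
    installed = parse(alation_version)
    usable = [
        connector
        for connector, minimum in MIN_ALATION.items()
        if installed >= parse(minimum)
    ]
    return sorted(usable, key=parse, reverse=True)


print(compatible_connectors("2023.1.4"))
# ['1.2.1.5335', '1.1.0.4393', '1.0.3.4144']
```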
Team
You may require assistance from your Databricks account admin when configuring this connector in Alation.
Databricks administrator:
- Creates a service account for Alation and grants it the required permissions to access metadata
- Generates a personal access token
- Provides the JDBC URI to access metadata
- Assists in enabling the Public Preview features (system lineage and audit tables)

Alation Server Admin:
- Installs the connector
- Creates and configures a Databricks Unity Catalog OCF data source in Alation
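The items the Databricks administrator hands over usually come together as a connection string plus a token. As a sketch only, the helper below assembles a URI following the general Databricks JDBC URL pattern (`jdbc:databricks://<host>:443;httpPath=<path>;AuthMech=3`). The exact URI format this connector expects is an assumption here and should be confirmed against the connector's configuration guide. The host, HTTP path, and the function name `build_jdbc_uri` are placeholders.

```python
def build_jdbc_uri(host: str, http_path: str, port: int = 443) -> str:
    """Assemble a Databricks-style JDBC URI without embedding credentials.

    Assumption: the general Databricks JDBC URL layout applies; verify the
    required format in the connector configuration guide.
    """
    host = host.strip()
    http_path = http_path.strip()
    if not host or "/" in host:
        raise ValueError("host must be a bare hostname, without scheme or path")
    if not http_path:
        raise ValueError("http_path is required")
    # AuthMech=3 selects username/password-style authentication,
    # which is the mode used with a personal access token.
    return f"jdbc:databricks://{host}:{port};httpPath={http_path};AuthMech=3"


# Placeholder values, not a real workspace.
uri = build_jdbc_uri("example.cloud.databricks.com", "/sql/1.0/warehouses/abc123")
print(uri)
```

The personal access token is deliberately left out of the string so that it can be supplied separately and kept out of logs and shared notes.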
Scope
The table below shows which metadata objects are extracted by this connector and which operations are supported.
Feature | Scope | Availability |
---|---|---|
Authentication | | |
Token-based | Authentication with a personal access token | Yes |
SSO authentication | SSO authentication with an identity provider application | No |
Metadata extraction (MDE) | | |
Default MDE | Extraction of metadata based on default extraction queries in the connector code | Yes |
Custom query-based MDE | Extraction of metadata based on extraction queries provided by a user | No |
Extracted metadata objects | | |
Data source | Data source object in Alation that is parent to extracted metadata | Yes |
Schemas | List of schemas, with multipart schema names | Yes |
Tables | List of tables | Yes |
Columns | List of columns | Yes |
Column data types | Column data types | Yes |
Views | List of views | Yes |
Source comments | Source comments | Yes |
Primary keys | Primary key information for extracted tables | No |
Foreign keys | Foreign key information for extracted tables | No |
Functions | Extraction of function metadata | No |
Sampling and profiling | | |
Table sampling | Extracts data samples from all extracted tables | Yes |
Column sampling | Extracts data samples from all extracted columns | Yes |
Deep column profiling | On-demand profiling of specific columns with the calculation of value distribution stats | Yes |
Dynamic profiling | On-demand table and column profiling by individual users who use their own database accounts to retrieve the profiles | Yes |
Custom query-based table sampling | Ability to use custom queries for sampling specific tables | Yes |
Custom query-based column sampling | Ability to use custom queries for profiling specific columns | Yes |
Query log ingestion (QLI) (beta) | | |
Extraction and ingestion of query history (Available from connector version 2.0.3.6564 and Alation version 2023.1.7.1) | Extraction of query history from the system audit table and ingestion of query history metadata into the catalog | Yes |
Query history, filters, expressions, joins, and popularity | Query history, filters, joins, and popularity information is calculated from the query history metadata extracted and ingested with QLI | Yes |
Lineage extraction (beta) | | |
Extraction of lineage information (Available from connector version 1.0.3.4144 and Alation version 2023.1.2) | Lineage information is calculated during metadata extraction (direct lineage extraction). Additionally, lineage is generated based on DDL queries run in Compose. Users can also create lineage manually or add it using the public API. | Yes |
Compose | | Yes |
Data upload | | Yes |