Azure Databricks OCF Connector: Overview

The OCF connector for Azure Databricks is developed by Alation and is distributed as a ZIP file that you upload and install in the Alation application. The connector is packaged together with the required database driver, so you do not need to obtain or install the driver separately.

To obtain the Azure Databricks OCF connector package, open a ticket with Alation Support.

This connector should be used to catalog Azure Databricks or Azure Databricks on Azure Government Cloud as a data source on Alation on-prem and Alation Cloud Service instances. It extracts and catalogs such database objects as tables, views, and columns. After the metadata is extracted, it is represented in the data catalog as a hierarchy of catalog pages under the parent data source. Alation users can leverage the full catalog functionality to search for and find the extracted metadata, curate the corresponding catalog pages, create documentation about the data source, and exchange information about it.
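
As a rough illustration of that hierarchy, the sketch below models a data source with nested schemas, tables or views, and columns. The class, field, and method names are hypothetical and are not part of any Alation API; they only mirror the object types this overview lists as extracted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    name: str
    comment: str = ""


@dataclass
class Table:
    name: str
    is_view: bool = False  # views sit at the same level as tables
    columns: List[Column] = field(default_factory=list)


@dataclass
class Schema:
    name: str
    tables: List[Table] = field(default_factory=list)


@dataclass
class DataSource:
    # Hypothetical stand-in for the parent data source page.
    title: str
    schemas: List[Schema] = field(default_factory=list)

    def paths(self) -> List[str]:
        """Return one dotted path per object, parents before children."""
        out: List[str] = []
        for schema in self.schemas:
            out.append(schema.name)
            for table in schema.tables:
                out.append(f"{schema.name}.{table.name}")
                for column in table.columns:
                    out.append(f"{schema.name}.{table.name}.{column.name}")
        return out


source = DataSource(
    "Azure Databricks",
    [Schema("sales", [Table("orders", columns=[Column("order_id")])])],
)
print(source.paths())
```

Each path in the printed list corresponds to one catalog page under the parent data source in this simplified model.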

Team

The following administrators are required to install this connector:

  • Alation administrator

    • Installs the connector

    • Creates and configures the Azure Databricks data source in the catalog

  • Azure Databricks administrator

    • Creates a service account for Alation

    • Creates the compute resource (cluster) used for metadata extraction

    • Provides the JDBC URI to access metadata

    • Provides access to public schemas to extract metadata

    • Provides credentials for Azure Blob Storage or Azure Data Lake Storage Gen2 (ADLS)

    • Configures the Databricks notebook required for file-based and table-based query log ingestion
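
This overview does not state the exact JDBC URI format the connector expects. As a minimal sketch, assuming the general shape of a Databricks JDBC driver URL (server hostname, port 443, an `httpPath` property, and `AuthMech=3` for token-based authentication), the URI that the Azure Databricks administrator provides could be assembled as shown below. The function name `build_jdbc_uri` and the sample hostname and HTTP path are hypothetical. Confirm the required format, including whether the `jdbc:` prefix is expected, against the connector's configuration instructions.

```python
def build_jdbc_uri(hostname: str, http_path: str, port: int = 443) -> str:
    """Assemble a Databricks-style JDBC URI without embedded credentials.

    Assumption: the personal access token is supplied separately when the
    data source is configured, so no UID/PWD properties are included here.
    """
    hostname = hostname.strip()
    http_path = http_path.strip()
    if not hostname or not http_path:
        raise ValueError("hostname and http_path are both required")
    return f"jdbc:databricks://{hostname}:{port};httpPath={http_path};AuthMech=3"


# Hypothetical workspace values, for illustration only.
uri = build_jdbc_uri("adb-123.azuredatabricks.net", "sql/protocolv1/o/123/abc")
print(uri)
```

Keeping the token out of the URI string means the URI can be shared between administrators without exposing the service account's credentials.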

Scope

The table below shows which metadata objects are extracted by this connector and which operations are supported.

| Feature | Scope | Availability |
|---|---|---|
| Authentication | | |
| Token-based authentication | Authentication using Databricks personal access tokens | Yes |
| SSL authentication | SSL authentication | No |
| Kerberos | Authentication with Kerberos | No |
| Keytab | Authentication with Keytab | No |
| LDAP | Authentication with the LDAP protocol | No |
| SSO | Authentication with SSO | No |
| Metadata Extraction (MDE) | | |
| Default MDE | Extracts supported metadata objects based on Databricks JDBC driver methods in the connector code | Yes |
| Custom query-based MDE | Extracts supported metadata objects based on extraction queries provided by user | No |
| Extracted metadata objects | | |
| Data Source | Data source object in Alation that is parent to the extracted metadata | Yes |
| Schemas | List of schemas | Yes |
| Tables | List of tables | Yes |
| Columns | List of columns | Yes |
| Column comments | Column comments | Yes |
| Column data types | Column data types | No |
| Views | List of views | Yes |
| Source comments | Source comments | Yes |
| Primary keys | Primary key information for extracted tables | No |
| Foreign keys | Foreign key information for extracted tables | No |
| Functions | Extract function metadata | No |
| Function definitions | Extract function definition metadata | No |
| Sampling and Profiling | | |
| Table sampling | Extracts data samples from all extracted tables | Yes |
| Column sampling | Extracts data samples from all extracted columns | Yes |
| Deep column profiling | On-demand profiling of specific columns with the calculation of value distribution stats | Yes |
| Dynamic profiling | On-demand table and column profiling by individual users who use their own database accounts to retrieve the profiles | Yes |
| Custom query-based table sampling | Ability to use custom queries for sampling specific tables | Yes |
| Custom query-based column sampling | Ability to use custom queries for profiling specific columns | Yes |
| Query Log Ingestion (QLI) | | |
| File-based QLI | Ingestion of query history based on a file that contains query history data | Yes |
| Table-based QLI | Ingestion of query history based on a table that contains query history data | Yes |
| Query-based QLI | Ingestion of query history based on a custom query history extraction query | No |
| JOINs and filters | Calculation of JOIN and filter information based on ingested query history | Yes |
| Predicates | Ability to parse predicates in ingested queries | Yes |
| Lineage | | |
| Automatic lineage generation | Auto-calculation of lineage based on query history ingested from QLI, MDE, and Compose queries | Yes |
| Compose | | |
| On-prem Alation instances | | Yes |
| Alation Cloud Service instances (connection goes through Alation Agent) | | No |
| Personal Access Token (PAT) authentication in Compose | Authentication in Compose with username and password | Yes |
| OAuth in Compose | Authentication in Compose with OAuth credentials | No |
| SSO authentication in Compose | Authentication in Compose with SSO credentials | No |