Setup Prerequisites

Applies to: Alation Cloud Service and customer-managed instances of Alation

Prepare to configure the Databricks Unity Catalog OCF connector by reviewing the following recommendations:

Pre-checks

On your Databricks instance, ensure the following:

  • Unity Catalog is enabled.

  • The workspaces have been assigned to the Unity Catalog metastore.

  • There is a running Unity-compatible interactive cluster or SQL warehouse that Alation will connect to for metadata extraction.

  • For lineage extraction (beta) and query log ingestion (beta), the system schema system.access is enabled.

Configure Network Connectivity

Open inbound TCP port 443 to the Databricks Unity Catalog server.

Create a Service Account

When configuring the connector in Alation, you will need to provide authentication details. To obtain those:

  • Create a Databricks account-level user to be used as a service account in Alation.

  • Assign Alation’s user the USAGE and SELECT permissions on all catalog, schema, and table objects that you want to catalog in Alation (see the example GRANT statements after this list).

  • Lineage extraction requires additional permissions. See Permissions for Lineage Extraction below.

  • Query log ingestion (QLI) requires additional permissions. See Permissions for QLI below.

  • Assign Alation’s user to workspace(s) using the information in Manage users, service principals, and groups. The user must be assigned to the same workspace(s) as the cluster or SQL warehouse.

  • Assign Alation’s user the Can Attach To permission on the cluster or the Can Use permission on the SQL warehouse.

    • Optionally, you can grant Alation’s user the Can Restart permission if auto-starting the cluster is allowed by your organization’s policy.

  • If you want to use authentication with a personal access token, grant Alation’s user the Can Use permission on personal access tokens.
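
The following is a minimal sketch of the USAGE and SELECT grants described above, assuming a hypothetical catalog sales_catalog, schema sales_schema, and service account alation_svc@example.com; adjust the names and scope to the objects you want to catalog. In Unity Catalog SQL, the USAGE permission corresponds to the USE CATALOG and USE SCHEMA privileges.

-- Placeholder names: catalog sales_catalog, schema sales_schema, principal alation_svc@example.com
GRANT USE CATALOG ON CATALOG sales_catalog TO `alation_svc@example.com`;
GRANT USE SCHEMA ON SCHEMA sales_catalog.sales_schema TO `alation_svc@example.com`;
-- Granting SELECT at the schema level covers all tables in that schema;
-- alternatively, grant SELECT on individual tables.
GRANT SELECT ON SCHEMA sales_catalog.sales_schema TO `alation_svc@example.com`;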

Permissions for Lineage Extraction

The Unity Catalog lineage feature is currently in Public Preview in Databricks and may require separate access enablement. Contact your Databricks administrator about enabling access to this feature. Lineage extraction in Alation uses this functionality and is currently a beta feature.

Lineage extraction requires access to the system catalog, the system.access schema, and the tables in this schema. Grant Alation’s user these permissions:

  • USE CATALOG on catalog system

  • USE SCHEMA on schema system.access

  • SELECT on table system.access.table_lineage

  • SELECT on table system.access.column_lineage

The service account does not require USE or SELECT permissions on all of the catalogs, schemas, and tables captured in the lineage records in the system.access lineage tables; all lineage will be extracted regardless. Any objects that exist in the system.access tables but are not cataloged in Alation will be marked as temporary (TMP) on lineage diagrams unless temporary objects have been disabled.
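
As a concrete illustration, the grants listed above can be issued with the following statements, where alation_svc@example.com is a placeholder for the Alation service account; the final query is an optional check that the account can read the lineage tables.

-- Placeholder principal: alation_svc@example.com
GRANT USE CATALOG ON CATALOG system TO `alation_svc@example.com`;
GRANT USE SCHEMA ON SCHEMA system.access TO `alation_svc@example.com`;
GRANT SELECT ON TABLE system.access.table_lineage TO `alation_svc@example.com`;
GRANT SELECT ON TABLE system.access.column_lineage TO `alation_svc@example.com`;

-- Optional check: run as the service account to confirm read access
SELECT * FROM system.access.table_lineage LIMIT 1;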

Permissions for QLI

The Unity Catalog audit log feature is currently in Public Preview in Databricks and may require separate access enablement. Contact your Databricks administrator about enabling access to this feature. Query log ingestion (QLI) in Alation uses this functionality and is currently a beta feature.

QLI requires access to the system catalog, the system.access schema, and the system.access.audit table in this schema. Grant the Alation service account these permissions:

  • USE CATALOG on catalog system

  • USE SCHEMA on schema system.access

  • SELECT on table system.access.audit
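
A corresponding sketch for QLI, again using the placeholder service account alation_svc@example.com:

-- Placeholder principal: alation_svc@example.com
GRANT USE CATALOG ON CATALOG system TO `alation_svc@example.com`;
GRANT USE SCHEMA ON SCHEMA system.access TO `alation_svc@example.com`;
GRANT SELECT ON TABLE system.access.audit TO `alation_svc@example.com`;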

Gather the Authentication Details

The connector supports Basic Authentication for Databricks on AWS and Token-Based Authentication for Databricks on AWS and Azure Databricks. Choose the authentication type that suits your use case.

Basic Authentication

Basic authentication requires the username and password of the user you created for Alation in Databricks (see Create a Service Account).

Token-Based Authentication

Token-based authentication requires a personal access token (PAT) of the user you created for Alation in Databricks. Follow the steps in Databricks personal access token for workspace users in Databricks documentation to generate a PAT.

Build the JDBC URI

The JDBC URI string you will need to provide in Alation depends on the connector version.

For information on how to get the JDBC URI for your Databricks resource, refer to Databricks documentation.

When specifying the JDBC URI in Alation, remove the jdbc: prefix.

Note

The property UseNativeQuery=0 is required for custom query-based sampling and profiling. Without this property in the JDBC URI, custom query-based sampling or profiling will fail. If you are not using custom query-based sampling and profiling in your implementation of this data source type, you can omit this property from the JDBC URI string.

Find more information in ANSI SQL-92 query support in JDBC in Azure Databricks documentation.

Connection String for Databricks JDBC Driver

Find more information in Databricks JDBC driver in Databricks documentation.

Format

databricks://<hostname>:443/default;httpPath=<databricks_http_path_prefix>/<databricks_cluster_id>;UseNativeQuery=0;

Examples

Compute Cluster

databricks://dbc-32ak8401-ac16.cloud.databricks.com:443/default;httpPath=sql/protocolv1/o/2479012801311837/0612-093241-z79vbfjk;UseNativeQuery=0;

SQL Warehouse

databricks://dbc-32am8401-ac16.cloud.databricks.com:443/default;httpPath=/sql/1.0/warehouses/9f5d50hhsaeb0k23;UseNativeQuery=0;

Connection String for Spark JDBC Driver

This format applies to connector versions prior to version 2.0.0. Find more information about the legacy Spark driver in JDBC Spark driver in Databricks documentation.

Note

Connector versions 2.0.0 and newer include the Databricks JDBC driver and use the format described in Connection String for Databricks JDBC Driver above.

Format

spark://<hostname>:443/default;httpPath=<databricks_http_path_prefix>/<databricks_cluster_id>;

Example

spark://adb-58175503737864.5.azuredatabricks.net:443/default;httpPath=/sql/1.0/endpoints/0f38f55be5cbd786;

JDBC URI Properties

The connector adds some JDBC properties to the connection. These properties do not need to be explicitly included in the JDBC URI connection string in Alation.

  • RowsFetchedPerBlock—Limits the number of objects returned in each fetch call. Used to regulate the amount of memory used by the connector and prevent OOM errors. Set to 500. The memory utilization of the MDE job is captured in the connector logs when debug logging is enabled.

  • UserAgentEntry—Identifies driver request calls from the connector in Databricks. Set to alation+unity_catalog.

Enable Extraction of Complex Data Types

Complex data types, such as map, array, and struct, are extracted. By default, they will be represented as flat lists.
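
For illustration, a table with such columns could be defined in Databricks SQL as follows (all names are placeholders); the items, attributes, and shipping columns are complex types that are extracted and, by default, shown as flat lists:

-- Placeholder names: demo_catalog.demo_schema.orders
CREATE TABLE demo_catalog.demo_schema.orders (
  order_id BIGINT,
  items ARRAY<STRUCT<sku: STRING, quantity: INT>>,            -- array of structs
  attributes MAP<STRING, STRING>,                              -- map
  shipping STRUCT<street: STRING, city: STRING, zip: STRING>   -- struct
);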

You can enable a tree-structure representation of complex data types using alation_conf on the Alation server.

Note

Alation Cloud Service customers can request server configuration changes through Alation Support.

To enable the representation of complex data types as a tree structure:

  • On your Alation instance, set the alation_conf parameter alation.feature_flags.enable_generic_nosql_support to True.

  • Additionally, use the parameter alation.feature_flags.docstore_tree_table_depth to define the depth of the display. By default, three levels are displayed.

For information about using alation_conf, refer to Using alation_conf.

Important

After changing the values of these parameters, restart Alation Supervisor from the Alation shell: alation_supervisor restart all.

You can find more information about Alation actions in Alation Actions.