GCP Databricks¶
Applies from version 2021.3
Scope of Support¶
Automatic MDE and Profiling, Popularity, Lineage, Compose
Delta tables are supported: Alation can extract them during Metadata Extraction, profile them, and query them in Compose. Delta tables are represented as Table objects in the Alation Catalog.
Partitioned tables are supported.
Query Log Ingestion is not supported.
Preliminaries¶
The following information and configuration are required to set up a GCP Databricks connection in Alation:
JDBC URI¶
Build the JDBC URI from the following components:
Hostname
Databricks HTTP Path Prefix
Databricks Cluster ID
User Agent Entry - Alation recommends setting the user agent attribute as a parameter in the JDBC URI because it identifies the requests performed by Alation, which is useful for logging and for debugging issues attributed to Alation.
JDBC URI Pattern:
spark://<hostname>:443/default;transportMode=http;ssl=1;httpPath=<databricks_http_path_prefix>/<databricks_cluster_id>;AuthMech=3;UserAgentEntry=Alation/2021.3;
Example:
spark://4724910916864653.3.gcp.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/5678080404529670/1129-091234-pooh138;AuthMech=3;UserAgentEntry=Alation/2021.3;
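As an illustration, the URI can also be assembled from its components programmatically. Below is a minimal Python sketch; all values are placeholders taken from the example above, not real workspace details.

# Assemble the Alation JDBC URI for GCP Databricks from its components.
# All values are placeholders; substitute your own workspace details.
hostname = "4724910916864653.3.gcp.databricks.com"       # Databricks hostname
http_path_prefix = "sql/protocolv1/o/5678080404529670"   # Databricks HTTP path prefix
cluster_id = "1129-091234-pooh138"                       # Databricks cluster ID
user_agent = "Alation/2021.3"                            # identifies Alation's requests

jdbc_uri = (
    f"spark://{hostname}:443/default;transportMode=http;ssl=1;"
    f"httpPath={http_path_prefix}/{cluster_id};AuthMech=3;"
    f"UserAgentEntry={user_agent};"
)
print(jdbc_uri)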
Service Account¶
A service account with privileges to access the Databricks cluster is required for Metadata Extraction and Profiling. Alation supports the following authentication method:
Token-based authentication: For information on how to generate a unique token and use it for authentication, see this article.
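Tokens are typically generated in the Databricks workspace UI. If you prefer to script it, the sketch below calls the Databricks Token API with the Python requests library; the workspace URL, the authenticating credential, and the token lifetime are all assumptions to adapt to your environment.

import requests

# Create a Databricks personal access token via the Token API
# (POST /api/2.0/token/create). WORKSPACE_URL and EXISTING_TOKEN are
# placeholders: the workspace hostname and a credential that is already
# authorized to call the API.
WORKSPACE_URL = "https://4724910916864653.3.gcp.databricks.com"
EXISTING_TOKEN = "<existing-personal-access-token>"

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/token/create",
    headers={"Authorization": f"Bearer {EXISTING_TOKEN}"},
    json={"comment": "Alation service account", "lifetime_seconds": 7776000},  # 90 days
)
resp.raise_for_status()
token_value = resp.json()["token_value"]  # shown only once; store it securely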
Metadata Extraction¶
The service account requires access to the Databricks cluster and must be able to read the tables and their metadata for extraction to succeed.
Profiling/Sampling¶
The service account requires access to the Databricks cluster to perform Profiling/Sampling.
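Before moving on to the steps in Alation, you may want to confirm that the service account token can reach the cluster and read metadata. The following is a minimal sketch using the jaydebeapi Python package and a locally downloaded Simba Spark JDBC driver; the driver class name, JAR path, and the jdbc: prefix on the URI are assumptions that depend on your driver version and environment.

import jaydebeapi

# Placeholders throughout: substitute your own URI components, token, and JAR path.
# Note the jdbc: prefix, which the driver expects when used outside Alation.
JDBC_URI = (
    "jdbc:spark://<hostname>:443/default;transportMode=http;ssl=1;"
    "httpPath=<databricks_http_path_prefix>/<databricks_cluster_id>;AuthMech=3;"
)
TOKEN = "<personal-access-token>"

conn = jaydebeapi.connect(
    "com.simba.spark.jdbc.Driver",   # assumed class name; check your driver version
    JDBC_URI,
    {"UID": "token", "PWD": TOKEN},  # token auth: the username is the literal word "token"
    "/path/to/SparkJDBC42.jar",      # assumed local path to the Simba Spark driver JAR
)
try:
    curs = conn.cursor()
    curs.execute("SHOW DATABASES")   # succeeds only if the account can read metadata
    print(curs.fetchall())
    curs.close()
finally:
    conn.close()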
Steps in Alation¶
Step 1: Add a New Data Source¶
Add a new data source on the Sources page.
Step 2: Set up the Connection¶
On the Add a Data Source screen of the wizard, specify:
Database Type: Custom DB
JDBC URI: URI in the required format. See JDBC URI.
Example:
spark://4724910916864653.3.gcp.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/5678080404529670/1129-091234-pooh138;AuthMech=3;UserAgentEntry=Alation/2021.3;
Select Driver: select the Simba Spark JDBC driver for GCP Databricks from the Select Driver drop-down list. Refer to the appropriate version of the Support Matrix for the driver version.
Click Save and Continue. The next wizard screen - Set Up a Service Account - will open.
Note
Do not select the Use Kerberos checkbox.
Step 3: Enter Service Account Credentials¶
Enter the token information in the Username and Password fields as follows:
Username: type the word token.
Password: enter the token string.
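For example, if the generated token value were dapi1234567890abcdef (a made-up placeholder), you would enter:
Username: token
Password: dapi1234567890abcdef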
Step 4: Configure Your Data Source¶
Click Skip this Step. Query Log Ingestion is not supported for this data source type (see Query Log Ingestion below).
After this step, you will be taken to the Settings page of your data source.
Metadata Extraction¶
Configure and perform metadata extraction and verify the results:
In Settings > Custom Settings, set the Catalog Object Definition to Schema.Table.
In Settings > Metadata Extraction, set up and perform MDE. Refer to Metadata Extraction.
For GCP Databricks, Alation supports automatic MDE, run manually or on a schedule. Custom query-based MDE is not supported for this type of data source.
Profiling¶
Configure and perform Sampling and Profiling:
Users can run a sample for an individual table on the Samples tab of the Table Catalog page or profile an individual column on the Overview tab of the Column page.
Automatic full and selective Profiling is supported.
Use the Per-Object Parameters in the Settings tab to specify which objects to profile.
Note
Make sure that the Skip Views checkbox is cleared for the respective schemas before running Profiling.
Query Log Ingestion¶
Not supported.
Compose¶
Log in to Compose:
Authenticate in Compose with your GCP Databricks credentials.
Use the Schema.Table format for writing queries.
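Example (hypothetical schema and table names):
SELECT * FROM default.sales_orders LIMIT 100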
Note
OAuth for Compose is not supported for Databricks on GCP.