Databricks for Custom DB¶
Databricks on different platforms such as Azure and Google Cloud Platform (GCP) are supported as Custom DB data source type in Alation. It is recommended to use the Simba Spark JDBC Driver for Databricks on Azure and Databricks on GCP. For more details on the driver version, refer to the appropriate version of the Support Matrix. The Simba Spark JDBC Driver is available by default in Alation.
To set up the connection for Databricks on Azure or GCP, or , see the corresponding sections dedicated to each of the sources:
Databricks Objects to Alation Objects Mapping¶
MDE is always performed against a concrete cluster, which must be running at the time.
The Hive Metastore is required to perform MDE
Alation does not usually catalog code other than SQL.
A collection of tables
Table / DataFrame
Alation only knows about DataFrames which have been registered as Tables in the Hive Metastore
Complex data types may not be fully supported
Spark SQL Query
Starting with 2020.3, QLI is supported for Spark SQL queries. Requires configuration.
Functions / procedures
An account specific for each user so that Compose can be used with the correct privileges, and QLI can be attributed to the right user.
An account specific for each user, so that Compose can be used with the correct privileges, and QLI can be attributed to the right user
Logs to collect/review:
For logs related to MDE: taskserver.log, taskserver_err.log.
For logs related to Compose: connector.log, connector_err.log.
For any other errors: alation-error.log, alation-debug.log