Azure Synapse OCF Connector: Install and Configure¶
Network Configuration¶
Ports¶
Open outbound TCP port 1433 to Azure Synapse Analytics (default).
Azure Synapse Analytics can also listen on non-standard ports. If your instance uses a non-standard port, open the corresponding outbound TCP port from the Alation server.
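Before configuring the connector, it can help to verify that the required outbound port is actually reachable from the Alation server. The snippet below is only a sketch; the hostname in the comment is a hypothetical placeholder, not a value from this document.

```python
import socket

def is_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False

# Example (hypothetical workspace hostname):
# is_port_open("myworkspace.sql.azuresynapse.net", 1433)
```

A `False` result usually points to a firewall rule or a non-standard port rather than a credentials problem.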
Connectivity¶
You may need to whitelist the Alation IP address on your Azure portal if the connectivity test from Alation is unsuccessful.
Service Account¶
For SQL authentication, create a service account for Alation on the master database and grant it the required permissions.
Example:
CREATE LOGIN [login_value] WITH PASSWORD = '<password>';
CREATE USER [service_account] FOR LOGIN [login_value];
If using an Azure AD account, grant the required permissions to the AD account.
Permissions for Metadata Extraction, Sampling, and Profiling¶
Grant the service account read access on all databases that you want extracted into the catalog. Read access is also enough for sampling and profiling.
Dedicated SQL Pool¶
EXEC sp_addrolemember 'db_datareader', '<alation_service_account>';
Note
Refer to Azure documentation for details on database-level roles, for example: Database-level roles.
Serverless SQL Pool¶
ALTER ROLE db_datareader ADD MEMBER <alation_service_account>;
For sampling and profiling of external tables and of views on top of external tables, the service account (or, in the case of dynamic profiling, individual users) requires access to the files under the storage account. For more information about access to storage accounts, refer to Control storage account access for serverless SQL pool.
On both dedicated and serverless pools, additionally grant the service account the VIEW DEFINITION permission so that Alation can extract view definitions:
GRANT VIEW DEFINITION TO <alation_service_account>;
Permissions for Query Log Ingestion¶
For query log ingestion, the service account requires access to the sys.fn_get_audit_file function. The configuration details are available in Query Log Ingestion below.
Dedicated SQL Pool¶
For QLI from a dedicated SQL pool, the service account needs the CONTROL
permission to access the audit file.
GRANT CONTROL TO <alation_service_account>;
Serverless SQL Pool¶
For QLI from a serverless SQL pool, the service account needs the CONTROL SERVER
permission to access the audit file.
GRANT CONTROL SERVER TO <alation_service_account>;
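After granting the permission, one way to confirm that the service account can read audit data is to run the audit function directly while connected as that account. This is only a sketch; `<container_URI>` is a placeholder for your audit storage container URI.

```sql
-- Run while connected as the Alation service account. This should return
-- rows (or an empty result set) rather than a permission error.
SELECT TOP 5 event_time, server_principal_name, [statement]
FROM sys.fn_get_audit_file('<container_URI>', default, default);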
JDBC URI¶
Format¶
For both SQL authentication and Azure AD username and password authentication, use the following format:
sqlserver://<hostname_or_ip>:<port>
Authentication with an Azure AD account does not require any additional parameters in the URI.
Note
In Azure Portal, the JDBC URL for a resource under Azure Synapse can be found in the resource properties. Alation only requires the hostname and port information and, optionally, the database name. The full URL should not be used.
With the JDBC URI in this format, Alation will extract schemas from all data resources under Synapse Analytics that the service account has been granted access to. You can optionally include the database
parameter if you want to extract metadata from one specific data resource only.
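For illustration, the URI shapes described above can be assembled with a small helper. This is only a sketch for clarity; the function name and the example values used with it are hypothetical, not part of the connector.

```python
from typing import Optional

def build_jdbc_uri(host: str, port: int = 1433, database: Optional[str] = None) -> str:
    """Assemble a JDBC URI in the format the connector expects."""
    uri = f"sqlserver://{host}:{port}"
    if database:
        # Including the database parameter restricts extraction to one data resource.
        uri += f";database={database}"
    return uri

# build_jdbc_uri("ws.sql.azuresynapse.net")
# → "sqlserver://ws.sql.azuresynapse.net:1433"
# build_jdbc_uri("ws.sql.azuresynapse.net", 1433, "sales")
# → "sqlserver://ws.sql.azuresynapse.net:1433;database=sales"
```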
Examples¶
Dedicated SQL Pool¶
sqlserver://test99443.sql.azuresynapse.net:1433
sqlserver://test99443.sql.azuresynapse.net:1433;database=test99443
Serverless SQL Pool¶
sqlserver://synaptestal102416adity22-ondemand.sql.azuresynapse.net:1433
sqlserver://synaptestal102416-ondemand.sql.azuresynapse.net:1433;database=test1
Configuration in Alation¶
STEP 1: Install the Connector¶
Alation On-Prem¶
Important
Installation of OCF connectors requires Alation Connector Manager to be installed as a prerequisite.
To install an OCF connector:
If this has not been done on your instance, install the Alation Connector Manager: Install Alation Connector Manager.
Ensure that the OCF connector Zip file that you received from Alation is available on your local machine.
Install the connector on the Connectors Dashboard page using the steps in Manage Connectors.
Alation Cloud Service¶
Note
On Alation Cloud Service instances, Alation Connector Manager is available by default.
Ensure that the OCF connector Zip file that you received from Alation is available on your local machine.
Install the connector on the Connectors Dashboard page using the steps in Manage Connectors.
STEP 2: Create and Configure a New Data Source¶
In Alation, add a new data source:
Log in to Alation as a Server Admin.
Expand the Apps menu on the right of the main toolbar and select Sources.
On the Sources page, click +Add on the top right of the page and in the list that opens, click Data Source. This will open the Add a Data Source wizard.
On the first screen of the wizard, specify a name for your data source, assign additional Data Source Admins, if necessary, and click the Continue Setup button on the bottom. The Add a Data Source screen will open.
On the Add a Data Source screen, the only field you should populate is Database Type. From the Database Type dropdown, select the connector name. After that you will be navigated to the Settings page of your new data source.
The name of this connector is Synapse OCF Connector.
Access¶
On the Access tab, set the data source visibility using these options:
Public Data Source—The data source will be visible to all users of the catalog.
Private Data Source—The data source will be visible to the users allowed access to the data source by Data Source Admins.
You can add new Data Source Admin users in the Data Source Admins section.
General Settings¶
Configure the connection on the General Settings tab.
Application Settings¶
Specify Application Settings if applicable. Click Save to save the changes after providing the information.
| Parameter | Description |
|---|---|
| BI Connection Info | This parameter is used to generate lineage between a BI source and this data source. Specify the host and the port used by BI connections that pull data from this data source. You can provide multiple values as a comma-separated list. The host and port details can also be found under Properties > Database connections on the BI Datasource page > Connections tab. |
| Disable Automatic Lineage Generation | Select this checkbox to disable automatic lineage generation from QLI, MDE, and Compose queries. By default, automatic lineage generation is enabled. |
Connector Settings¶
Data Source Connection¶
Populate the data source connection information and save the values by clicking Save in this section.
| Parameter | Description |
|---|---|
| JDBC URI | Specify the JDBC URI in the required format. |
| Azure Active Directory | Select this checkbox if you are using an Azure AD account to authenticate on the database. No additional parameters are required in the URI when you select the checkbox. |
| Username | Specify the service account username. |
| Password | Specify the service account password. |
| Enable SSL | Enable or disable SSL authentication by selecting or clearing the Enable SSL checkbox. If the Enable SSL checkbox is enabled, upload the SSL certificate using the upload link below. |
| Truststore Password | Specify the password for the SSL certificate. Note: The password will be deleted if the data source connection is deleted. |
Logging Configuration¶
Select the logging level for the connector logs and save the values by clicking Save in this section. The available log levels are based on the Log4j framework.
| Parameter | Description |
|---|---|
| Log level | Select the log level to generate logs. The available options are INFO, DEBUG, WARN, TRACE, ERROR, FATAL, ALL. |
Obfuscate Literals¶
Obfuscate Literals—Enable this toggle to hide literal values in the queries shown on catalog pages that are ingested via QLI or executed in Compose. This toggle is disabled by default.
Test Connection¶
After specifying the connector settings, under Test Connection, click Test to validate network connectivity.
Metadata Extraction¶
You can configure metadata extraction (MDE) for an OCF data source on the Metadata Extraction tab of the Settings page. Refer to Configure Metadata Extraction for OCF Data Sources for information about the available configuration options.
The default queries that the connector uses to extract metadata can be found in Extraction Queries for Azure Synapse Analytics. You can customize these queries to adjust the extraction to your needs.
Incremental Extraction from Compose¶
After creating tables in Compose, users will see the new table information on the respective catalog page without re-running MDE.
Compose users should use fully qualified table names when creating tables to ensure successful incremental extraction, for example:
CREATE TABLE [DATABASE_NAME].[SCHEMA_NAME].[TABLE_NAME] ({column properties});
Sampling and Profiling¶
Sampling and profiling is supported. For details, see Configure Sampling and Profiling for OCF Data Sources.
Compose¶
To connect to Synapse Analytics in Compose, Compose users must ensure that a valid database name is included in the Compose URI in this format:
sqlserver://<hostname_or_ip_address>:<port_number>;database=<DB_Name>
For details about configuring the Compose tab of the Settings, refer to Configure Compose for OCF Data Sources.
Query Log Ingestion¶
The Azure Synapse Analytics data source supports table-based and custom query-based QLI. Both types of QLI require auditing to be enabled in Azure.
Important
If using the QLI functionality, ensure that the service account has the CONTROL permission, as it needs access to the sys.fn_get_audit_file function. For details, refer to the corresponding Azure documentation: sys.fn_get_audit_file (Transact-SQL).
Enable Auditing for Azure Synapse¶
To enable auditing in Azure:
Log in to the Azure portal and navigate to your Azure resource.
Click on Auditing under Security.
Switch Enable Azure SQL Auditing to ON.
Select the Storage checkbox.
Select your Subscription.
Select a storage account. The storage account is a container to store the audit files. We recommend creating a new storage account for Alation. Select general purpose v1 as account kind.
Expand Advanced properties.
Set the appropriate retention period. For example, you can change the retention period to 1. This assumes that Alation ingests audit files daily; ensure that you select the corresponding configuration in the data source settings in Alation.
With the retention period set to 1, Azure will log one day of query data in the storage. When configuring query log ingestion in Alation, set the automatic QLI to run daily to ingest this batch of query history. We recommend a low value for the retention period to avoid filling up the storage account with logs and to reduce the cost of the resource. Keeping this number as low as possible also helps reduce the amount of logs retrieved into Alation by each query log ingestion job.
If the retention period is set to 0, Azure will store the logs until they are manually deleted (0 means indefinite storage). This is not a recommended setting.
Click Save. Auditing is now enabled, and audit data will be streamed to binary flat files in the .xel format.
Note
The .xel file is stored in the container that was selected in step 6.
Table-Based QLI¶
For the table-based QLI option, you will need to create a table or a view on your database to store the query history data.
Use the query format given below to create a view in Azure Synapse Analytics to use for QLI. The view name must be in the following format: <schema>.<view>.
Note
Substitute the placeholder <container_URI> with the container URI that you find in your container properties, for example: https://testaccount.blob.core.windows.net/sqldbauditlogs/my_database/

CREATE VIEW <schema>.<view> AS
SELECT
    server_principal_name as userName,
    event_time as startTime,
    [statement] as queryString,
    session_id as sessionID,
    event_time as sessionStartTime,
    duration_milliseconds as milliseconds,
    'N' as cancelled,
    database_name as defaultDatabases
FROM sys.fn_get_audit_file('<container_URI>', default, default)
WHERE event_time BETWEEN (STARTTIME) AND (ENDTIME);
In Alation, on the Query Log Ingestion tab of the data source settings, specify the name of the table or view in which the query logs are available in the Table Name field. Make sure that the service account has the permissions to access this table or view.
Custom Query-Based QLI¶
If for some reason you cannot create a table or view on the database, you can use a custom query to perform QLI. In the Custom QLI Query field, provide a query with the expected structure shown below and click Save.
SELECT
    server_principal_name as userName,
    event_time as startTime,
    [statement] as queryString,
    session_id as sessionID,
    event_time as sessionStartTime,
    duration_milliseconds as milliseconds,
    'N' as cancelled,
    database_name as defaultDatabases
FROM sys.fn_get_audit_file('<container_URI>', default, default)
WHERE event_time BETWEEN (STARTTIME) AND (ENDTIME);
Automated and Manual QLI¶
You can either perform QLI manually on demand or enable automated QLI:
To perform manual QLI, under the Automated and Manual Query Log Ingestion section of the Query Log Ingestion tab, ensure that the Enable Automated Query Log Ingestion toggle is disabled.
Click Preview to get a sample of the query history data to be ingested.
Click the Import button to perform QLI on demand.
Note
Metadata extraction must be completed before running QLI.
To schedule QLI, enable the Enable Automated Query Log Ingestion toggle.
Set a schedule under Automated Query Log Ingestion Time by specifying values in the week, day and time fields. The next QLI job will run on the schedule you have specified.
Note
The hourly schedule for automated QLI is not supported.
Log Location¶
Refer to Troubleshooting for information about logs.