Azure Blob Storage OCF Connector: Install and Configure¶
Metadata extraction uses Azure Storage blob inventory reports for the Azure Storage containers. Alation reads the inventory reports from the destination container and streams the metadata to the catalog.
Prerequisites¶
Required Information¶
The following information is required for configuring the Azure Blob Storage file system source in Alation:
Destination storage container to store the Blob Inventory for Alation—This destination storage container is created to store the blob inventory for performing the metadata extraction.
Access to the destination storage container. Two authentication methods are supported:
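Access Key: An access key is configured for a storage account and provides full access to the account. Refer to Use Account Access Key for more information.
Shared Access Signature (SAS): A SAS token provides delegated access to the storage account.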
Name of the storage account hosting source storage containers
Name of the inventory rule. Refer to Inventory rule for how to set it.
Endpoint suffix: Check Endpoints in Azure
Configure Inventory Reports¶
Configure Azure Blob Inventory for storage containers to be cataloged. Refer to Enable Azure Storage Blob Inventory Reports for details.
Choose the destination storage container in the Container field.
Select Blob for Object Type to Inventory.
Select the following inventory fields:
Name
Content-Length
Last-Modified
HDI folder status—This is applicable only for ADLS Gen2 storage
Select CSV for Export Format
Note
After you create the inventory rule, the first inventory report can take up to 48 hours to become available.
After the inventory reports of all containers that you want ingested in Alation have been delivered to your destination container, you can proceed with the configuration on the Alation side.
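As an illustration, the inventory rule settings listed above can be written as a blob inventory policy document. This is a minimal sketch, not the configuration Alation ships: the rule name and container name are hypothetical, the `schedule` and `filters` values are assumptions that the steps above do not specify, and the field names should be checked against the Azure blob inventory policy schema before use.

```python
import json

# Hypothetical names, used only for illustration.
DESTINATION_CONTAINER = "alation-inventory"
RULE_NAME = "alation-inventory-rule"

inventory_policy = {
    "enabled": True,
    "type": "Inventory",
    "rules": [
        {
            "enabled": True,
            "name": RULE_NAME,
            # Container that receives the inventory reports.
            "destination": DESTINATION_CONTAINER,
            "definition": {
                "objectType": "Blob",   # Object Type to Inventory
                "format": "Csv",        # Export Format
                "schedule": "Daily",    # assumption: not stated in this guide
                "schemaFields": [
                    "Name",
                    "Content-Length",
                    "Last-Modified",
                    "hdi_isfolder",     # HDI folder status, ADLS Gen2 only
                ],
                # assumption: block blobs only; adjust to your storage layout
                "filters": {"blobTypes": ["blockBlob"]},
            },
        }
    ],
}

print(json.dumps(inventory_policy, indent=2))
```

The rule name and destination container used here are the same values you later enter in the Inventory Rule Name and Destination Container fields in Alation.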
Configuration in Alation¶
Step 1: Install the Connector¶
Alation On-Prem¶
Important
Installation of OCF connectors requires Alation Connector Manager to be installed as a prerequisite.
To install an OCF connector:
If this has not been done on your instance, install the Alation Connector Manager: Install Alation Connector Manager.
Ensure that the OCF connector Zip file that you received from Alation is available on your local machine.
Install the connector on the Connectors Dashboard page using the steps in Manage Connectors.
Alation Cloud Service¶
Note
On Alation Cloud Service instances, Alation Connector Manager is available by default.
Ensure that the OCF connector Zip file that you received from Alation is available on your local machine.
Install the connector on the Connectors Dashboard page using the steps in Manage Connectors.
Step 2: Create and Configure a New Azure Blob Storage OCF File System Source¶
Log in to the Alation instance and add a new Azure Blob Storage source by clicking Apps > Sources > Add > File System.
From the File System Type dropdown, select Azure Blob Storage OCF Connector.
Provide a Title for the file system and click Add File System. The Settings page of your new file system source opens.
Access¶
On the Access tab, set the file system visibility as follows:
Public File System—The file system source will be visible to all users of the catalog.
Private File System—The file system source will be visible to users allowed access by File System Admins.
Add new File System Admin users in the File System Admins section.
General Settings¶
Perform the configuration on the General Settings tab:
Specify Connector Settings:
Parameter | Description |
---|---|
File System Connection | |
Storage Account Name | Specify the name of the storage account. |
Use Shared Access Signature (SAS) | Select the Use Shared Access Signature checkbox to authenticate using a shared access signature (SAS). |
Access Key/Shared Access Signature | Specify the access key, or the shared access signature if the Use Shared Access Signature checkbox is selected. |
Storage Endpoint Suffix | Specify the storage endpoint suffix. |
Logging Information | |
Log Level | Select the log level to generate logs. The available log levels are based on the Log4j framework. |
Click Save.
Under Test Connection, click Test to validate network connectivity.
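If the connection test fails, it can help to confirm that the storage account name and endpoint suffix combine into the blob endpoint you expect. The sketch below is illustrative only and is not the connector's code: `blob_endpoint` is a hypothetical helper, and the account name in the example is made up. It relies on the standard Azure endpoint layout `https://<account>.blob.<suffix>` and on the Azure rule that account names are 3 to 24 lowercase letters and digits.

```python
import re

# Azure storage account names: 3-24 characters, lowercase letters and digits.
ACCOUNT_NAME_PATTERN = re.compile(r"[a-z0-9]{3,24}")


def blob_endpoint(account_name: str, endpoint_suffix: str) -> str:
    """Build the blob service URL from the two General Settings values."""
    if not ACCOUNT_NAME_PATTERN.fullmatch(account_name):
        raise ValueError(f"invalid storage account name: {account_name!r}")
    # Tolerate stray whitespace or dots around the suffix.
    suffix = endpoint_suffix.strip().strip(".")
    if not suffix:
        raise ValueError("endpoint suffix must not be empty")
    return f"https://{account_name}.blob.{suffix}"


# "core.windows.net" is the suffix of the Azure public cloud; sovereign
# clouds use other suffixes, such as "core.usgovcloudapi.net".
print(blob_endpoint("examplestorage", "core.windows.net"))
```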
Deleting the Data Source¶
You can delete your data source from the General Settings tab. Under Delete Data Source, click Delete to delete it.
Metadata Extraction¶
You can perform a full extraction or incremental extraction with the help of additional configuration on the Azure Storage side.
Connector Settings¶
Metadata Extraction Configuration¶
Specify the metadata extraction settings and click Save:
Parameter | Description |
---|---|
Destination Container | Specify the name of the destination container that hosts the inventory reports. The first inventory report is generated 24 to 48 hours after the inventory rule is set. If you run metadata extraction (MDE) before the report has been generated, Alation will not extract any data. |
Inventory Rule Name | Specify the inventory rule name. |
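To make the role of the inventory report concrete, the sketch below parses a small CSV report into file and folder records. It is a simplified illustration, not Alation's extraction logic: the sample rows are invented, and the column name `hdi_isfolder`, its `true`/`false` values, and the timestamp format are assumptions about the report layout. Folders get no modification time here, mirroring the limitation listed later in this guide.

```python
import csv
import io

# Invented sample data; a real report is read from the destination container.
SAMPLE_REPORT = """Name,Content-Length,Last-Modified,hdi_isfolder
sales/2023,0,2023-01-05T10:00:00Z,true
sales/2023/orders.csv,2048,2023-01-05T10:15:00Z,false
"""


def parse_inventory(report_text: str) -> list[dict]:
    """Turn inventory CSV rows into simple metadata records."""
    entries = []
    for record in csv.DictReader(io.StringIO(report_text)):
        # The folder flag is only present for ADLS Gen2 storage.
        is_folder = (record.get("hdi_isfolder") or "").strip().lower() == "true"
        entries.append(
            {
                "path": record["Name"],
                "size_bytes": int(record["Content-Length"] or 0),
                # Last modification time is not supported for folders.
                "last_modified": None if is_folder else record["Last-Modified"],
                "is_folder": is_folder,
            }
        )
    return entries


for entry in parse_inventory(SAMPLE_REPORT):
    print(entry)
```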
Schema Extraction Configuration¶
A schema extraction job extracts the schema (columns/headers) for each CSV, TSV, PSV, and Parquet file. A full extraction must be completed before schema extraction can run.
For CSV files, select the applicable delimiter in the CSV File Delimiter field. TSV and PSV files always use tab and pipe as the delimiter, respectively; these delimiters cannot be changed.
Specify the schema extraction settings under the Schema extraction configuration section and click Save:
Parameter | Description |
---|---|
CSV File Delimiter | From this dropdown, select the delimiter used by the CSV files in the file system source. The default delimiter value is |
Use Schema Path Pattern | Select the Use Schema Path Pattern checkbox to identify folders or file sets as schemas. This optional feature optimizes schema extraction jobs when your storage contains a logical schema and all files belonging to that logical schema share the same schema. Refer to Schema Path Pattern for more information. |
Schema Path Pattern | Specify the schema path pattern. |
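The delimiter rules above can be sketched as follows. This is an assumption-laden illustration rather than the connector's implementation: `extract_header` is a hypothetical helper that reads column names from the first line of a delimited file, choosing the delimiter by file extension. Parquet files carry their schema in file metadata and are left out of this sketch.

```python
import csv
import os

# TSV and PSV delimiters are fixed; only the CSV delimiter is configurable.
FIXED_DELIMITERS = {".tsv": "\t", ".psv": "|"}


def extract_header(file_name: str, first_line: str, csv_delimiter: str) -> list[str]:
    """Return the column names found in the first line of a delimited file."""
    extension = os.path.splitext(file_name)[1].lower()
    if extension == ".csv":
        # Value chosen in the CSV File Delimiter field.
        delimiter = csv_delimiter
    elif extension in FIXED_DELIMITERS:
        delimiter = FIXED_DELIMITERS[extension]
    else:
        raise ValueError(f"unsupported file type: {file_name}")
    return next(csv.reader([first_line], delimiter=delimiter))


print(extract_header("orders.csv", "id;amount;date", ";"))   # ['id', 'amount', 'date']
print(extract_header("orders.psv", "id|amount|date", ";"))   # ['id', 'amount', 'date']
```

In the second call the configured CSV delimiter is ignored, because PSV files always use the pipe character.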
Selective Extraction¶
On the Metadata Extraction tab, you can select the containers to include or exclude from extraction. Enable the Selective Extraction toggle if you want only a subset of containers to be extracted.
To extract only select containers:
Click Get List of containers to fetch the list of containers. The status of this action will be logged in the Extraction Job Status table at the bottom of the Metadata Extraction tab.
When container synchronization is complete, the drop-down list of available containers becomes enabled.
Select one or more containers as required.
Verify that the correct filter option is selected. The available filter options are described below:
Filter Option | Description |
---|---|
Extract all containers except | Extract metadata from all containers except the selected containers. |
Extract only these containers | Extract metadata only from the selected containers. |
Click Run Extraction Now to extract metadata. The status of the extraction action is also logged in the Job History table at the bottom of the page.
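The two filter options behave like set inclusion and exclusion over the fetched container list. A minimal sketch, with a hypothetical `select_containers` helper and made-up container names:

```python
def select_containers(all_containers: list[str], selected: list[str], mode: str) -> list[str]:
    """Apply a selective-extraction filter to the fetched container list.

    mode "only"   corresponds to "Extract only these containers".
    mode "except" corresponds to "Extract all containers except".
    """
    chosen = set(selected)
    if mode == "only":
        return [name for name in all_containers if name in chosen]
    if mode == "except":
        return [name for name in all_containers if name not in chosen]
    raise ValueError(f"unknown filter mode: {mode!r}")


containers = ["finance", "hr", "logs", "sales"]
print(select_containers(containers, ["logs"], "except"))          # ['finance', 'hr', 'sales']
print(select_containers(containers, ["finance", "sales"], "only"))  # ['finance', 'sales']
```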
Extraction Scheduler¶
To update the metadata extracted into the catalog automatically, under Automated and Manual Extraction, turn on the Enable Automated Extraction switch and select the day and time when metadata must be extracted. Metadata extraction will then run automatically on the selected schedule.
Sampling¶
Note
Sampling is available from Azure Blob Storage OCF Connector version 2.0.0 and Alation version 2023.1 onwards.
File sampling is supported for the CSV, TSV, PSV, and Parquet file formats and for the resource set folder for older files. File sampling is supported with Basic Authentication (Access Key and Shared Access Signature) and Azure OAuth Authentication.
Basic Authentication for Sampling¶
To perform sampling:
Go to the catalog page of an extracted file in a supported format, open the Samples tab, and click Credential Settings.
Click Select > Add New.
Select Azure Basic Authentication from the Authentication Type dropdown. Provide the Access Key, or the Shared Access Signature if the Use Shared Access Signature checkbox is selected. Click Save.
Click Authenticate and you will be redirected to the catalog page.
Click the Run Sample button on the catalog page to see sampled data.
Azure OAuth Authentication for Sampling¶
Prerequisites¶
Create and configure the application in the IdP. Refer to Create an Authentication Application for Alation in the IdP.
Configure the IdP in Azure Active Directory. Refer to Create an Identity Provider in Azure AD.
Configure the auth plug-in in Alation. Refer to OAuth.
Perform Sampling¶
To perform sampling:
Go to the catalog page of an extracted file in a supported format, open the Samples tab, and click Credential Settings.
Click Select > Add New.
Select Azure OAuth Authentication from the Authentication Type dropdown. Provide Credential Name and choose the relevant Plugin Config Name. Click Save.
Click Authenticate and you will be redirected to the IdP login page.
Log in to the IdP with your IdP credentials.
Click the Run Sample button on the catalog page to see sampled data.
Limitations¶
Incremental extraction is not supported.
Due to Azure storage platform limitations, object owner is not supported.
Last modification time is not supported for Azure Blob Storage folders.
Troubleshooting¶
Test Connection Issues¶
Make sure the Azure Storage account access key or SAS is correct.
Make sure you have provided the correct endpoint suffix.
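A quick way to rule out a mix-up between the two credential types is to check the shape of the value before pasting it into Alation. The sketch below is a heuristic only and cannot confirm that a credential is valid: both helper names are hypothetical, and the checks assume that a SAS token is a query string containing `sv` and `sig` parameters and that an account access key is a Base64 string encoding 64 bytes.

```python
import base64
import binascii
from urllib.parse import parse_qs


def looks_like_sas(value: str) -> bool:
    """Heuristic: a SAS token is a query string with 'sv' and 'sig' parameters."""
    params = parse_qs(value.strip().lstrip("?"))
    return "sv" in params and "sig" in params


def looks_like_access_key(value: str) -> bool:
    """Heuristic: an account access key is Base64 text that decodes to 64 bytes."""
    try:
        raw = base64.b64decode(value.strip(), validate=True)
    except (binascii.Error, ValueError):
        return False
    return len(raw) == 64


# Dummy values for illustration; never print or log real credentials.
dummy_key = base64.b64encode(bytes(64)).decode()
dummy_sas = "?sv=2022-11-02&ss=b&srt=co&sp=rl&se=2030-01-01T00:00:00Z&sig=abc%2Bdef%3D"

print(looks_like_access_key(dummy_key), looks_like_sas(dummy_key))  # True False
print(looks_like_access_key(dummy_sas), looks_like_sas(dummy_sas))  # False True
```

If the value looks like a SAS token, the Use Shared Access Signature checkbox should be selected; if it looks like an access key, the checkbox should be cleared.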
No Inventory Reports Found¶
Make sure that the destination container is correct.
If the destination container is correct, allow up to 48 hours after setting up the inventory rule for the inventory report to be generated, and then try again.
MDE or Filter Extraction Fails Due to Access Issue¶
Make sure the user account has access to the destination container.
More troubleshooting recommendations can be found in Troubleshooting.