Configure Metadata Extraction for OCF Data Sources
On the Metadata Extraction tab of an OCF data source Settings page, you can perform metadata extraction (MDE) based on default extraction queries or configure extraction to use custom queries.
Default extraction—MDE uses default SQL queries that are built in the connector code.
Query-based extraction—MDE uses custom SQL queries provided by a Data Source Admin. See Query Based Extraction below.
Under Application Settings, you can enable or disable the Raw Metadata Dump or Replay feature.
Enable Raw Metadata Dump or Replay—Use the options in this drop-down list to dump the extracted metadata into files so that you can debug extraction issues before the metadata is ingested into Alation. This feature can be used during testing to catch issues with MDE. It breaks extraction into two steps:
First, the extracted metadata is dumped into files, where it can be viewed.
Second, the metadata from the files is ingested into Alation.
Enable this feature only for MDE debugging purposes. Using it requires access to the backend of the Alation server.
Off—Default. Disables Raw Metadata Dump or Replay. Extracted metadata is ingested into Alation.
Enable Raw Metadata Dump—Select this option to save the extracted metadata into a folder. The metadata dump will be saved in four files (table.dump) in the folder opt/alation/site/tmp/ inside the Alation shell.
Enable Ingestion Replay—Select this option to ingest the metadata from the files into Alation.
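The "dump first, inspect, then replay" workflow above implies looking at the dump files before ingestion. The following is a minimal sketch of such an inspection helper, not an Alation tool. The function names `list_dump_files` and `preview` are hypothetical, and the sketch assumes only that the dump files carry a `.dump` extension and can be read as text; their internal format is not described here.

```python
from pathlib import Path


def list_dump_files(folder):
    """Return the sorted names of the *.dump files found directly in folder."""
    return sorted(p.name for p in Path(folder).glob("*.dump") if p.is_file())


def preview(path, max_lines=5):
    """Return up to max_lines leading lines of a dump file as a list of strings.

    Assumption: the file is text. Undecodable bytes are replaced, not raised.
    """
    lines = []
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for index, line in enumerate(handle):
            if index >= max_lines:
                break
            lines.append(line.rstrip("\n"))
    return lines
```

From inside the Alation shell, you would pass the dump folder named above to `list_dump_files` and then call `preview` on each file of interest before switching the setting to Enable Ingestion Replay.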
Under Connector Settings, you can configure query-based extraction, select schemas to extract or exclude, and set an extraction schedule. If you do not specify any custom extraction queries, Alation will perform metadata extraction based on the default queries.
Query Based Extraction
Query-based extraction allows users to customize metadata extraction down to the level of specific metadata types, such as tables, columns, views, and others, by using custom queries. The extraction of table, column, view, and function definition metadata is always enabled and cannot be disabled. You can enable or disable the extraction of additional metadata types.
The list of additional metadata types depends on the database type and varies for different data sources.
Extraction of system schema information is disabled by default. All other supported metadata types are enabled by default. You can disable or enable the metadata types you want to extract by clearing or selecting the corresponding checkboxes.
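The checkbox rules above can be summarized in a small model. This is an illustrative sketch only, not connector code. The optional type names `primary_keys` and `foreign_keys` are hypothetical placeholders, because the real list depends on the database type; only `system_schemas` being off by default and the four always-on types come from the text.

```python
# Types the section says are always extracted and cannot be turned off.
ALWAYS_ON = {"tables", "columns", "views", "function_definitions"}

# Optional types and their default checkbox state.
# "primary_keys" and "foreign_keys" are hypothetical examples.
DEFAULTS = {"system_schemas": False, "primary_keys": True, "foreign_keys": True}


def effective_types(overrides=None):
    """Return the set of metadata types that would be extracted.

    overrides maps an optional type name to True (checked) or False (cleared).
    """
    state = dict(DEFAULTS)
    for name, enabled in (overrides or {}).items():
        if name in ALWAYS_ON:
            raise ValueError(f"{name} is always extracted and cannot be toggled")
        if name not in state:
            raise KeyError(f"unknown metadata type: {name}")
        state[name] = bool(enabled)
    return ALWAYS_ON | {name for name, enabled in state.items() if enabled}
```

Calling `effective_types()` with no arguments models the default state, and `effective_types({"system_schemas": True})` models selecting the system schema checkbox.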
Metadata Extraction Queries
To use query-based metadata extraction, you need to provide custom queries. Alation expects the queries to conform to a specific format and to use some reserved identifiers. Use the query examples provided in the documentation for your specific connector. After providing custom queries, save them by clicking Save in this section.
Under Selective Extraction, you can select the schemas to include in or exclude from extraction.
To configure selective extraction:
Enable the Selective Extraction toggle if you want to extract a subset of schemas.
Click Get List of Schemas to fetch the list of schemas. The status of the Get Schemas action is logged in the Extraction Job Status table at the bottom of the Metadata Extraction tab.
When schema synchronization is complete, a drop-down list of the schemas will become enabled. Select one or more schemas as necessary.
Verify that the desired filter option is selected. The available filter options are:
Extract all Schemas except—Extract metadata from all schemas except the selected schemas.
Extract only these Schemas—Extract metadata only from the selected schemas.
Click Run Extraction Now under Automated and Manual Extraction to extract metadata. The status of the extraction action is also logged in the Extraction Job Status table at the bottom of the page.
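The two filter options in the steps above behave like an include list and an exclude list. The sketch below illustrates that semantics in plain Python. It is not how Alation implements the filter, and the function name `schemas_to_extract` and the mode strings `"only"` and `"except"` are hypothetical labels for the two UI options.

```python
def schemas_to_extract(all_schemas, selected, mode):
    """Apply a selective-extraction filter to a fetched list of schemas.

    mode "only"   models "Extract only these Schemas".
    mode "except" models "Extract all Schemas except".
    The order of all_schemas is preserved in the result.
    """
    chosen = set(selected)
    if mode == "only":
        return [name for name in all_schemas if name in chosen]
    if mode == "except":
        return [name for name in all_schemas if name not in chosen]
    raise ValueError(f"unknown filter mode: {mode!r}")
```

For example, with fetched schemas `["sales", "hr", "tmp"]` and `["tmp"]` selected, the `"except"` mode yields `["sales", "hr"]` and the `"only"` mode yields `["tmp"]`.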
Automated and Manual Extraction
To update the metadata in the catalog automatically, turn on the Enable Automated Extraction switch under Automated and Manual Extraction, then select the recurrence period, day, and time for automated MDE. A metadata extraction job for your data source will run on the specified schedule.
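To make the idea of a recurrence period plus a day and time concrete, the sketch below computes when a weekly job would next fire. It only illustrates the scheduling arithmetic and does not describe Alation's scheduler. The function name `next_weekly_run` is hypothetical, and a weekly recurrence is assumed for the example.

```python
from datetime import datetime, timedelta


def next_weekly_run(now, weekday, hour, minute):
    """Return the next datetime strictly after now that falls on the given
    weekday (0 = Monday ... 6 = Sunday, as in datetime.weekday()) at
    hour:minute.
    """
    # Start from today's date at the requested time of day.
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # Move forward to the requested weekday (0 to 6 days ahead).
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    # If that moment has already passed, use the following week.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate
```

For instance, if `now` is Wednesday 2024-01-03 at 12:00 and the schedule is Monday at 02:30, the function returns 2024-01-08 02:30.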