Configure Cross-Source Lineage

Alation Cloud Service Applies to Alation Cloud Service instances of Alation

Customer Managed Applies to customer-managed instances of Alation

Alation can build and display lineage that traces data movement across catalog sources. Such lineage is known as cross-source lineage. Proper configuration of cross-source lineage allows Alation to accurately identify objects related through lineage and prevents the creation of temporary nodes on lineage charts.

For example, if you have cataloged an RDBMS database as a data source and a BI server that uses data from this database as a BI source, lineage charts for objects under these sources can be configured to show lineage across both of them. As a result, you’ll be able to trace BI dashboards, reports, and report fields to the relevant RDBMS tables and columns in the upstream lineage. If you are configuring cross-source lineage between data sources and BI sources, proceed to Configure the BI Connection Info/Additional Datasource Connections Field.

Another example is lineage between data sources. You can catalog several databases from the same data warehouse as different data sources. Tables and views in these databases may be related through lineage, and configuring cross-source lineage will surface those lineage paths on lineage charts. Cross-source lineage between data sources requires additional enablement in alation_conf. If you are configuring for this use case, start with Enable Cross-Source Lineage Between Data Sources.

Enable Cross-Source Lineage Between Data Sources

Available from version 2023.1.6

To enable cross-source lineage between data sources, set the alation_conf parameter alation.feature_flags.DEV_enable_cross_system_lineage to True. No restart is required. For help with alation_conf, see Using alation_conf.

Note

Alation Cloud Service customers can request server configuration changes through Alation Support.

After you’ve enabled the feature, proceed to Configure the BI Connection Info/Additional Datasource Connections Field.

Configure the BI Connection Info/Additional Datasource Connections Field

To configure cross-source lineage, you need to correctly fill out the BI Connection Info or Additional datasource connections field on the General Settings tab on the settings page of a data source. This field is used by the lineage service logic to identify lineage objects that are extracted into different sources.

Note

For some connectors, the BI Connection Info field has been renamed to Additional datasource connections in newer versions. Despite the name change, it remains the same field and serves the same purpose.

Depending on your use case, the BI Connection Info or Additional datasource connections field can store the BI data source connection information or the connection information of RDBMS data sources:

Lineage Between a BI Source and a Data Source

To configure cross-source lineage between a BI source and a data source:

  1. Check that you have cataloged the specific BI source and have extracted its metadata. This should be the BI source that uses the data from the data source you want establish lineage to.

  2. Find the BI data source connection string for the data source as it appears in the BI source metadata. There are two ways to locate this information. Both ways will lead you to the BI datasource connection catalog object that stores the required database connection string. The screenshot below shows an example of a BI datasource connection catalog page and the location of the connection string you need.

    ../../_images/OCF_BIConnectionInfo_Example.png

    2.1 Open the catalog page of the BI source.

    2.2 On the BI source page, click the DataSources tab or the DataSet tab, depending on which tab name you find under your BI source. This tab lists all BI data source connections from all BI objects under this BI source. Depending on how much BI metadata you have in the catalog, the list can be quite long. If you know the name of the connection you are looking for, use the Filter on top right of the data sources or datasets list to find the object.

    ../../_images/OCF_BIConnectionInfo_UseFilter.png

    2.3 After locating the BI data source or dataset, in the Name column, click its name. The corresponding catalog page will open.

    2.4 Click the Connections tab. This tab lists all data source connections under a BI data source object.

    2.5 On the right, under the Properties section, locate the field Database Connection. This field will contain the connection string that you need.

    2.1 Open the catalog page of the BI source.

    2.2 Drill down to the BI object that uses the connection you’re looking for. For each BI source type, the hierarchy of BI objects will differ. Start with a site or folder, and click through down to the level that stores BI data source connections (such as project, workbook, report, or dashboard).

    2.3 On the BI object page, click the DataSources or DataSet tab.

    ../../_images/OCF_BIConnectionInfo_Object_Page.png

    2.4 The DataSources or DataSet tab will display the list of BI data sources used to create the BI report(s).

    ../../_images/OCF_BIConnectionInfo_DataSourcesList.png

    2.5 Click the name of the data source you need. The corresponding catalog page will open.

    2.6 Click the Connections tab. This tab lists all connections under a BI data source.

    2.7 On the right, under the Properties section, locate the field Database Connection. This field will contain the connection string that you need.

  3. Copy the database connection information from the BI data source connection page and paste into the BI Connection Info or Additional datasource connections field under the General Settings > Application Settings section.

    Note

    In certain situations, the BI data source connection value may only contain the host information but no port value. However, a port is required as part of the BI Connection Info or Additional datasource connections field value.

    If this is your case, add a random integer as a port number for the value to be valid. The value -1 is also acceptable. For example:

    • toro-demo.ceadil4offf2.us-west-2.rds.amazonaws.com:123

    • toro-demo.ceadil4offf2.us-west-2.rds.amazonaws.com:-1

  4. Save the field.

Lineage Between Data Sources

To configure cross-source lineage between data sources:

  1. Provide the host and port information of the data source in the BI Connection Info or Additional datasource connections field. For example, if the URI of the relevant data source is snowflake://dbhost.com:443/?warehouse=TEST, then the information you need for the field is dbhost.com:443. Do this for all data sources that may have objects connected through lineage.

    Note

    Cross-source lineage between data sources has a dedicated alation_conf parameter that is disabled by default. When this parameter is set to False, you will see temporary lineage nodes on lineage charts even if you configure the BI Connection Info or Additional datasource connections field correctly. Learn more in Enable Cross-Source Lineage Between Data Sources.

  2. Run metadata extraction on all data sources you’ve configured cross-source lineage for.

  3. Run query log ingestion.

As a result of configuring the BI Connection Info or Additional datasource connections field and after running metadata extraction and query log ingestion, you should see lineage paths between data sources. If you previously saw them as temporary (TMP), you should see the TMP badge removed. If you still see the TMP badge, refer to Troubleshooting Cross-Source Lineage Between Data Sources for troubleshooting advice.

View Cross-Source Lineage

As a result of configuring the BI Connection Info or Additional datasource connections field and after running metadata extraction and query log ingestion, you will see lineage links between the relevant data source and the BI source that uses it as a source of data.

Note

The schema that was used in the BI data source or dataset must be cataloged in Alation. Excluding the required schema from extraction will create a lineage object with the TMP tag (“temporary object”) on the Lineage chart.

The screenshot below shows lineage for a BI report that includes the upstream lineage links to the corresponding data source, where:

  • 1—The BI object catalog page

  • 2—This BI object represented on the Lineage diagram under the Lineage tab

  • 3—The upstream lineage that shows the source of data and comes from the data source in the catalog where the BI Connection Info field stores the corresponding BI dataset connection information

    ../../_images/OCF_BIConnectionInfo_Chart.png

Troubleshoot Cross-Source Lineage

Troubleshoot Cross-Source Lineage Between BI Sources and Data Sources

Problem

Tables and views in the upstream lineage for a BI source have the temporary (TMP) badge although the BI Connection Info or Additional datasource connections field on the relevant data source has been configured correctly.

Solution

The schemas which were used to create the BI data source or dataset must be cataloged in Alation under the data source. If a source object was excluded from extraction, the lineage generation process will create a lineage node with the TMP badgeon lineage charts. Check that the data objects, such as tables and views, represented on lineage charts as TMP nodes have been cataloged in Alation. If not, extract them from the data source and then rerun extraction on the BI source.

Troubleshooting Cross-Source Lineage Between Data Sources

Temporary Lineage Nodes Appear on Lineage Charts

Problem

The BI Connection Info or Additional datasource connections field has been configured correctly, but temporary (TMP) lineage nodes still appear on lineage charts for table and view objects.

Solution

Cross-source lineage between data sources has a dedicated alation_conf feature flag that is disabled by default. You will see temporary lineage nodes on lineage charts even if you configure the BI Connection Info or Additional datasource connections field correctly. Ensure that the feature flag is set to True in your catalog instance. After enabling the flag, rerun metadata extraction and query log ingestion.

On how to enable the feature, see Enable Cross-Source Lineage Between Data Sources.

Temporary Lineage Nodes Appear for View Objects

Problem

The cross-source lineage feature flag is enabled, the data BI Connection Info or Additional datasource connections field has been configured correctly, but the temporary (TMP) badges still appear on lineage charts of view objects. The TMP badges do not appear on lineage nodes of table objects.

Solution

Re-parse the lineage information for views for each data source. This action regenerates lineage for views using the configuration of the BI Connection Info or Additional datasource connections field.

Note

Re-parsing is done on the backend of the Alation server. Alation Cloud Service admins can request assistance from Alation Support.

To re-parse the lineage for views:

  1. Find out ID’s of all data sources that you’ve configured cross-source lineage for. Learn more in How to Find Data Source ID. Have the ID’s at the ready.

  2. Use SSH to connect to the Alation server.

  3. Enter the Alation shell using the following command:

    sudo /etc/init.d/alation shell
    
  4. Switch the user to alation.

    sudo su alation
    
  5. Run the following code, substituting the placeholder <comma_separated_data_source_ids> with comma-separated data source ID’s.

    python -m reparse_view_sql --ds_ids <comma_separated_data_source_ids>
    

    Example:

    python -m reparse_view_sql --ds_ids 1,2
    
  6. Exit from the alation user and Alation shell by using the exit command twice.

After you reparse the views, the TMP badges should be removed from view objects on lineage charts.