Creating Lineage Data Manually

Alation Cloud Service: applies to Alation Cloud Service instances of Alation

Customer Managed: applies to customer-managed instances of Alation

Applies from version 2021.4

Users with the Catalog Admin, Source Admin, or Server Admin role can create lineage data manually if the Manual Lineage Curation feature is enabled in the Alation Catalog. See Manual Lineage Curation for details about activating this functionality.

Admin users do not have to be assigned as Data Source Admins of a data source to create lineage for it. They can create lineage on the catalog pages of all data objects they are allowed to access in the Catalog:

  • Public data sources

  • Private data sources they are granted access to

  • BI sources they have access to

Both table-level and column-level lineage can be created manually.

Opening the Lineage Editor

To open the manual lineage editor:

  1. Log in to Alation as an admin user.

  2. Open the catalog page of the data object for which you want to create lineage, for example a Table or a BI Report.

  3. Click the Lineage tab to open it. If lineage data already exists, you will see the Edit button on the top right of the diagram:

    ../../_images/lineageV3_03.png

    If no lineage data exists, there will be the Manually Create Lineage button:

    ../../_images/lineageV3_02.png
  4. To begin creating lineage, click either Edit or Manually Create Lineage. This opens the lineage data editor:

    ../../_images/lineageV3_04.png

You can close the lineage editor at any time by clicking Cancel and then Close in the top right. Unpublished changes are not saved when you close the editor. Currently, the only way to save your work in the lineage editor is to publish the lineage data you created. You cannot save manually created lineage as a draft to finish later.

Creating Table-Level Lineage Manually

To create table-level lineage:

  1. In the Lineage editor, click on the data object for which you want to create lineage. This will add two plus icons, upstream and downstream from the object:

    ../../_images/lineageV3_05.png
  2. Click on any of the plus icons, depending on what kind of lineage links you want to create, upstream or downstream. The icon will be highlighted and a dataflow object editor will open on the right. Note that the object for which you are creating lineage will be automatically added to either Sources or Targets depending on which plus icon you clicked:

    ../../_images/lineageV3_06.png
  3. In the dataflow editor, specify a Title instead of the placeholder text.

  4. Optionally, add values to the Stewards field of your dataflow object. These values also appear in the Stewards field on the Dataflow catalog page. The admin user who creates the dataflow object is added automatically as the first Steward. To add other users or groups as Stewards, click the plus icon to the right of the Stewards field and, in the Quick Search dialog, find and click a user or a group to add them:

    ../../_images/lineageV3_07.png
  5. To remove a Steward, hover over a name in the Stewards list and then click the cross icon to the right of the name.

Next, add paths and the lineage details.

Adding Paths

The Paths tab of the dataflow properties is selected by default. Under Paths, you can add one or multiple lineage paths from Source to Target data objects.

A path is a lineage link from the upstream Source object to the Dataflow object or from the dataflow object to the downstream Target object. Each path requires at least one Source data object and at least one Target data object.
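The rule that each path needs at least one Source and at least one Target can be pictured with a small data model. This is only an illustrative sketch: the class names `Path` and `Dataflow`, the method names, and the object names are hypothetical and are not part of Alation or its API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Path:
    """One lineage path: Sources -> dataflow object -> Targets (hypothetical model)."""
    sources: List[str] = field(default_factory=list)
    targets: List[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        # A path requires at least one Source and at least one Target.
        return len(self.sources) >= 1 and len(self.targets) >= 1


@dataclass
class Dataflow:
    """A dataflow object with a title and one or more paths (hypothetical model)."""
    title: str
    paths: List[Path] = field(default_factory=list)

    def incomplete_paths(self) -> List[int]:
        # 1-based numbers, mirroring "Path 1", "Path 2" in the editor.
        return [n for n, p in enumerate(self.paths, start=1) if not p.is_complete()]


# Made-up object names, for illustration only.
flow = Dataflow(
    title="Load orders into reporting",
    paths=[
        Path(sources=["sales.orders"], targets=["reporting.orders_daily"]),
        Path(sources=["sales.customers"], targets=[]),  # Target not added yet
    ],
)
print(flow.incomplete_paths())  # prints [2]
```

In this model, Path 2 is reported as incomplete because it has a Source but no Target yet.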

Adding Sources

To add a Source object, click on Add Sources and use the Quick Search dialog to find the source object in the Catalog:

../../_images/lineageV3_08.png

Note the changes on the Lineage diagram after you have added the Source object: the editor will display the link from the source object to the dataflow object:

../../_images/lineageV3_09.png

Adding an External Source

You can also add Source objects that are not represented in the Alation Catalog (external objects). This lets admin users insert objects into lineage without having to create a virtual data source to represent them. To add a Source object that is an external object, click Add External Object and specify an ID for the external object. The ID identifies the external object in Alation and must be unique. Do not leave the ID empty and do not keep the default value of zero.

After providing the ID, click Add External Object:

../../_images/Lineage_AddExternalObject.png

External sources will display the EXT label on the diagram:

../../_images/Lineage_SourceWithExtLabel.png

Adding Targets

To add a Target object, click Add Targets and use the Quick Search dialog to search for the Catalog object to add:

../../_images/lineageV3_12.png

You can add several Targets if multiple targets are affected by the current dataflow object.

Adding an External Target

You can also add Target objects that do not exist in the Alation Catalog (external objects). This lets admin users insert objects into lineage without having to create a virtual data source to represent them. To add a Target object that is an external object, click Add External Object under Targets and specify an ID for the external object. The ID identifies the external object in Alation and must be unique. The value can be a reference to an external object such as an external folder or file. Do not leave the ID empty and do not keep the default value of zero.

After providing the ID, click Add External Object. The external Target object will have the EXT label on the diagram:

../../_images/Lineage_ExternalTarget.png
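If you keep your own list of external object IDs, for example in a spreadsheet or script, the ID rules stated for external Sources and Targets (not empty, not the default zero, unique) can be checked before you type an ID into the editor. The following is a minimal sketch of such a check. The function name `check_external_id` and the sample IDs are hypothetical, and the check is not something Alation runs or exposes.

```python
from typing import Iterable


def check_external_id(candidate: str, existing_ids: Iterable[str]) -> str:
    """Return the trimmed ID if it follows the rules above, else raise ValueError.

    Hypothetical helper for your own bookkeeping; not an Alation function.
    """
    cleaned = candidate.strip()
    if not cleaned:
        raise ValueError("external object ID must not be empty")
    if cleaned == "0":
        raise ValueError("external object ID must not be the default value of zero")
    if cleaned in set(existing_ids):
        raise ValueError(f"external object ID {cleaned!r} is already in use")
    return cleaned


# Made-up IDs; the text notes an ID can reference an external folder or file.
already_used = ["s3://landing/raw/customers.csv"]
print(check_external_id(" s3://landing/raw/orders.csv ", already_used))
```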

Removing Source and Target Objects

To remove a Source or Target object from lineage data, click the cross icon for this object:

../../_images/lineageV3_14.png

Highlighting Paths

You can highlight a path by clicking the Highlight icon to the right of the Path number:

../../_images/lineageV3_15.png

This will highlight this specific path in the working area of the editor:

../../_images/lineageV3_16.png

Adding Multiple Paths

To add another path, click Add Path at the bottom of the Paths tab. This will add another path, Path 2, to the currently edited dataflow object:

../../_images/lineageV3_17.png

Add Source and Target objects to the new path using the Add Sources and Add Targets buttons.

Deleting a Path

To delete a path, click the trash can icon to the right of the Path number. This will remove the path from the dataflow editor and from the working area.

../../_images/lineageV3_18.png

Adding Lineage Details

On the Details tab of the Dataflow object editor, you can add a Description of the lineage data you are adding and the SQL for lineage paths. Click the Details tab of the editor to open it:

../../_images/lineageV3_19.png

To add a description, click Edit for the Description field, add a description and save.

To add the SQL code, click Edit for the Dataflow content field, provide the SQL code and save.

You also use the Details tab to assign a group source for the dataflow object. See Grouping Dataflow Paths.

Staging Table-Level Lineage for Publishing

After creating paths and adding details, click Done at the bottom left of the dataflow object editor. This closes the editor and stages your changes for publishing: the Publish Lineage button in the top right of the working area becomes active.

Publishing Table-Level Lineage

Changes are not visible to other Catalog users until they are published.

Publish the lineage data to save your work and to make the lineage data visible to all catalog users.

Important

If you click Cancel at this point or refresh or close the browser window, you will lose your work. Publishing is the only way to save the lineage data you created manually.

To publish the lineage data you created, click Publish Lineage and confirm this action in the confirmation dialog that pops up:

../../_images/lineageV3_20.png

This will add your manually created lineage to the Lineage diagram.

Creating Column-Level Lineage

Available from version 2021.4.5

Column-level lineage can be created using subpaths within each primary path. To create column-level lineage:

  1. In the Lineage editor, click on the dataflow for which you want to create column-level lineage. This will open the property editor on the right.

  2. In the editor, locate the primary path where you want to add column-level lineage. For this path, click Add Subpath at the bottom of the path section.

    ../../_images/lineageV3_24.png

    This will add a subpath editor to the primary path.

    ../../_images/lineageV3_25.png
  3. For every subpath you create, you will need to add Sources and Targets. A source is the source column that is used to create another column. A target is the column created from the source through the dataflow object. To add a source column, click the plus button under Sources and search for and select a source column.

    ../../_images/lineageV3_26.png
  4. When you click on the column in the quick search, it will be added to your lineage diagram. Note that the newly created lineage links are shown with dashed lines.

    ../../_images/lineageV3_27.png
  5. Next, add a target. Click the plus button under Targets to begin adding the target column.

  6. In the Quick Search field, search for the target column and click its name to add it to the subpath. The column will be added as a target column and will appear on the diagram under the target table:

    ../../_images/lineageV3_28.png
  7. Repeat this process if you want to show lineage for multiple columns. Begin with creating a new subpath under the primary path and add sources and targets to the subpaths.

  8. After completing all column-level lineage links, click Done at the bottom right to stage your lineage data for publishing.

  9. Click Publish Lineage in the top right to publish your changes and make them available to other users. You will see the following message:

    ../../_images/lineageV3_29.png

    This is expected. Take note of the Job ID. It will be required if you encounter problems publishing the lineage you just created. Click Okay to close the message.

    Note

    When you click Publish Lineage, all changes that you made to the lineage diagram in the editor are sent for processing. If processing succeeds, you will see the updated lineage graph. If processing fails, all the changes are lost and the details are available in the job status. The job status can be retrieved using the Jobs API and the Job ID from the Publish Lineage message. See How to Use the Job ID below for more details.

  10. Click Cancel on the upper right or use the back button of your browser window to go back to the Catalog page. This will close the editor. The lineage links you created should be displayed on the lineage diagram.

    Note

    If you do not see your changes on the diagram, wait for about 30 seconds, go back to the Overview tab and refresh the page. Then click on the Lineage tab to view the updated graph.

Creating Multiple Subpaths vs. Creating Multiple Sources and Targets Within One Subpath

If you want to show column-level lineage for multiple source columns of the source table that are linked to multiple target columns of the target table, you can either use multiple subpaths under the same primary path or create multiple source and target columns within one subpath. This depends on what kind of lineage you want to add. The user interface of the lineage editor provides enough flexibility to support various cases of lineage.

For example, if you want to show that multiple source columns are used for creating one target column, you should create multiple source columns and a single target within the same subpath. Note that when you publish your lineage and click on a specific object, the full subpath will be highlighted, and you will see multiple highlighted sources.

The screenshot below shows a target column that has multiple source columns. In this configuration, there are multiple source columns and a single target column in the lineage subpath.

../../_images/lineageV3_30.png

If you want to show that each source column in the source table has one corresponding target column in the target table, you should create multiple subpaths within the primary path that connects the source and target tables. The following screenshot shows a subpath that has a single source and a single target:

../../_images/lineageV3_31.png
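The difference between the two configurations in this section can be sketched by listing the column-to-column links each one implies. The sketch assumes that every source column in a subpath feeds every target column in the same subpath, which matches the highlighting behavior described above but is an interpretation, not a statement about Alation internals. The function `column_links` and all column names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

# A subpath is modeled as {"sources": [...], "targets": [...]} (hypothetical shape).
Subpath = Dict[str, List[str]]


def column_links(subpaths: List[Subpath]) -> List[Tuple[str, str]]:
    """List the (source column, target column) pairs implied by the subpaths."""
    links: List[Tuple[str, str]] = []
    for subpath in subpaths:
        # Assumption: each source in a subpath is linked to each target in it.
        links.extend(product(subpath["sources"], subpath["targets"]))
    return links


# Several source columns produce one target column: use a single subpath.
many_to_one = [
    {"sources": ["src.first_name", "src.last_name"], "targets": ["tgt.full_name"]},
]

# Each source column has its own target column: use one subpath per pair.
one_to_one = [
    {"sources": ["src.id"], "targets": ["tgt.id"]},
    {"sources": ["src.email"], "targets": ["tgt.email"]},
]

print(column_links(many_to_one))
print(column_links(one_to_one))
```

Under the same assumption, placing both one-to-one pairs into a single subpath would also imply links such as `src.id` to `tgt.email`, which is why separate subpaths suit that case.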

How to Use the Job ID

When you publish new lineage data, Alation creates a job on the server that processes the data and writes the changes to the internal lineage database. This job has an ID that is shown in the message that pops up when you click Publish Lineage. You can use this Job ID with the Jobs API to retrieve the status of the publishing job.

If the lineage you are trying to publish does not appear on the lineage diagram, Alation recommends checking the job status using the Jobs API. Delays in publishing lineage data may be caused by other processes running on the system, such as metadata extraction or query log ingestion jobs.

The lineage job is also logged in Admin Settings > Monitor > Completed Tasks. A completed job will appear on the Completed Tasks tab. The name of the lineage publishing job is push_manual_lineages_from_file.

../../_images/lineageV3_32.png
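Checking the status of a publishing job from a script could look roughly like the sketch below. It is not taken from the Jobs API reference: the endpoint path `/api/v1/job/`, the `id` query parameter, the `TOKEN` header, the host name, and the Job ID value are all assumptions or placeholders that you should confirm against the Jobs API documentation for your Alation version.

```python
import json
import urllib.parse
import urllib.request


def build_job_status_request(base_url: str, job_id: int, api_token: str) -> urllib.request.Request:
    """Build a GET request for the status of one job.

    ASSUMPTION: the path, query parameter, and header name below are
    illustrative; check them against the Jobs API documentation.
    """
    query = urllib.parse.urlencode({"id": job_id})
    url = f"{base_url.rstrip('/')}/api/v1/job/?{query}"
    return urllib.request.Request(url, headers={"TOKEN": api_token}, method="GET")


def fetch_job_status(request: urllib.request.Request) -> dict:
    """Send the request and parse the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Placeholder host, Job ID, and token; replace with your own values.
request = build_job_status_request("https://alation.example.com", 1234, "<api-token>")
print(request.full_url)

# Calling fetch_job_status(request) performs the actual network call, so it is
# left commented out here:
# status = fetch_job_status(request)
# print(status)
```

Building the request separately from sending it lets you inspect the URL before any call is made to your Alation instance.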

Differentiating Automatic and Manual Lineage

On a Lineage diagram that has both automatically generated lineage data and manually added lineage paths, you can differentiate between the types of lineage data by looking at the dataflow object icons:

../../_images/lineageV3_21.png
  • Automatically generated lineage or lineage added using the Lineage V2 API:

    ../../_images/lineageV3_22.png
  • Manually added lineage:

    ../../_images/lineageV3_23.png

When you edit a Lineage diagram that already has automatically generated lineage or lineage added using the Lineage V2 API, you won’t be able to edit these generated paths. From version 2022.3, however, you can add or edit details for automatically generated lineage.

Grouping Dataflow Paths

You can assign specific dataflow paths to group sources that you define. You can then filter your paths using those sources.

To assign a dataflow path to a group, use the Source field in the Properties section, either on the Details tab of the lineage editor or on the dataflow object’s catalog page. Click the pencil icon to assign or edit the group source. You can then choose either Select from existing sources or Create new source.

../../_images/DF_Edit_Source.png

If no sources have been defined, the first option is unavailable. To select from existing sources, choose from one of the already-defined sources:

../../_images/DF_EditSource_ExistingSources.png

To create a new source, type a source name in the provided field, then select the green check mark to the right of the field:

../../_images/DF_EditSource_NewSource.png

If you are making your changes in the lineage editor, click Done and then Publish Lineage to publish your changes.