Dataflow Objects

A dataflow object is a type of Alation object that represents the flow of data from source objects to target objects and documents the process that connects them. Dataflow objects are generic: they can store information about ETL processes, stored procedures, SQL queries, or scripts used to create target data from source data.

Dataflow objects were introduced with Lineage V2 and remain an important part of Lineage V3.

Dataflow objects can be created in three ways:

  • Automatically, by Alation, when it calculates lineage from metadata extraction (MDE), query log ingestion (QLI), and Compose queries. For example, Alation creates a dataflow object when you run a CREATE TABLE or CREATE VIEW query in Compose to create a new data object.

  • Through the Lineage APIs.

  • Manually, by creating lineage in the lineage editor.

On the Lineage diagram in Alation, dataflow objects have their own icons. Automatically generated and API-created dataflow objects use the following icon:

../../_images/Screen_Shot_2019-10-23_at_7.14.30_PM.png
../../_images/DFO1.png

Manually created dataflow objects use the following icon:

../../_images/lineageV3_23.png

Each dataflow object has a dedicated catalog page in Alation.

Viewing Dataflow Objects in the Catalog

To open the page of a dataflow object:

  1. On the Lineage tab, click the dataflow object you want to view in the lineage diagram.

  2. Scroll down to the view area under the diagram, where the dataflow object page is displayed:

../../_images/DFO2.png

If your role has the appropriate permissions, you can edit the dataflow object's catalog fields directly in this view pane, or click Open in the upper right to open the page in the catalog. You can add a Title and Description and edit any other available custom fields.

Note

Users with the Viewer role cannot change any custom fields on catalog pages if the Viewer role is enforced.

Dataflow Fields

In addition to standard fields such as Description, Stewards, Domains, Tags, and Relevant Articles, each dataflow object has the following fields that are specific to dataflow objects:

  • Properties

  • Dataflow Content

  • Data Input

  • Data Output

Properties

The Properties field includes the following:

  • Source shows the group source name assigned to the dataflow object, if one exists. Users with administrator roles can edit this field by selecting an existing group source or creating a new one. You can then use the source name to filter the lineage displayed on the diagram.

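  • Creation Type shows how the dataflow object was created:

    • AUTOMATIC: created by Alation during lineage calculation

    • API: created through the Lineage APIs

    • MANUAL: created manually in the lineage editor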

  • External ID displays the external_id property of a dataflow object created using the API.
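To make the relationship between these properties concrete, here is a minimal, hypothetical Python model of the Properties field. The class names, attribute names, and the filter_by_source helper are illustrative assumptions, not Alation's internal schema or API; the sketch only shows how Creation Type maps to the three creation paths and how a group source name can be used to narrow a set of dataflows.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class CreationType(Enum):
    """Values shown in the Creation Type property."""
    AUTOMATIC = "AUTOMATIC"  # created by Alation while calculating lineage
    API = "API"              # created through the Lineage APIs
    MANUAL = "MANUAL"        # created manually in the lineage editor


@dataclass
class DataflowProperties:
    """Hypothetical model of a dataflow object's Properties field."""
    creation_type: CreationType
    source: Optional[str] = None       # group source name, if one is assigned
    external_id: Optional[str] = None  # populated for API-created dataflows


def filter_by_source(
    dataflows: List[DataflowProperties], source_name: str
) -> List[DataflowProperties]:
    """Keep only the dataflows assigned to the given group source."""
    return [df for df in dataflows if df.source == source_name]


# Hypothetical example data: two dataflows share the "nightly_etl" source.
flows = [
    DataflowProperties(CreationType.AUTOMATIC, source="nightly_etl"),
    DataflowProperties(CreationType.API, source="nightly_etl", external_id="load_orders"),
    DataflowProperties(CreationType.MANUAL),
]
print(len(filter_by_source(flows, "nightly_etl")))  # 2
```

In this sketch, a dataflow with no group source is simply excluded by any source filter, mirroring the idea that the Source property is optional.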

Dataflow Content

The Dataflow Content field describes the data transformation performed at this point in the data flow. You cannot edit this field on the dataflow object's catalog page, but you can edit it on the Details tab of the lineage editor. When Alation creates a dataflow object automatically, this field contains the query that created the target object. When you create a dataflow object through the API or manually, you should provide the Dataflow Content value yourself.
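As an illustration of how an automatically created dataflow relates to its creating query, the following sketch derives a dataflow record from a CREATE TABLE or CREATE VIEW statement: the full query becomes the content, the created object becomes the output, and names after FROM or JOIN become inputs. This uses naive regular expressions chosen for brevity; it is not Alation's lineage parser, it ignores many SQL forms (for example, CREATE TABLE IF NOT EXISTS or subqueries), and the AutoDataflow and dataflow_from_query names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Naive pattern for "CREATE [OR REPLACE] TABLE|VIEW <name>"; illustration only.
_CREATE_RE = re.compile(
    r"^\s*CREATE\s+(?:OR\s+REPLACE\s+)?(?:TABLE|VIEW)\s+([\w.]+)",
    re.IGNORECASE,
)
# Naive pattern for object names that follow FROM or JOIN.
_SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


@dataclass
class AutoDataflow:
    """Hypothetical record for an automatically created dataflow."""
    content: str  # the query that created the target object
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


def dataflow_from_query(sql: str) -> AutoDataflow:
    """Build a dataflow record from a CREATE TABLE/VIEW query (naive parsing)."""
    match = _CREATE_RE.match(sql)
    if match is None:
        raise ValueError("not a CREATE TABLE or CREATE VIEW statement")
    target = match.group(1)
    # dict.fromkeys removes duplicate source names while keeping their order
    sources = list(dict.fromkeys(_SOURCE_RE.findall(sql)))
    return AutoDataflow(content=sql, inputs=sources, outputs=[target])


query = (
    "CREATE TABLE sales.daily_totals AS "
    "SELECT o.day, SUM(o.amount) AS total "
    "FROM sales.orders o JOIN sales.stores s ON o.store_id = s.id "
    "GROUP BY o.day"
)
df = dataflow_from_query(query)
print(df.inputs, df.outputs)  # ['sales.orders', 'sales.stores'] ['sales.daily_totals']
```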

Input and Output Fields

The Data Input and Data Output fields list the source (input) and target (output) data objects of the dataflow. External (EXT) and temporary (TMP) objects connected to the dataflow object are not shown in these fields, but you can see them in the lineage diagram.
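The difference between what the lineage diagram shows and what the Data Input and Data Output fields list can be sketched as a simple filter. This minimal example assumes each connected object carries a kind label such as "table", "EXT", or "TMP"; the LineageNode class, the kind values other than EXT and TMP, and the visible_in_fields helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List

# External and temporary objects are omitted from Data Input / Data Output.
HIDDEN_KINDS = {"EXT", "TMP"}


@dataclass(frozen=True)
class LineageNode:
    name: str
    kind: str  # hypothetical label, e.g. "table", "view", "EXT", "TMP"


def visible_in_fields(nodes: Iterable[LineageNode]) -> List[LineageNode]:
    """Return the objects that would be listed in the Data Input/Output fields."""
    return [n for n in nodes if n.kind.upper() not in HIDDEN_KINDS]


# All three nodes appear on the diagram; only the table appears in the fields.
inputs = [
    LineageNode("sales.orders", "table"),
    LineageNode("s3://bucket/raw_orders.csv", "EXT"),
    LineageNode("tmp_orders_stage", "TMP"),
]
print([n.name for n in visible_in_fields(inputs)])  # ['sales.orders']
```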

Under Data Input and Data Output, you can expand the objects that have child objects:

../../_images/DF_Input_Output.png

Customizing the Dataflow Page Template

You can customize the catalog page template for dataflow objects in the same way as the catalog template of any other object type.

You need the Catalog Admin or Server Admin role to change object templates.

The dataflow object template is located at Settings > Catalog Admin > Customize Catalog > Custom Templates > Data Object Templates > Dataflow.

For details on how to customize templates, see Custom Fields.

Deleting Dataflow Objects

You can delete a dataflow object using the lineage editor.

  1. Log in to Alation as an admin user.

  2. Open the catalog page of a data object connected to the dataflow object you want to delete.

  3. Click the Lineage tab, then click Edit in the upper right of the diagram.

    ../../_images/lineageV3_03.png

  4. Click the node representing the dataflow you want to delete to open the dataflow object editor.

  5. Click the trash can icon to the right of the dataflow object's name. A confirmation dialog warns that existing lineage will be disconnected and that any external objects connected to the dataflow will be deleted.

  6. Click Delete to delete the dataflow, or Cancel to cancel the operation.
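The effect described in the confirmation dialog can be modeled on a small lineage graph. This minimal sketch assumes the graph is stored as a mapping from each node to its downstream nodes plus a separate mapping of node kinds; the delete_dataflow function and the data layout are hypothetical and only illustrate the documented outcome: lineage through the dataflow is disconnected, and EXT objects connected to it are removed.

```python
from typing import Dict, Set, Tuple

Graph = Dict[str, Set[str]]  # node -> set of downstream nodes


def delete_dataflow(
    graph: Graph, kinds: Dict[str, str], dataflow: str
) -> Tuple[Graph, Dict[str, str]]:
    """Model deleting a dataflow node; returns new structures, inputs unchanged."""
    # Nodes directly connected to the dataflow, upstream and downstream.
    neighbours = set(graph.get(dataflow, set()))
    neighbours |= {n for n, outs in graph.items() if dataflow in outs}
    # The dataflow itself and any connected external (EXT) objects are removed.
    removed = {dataflow} | {n for n in neighbours if kinds.get(n) == "EXT"}
    new_graph = {
        node: {t for t in outs if t not in removed}
        for node, outs in graph.items()
        if node not in removed
    }
    new_kinds = {n: k for n, k in kinds.items() if n not in removed}
    return new_graph, new_kinds


# Hypothetical lineage: sales.orders and ext_feed -> df1 -> sales.daily_totals
graph = {
    "sales.orders": {"df1"},
    "ext_feed": {"df1"},
    "df1": {"sales.daily_totals"},
    "sales.daily_totals": set(),
}
kinds = {
    "sales.orders": "table",
    "ext_feed": "EXT",
    "df1": "dataflow",
    "sales.daily_totals": "table",
}
new_graph, new_kinds = delete_dataflow(graph, kinds, "df1")
print(new_graph)  # {'sales.orders': set(), 'sales.daily_totals': set()}
```

The tables remain in the catalog but are no longer connected, which is why the dialog describes existing lineage as disconnected rather than deleted.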

For manually created dataflow objects, you can also delete individual dataflow paths. See the section “Deleting a Path” in Creating Lineage Data Manually.

Creating Dataflow Objects with API

See Lineage - General API Quick Start Guide.
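To show how the pieces described on this page (an external ID, dataflow content, and source and target objects) fit together in an API request, here is a minimal sketch that only builds and prints a request body; it does not call Alation. All field names (dataflow_objects, paths, otype, key, external_id, content), the key values, and the build_lineage_payload helper are assumptions for illustration; the actual request format is defined in the Lineage - General API Quick Start Guide.

```python
import json
from typing import Dict, List


def build_lineage_payload(
    external_id: str, content: str, sources: List[str], targets: List[str]
) -> Dict:
    """Build a hypothetical request body describing one dataflow.

    Field names are assumptions; check the Lineage API guide for the real format.
    """
    if not sources or not targets:
        raise ValueError("a dataflow needs at least one source and one target")
    return {
        "dataflow_objects": [
            # external_id is what the External ID property would display
            {"external_id": external_id, "content": content},
        ],
        # One path: source objects -> dataflow -> target objects
        "paths": [
            [
                [{"otype": "table", "key": key} for key in sources],
                [{"otype": "dataflow", "key": external_id}],
                [{"otype": "table", "key": key} for key in targets],
            ]
        ],
    }


payload = build_lineage_payload(
    external_id="load_daily_totals",  # hypothetical ID
    content="INSERT INTO sales.daily_totals SELECT day, SUM(amount) FROM sales.orders GROUP BY day",
    sources=["sales.orders"],         # hypothetical object keys
    targets=["sales.daily_totals"],
)
print(json.dumps(payload, indent=2))
```

Keeping the payload construction separate from any HTTP call makes it easy to inspect or validate the structure before sending it with whatever client and authentication your Alation instance requires.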