Lineage

Applies from release 2022.3

Lineage is data about the origin of data: it documents how target data objects are created from source data objects. Lineage has long been a feature of Alation’s data catalog and has gone through three generations, known as Lineage V1, Lineage V2, and Lineage V3. Lineage V1 supported table-level lineage only and was superseded by Lineage V2, so it is not discussed further.

Alation automatically calculates lineage from metadata obtained through metadata extraction (MDE), query log ingestion (QLI), Compose queries, and data posted through the Alation public APIs. Starting with Lineage V3, lineage can also be created manually. Lineage is visually represented as a diagram on the Lineage tab of a data source.

Lineage V2

Lineage V2 was introduced in release V R6 (5.10.x) and is enabled by default from release 2021.2. In earlier releases, from V R6 (5.10.x) to 2021.1, an admin could enable Lineage V2 explicitly by setting a dedicated feature flag. Lineage V2 cannot be disabled.

Note

See Enabling Lineage V2 for information on how to enable Lineage V2 in releases V R6 to 2021.1.

Lineage V2 adds column-level lineage to the existing table-level lineage. Table-level lineage data is calculated by the system automatically or can be added by an admin using the Lineage API. Column-level lineage data can be added over the API only. The level of detail in automatically generated table lineage depends on the volume of metadata available in the catalog for processing.

../../_images/Lin16.png

Lineage V2 also adds the object type Dataflow that can be used to document:

  • ETL processes

  • stored procedures

  • SQL queries

  • scripts that produce target data from source data
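To make the relationship between source objects, target objects, and Dataflow objects more concrete, here is a minimal Python sketch of a lineage graph. It is a toy in-memory model written for illustration only: the `LineageGraph` class, its methods, and the object names (`raw.orders`, `etl_load_orders`, and so on) are all hypothetical and do not reflect Alation's internal data model or the Lineage API payload format.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy in-memory lineage graph (illustrative; not Alation's data model)."""

    def __init__(self):
        # Maps a source object name to a set of (target, dataflow) pairs.
        self._edges = defaultdict(set)

    def add_dataflow(self, dataflow, sources, targets):
        """Record that `dataflow` produces every target from every source."""
        for src in sources:
            for tgt in targets:
                self._edges[src].add((tgt, dataflow))

    def downstream(self, start):
        """Return the set of all objects reachable from `start`."""
        seen, pending = set(), deque([start])
        while pending:
            node = pending.popleft()
            for tgt, _dataflow in self._edges.get(node, ()):
                if tgt not in seen:
                    seen.add(tgt)
                    pending.append(tgt)
        return seen


graph = LineageGraph()
# Table-level lineage: hypothetical ETL jobs move data between tables.
graph.add_dataflow("etl_load_orders", ["raw.orders"], ["staging.orders"])
graph.add_dataflow("etl_build_sales", ["staging.orders"], ["reports.sales"])
# Column-level lineage: a hypothetical SQL query derives one column from two.
graph.add_dataflow(
    "sql_build_revenue",
    ["staging.orders.price", "staging.orders.quantity"],
    ["reports.sales.revenue"],
)

print(sorted(graph.downstream("raw.orders")))
```

Walking the graph downstream from a source is the basic idea behind an impact analysis: it lists every object that could be affected by a change to that source.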

Lineage V3

Lineage V3, also called the lineage service, was introduced in version 2021.4. It is a microservice that runs inside the Alation server and is responsible for creating, storing, and retrieving lineage data for the catalog.

Lineage V3 is required for the Manual Lineage Curation feature to work.

The Alation server creates lineage data from multiple sources, such as metadata extraction (MDE), query log ingestion (QLI), Compose query history, and public APIs. With Lineage V3 enabled, lineage events generated from these sources are sent to the Lineage V3 service via the Event Bus. In the lineage service:

  • the lineage write service consumes lineage events from the Event Bus and stores this lineage data in the lineage database;

  • the lineage read service retrieves the stored lineage data and powers the lineage diagrams in the Alation user interface.
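The flow described above (sources publish events, the write service persists them, the read service queries them) can be sketched in a few lines of Python. This is a conceptual model only, not Alation's implementation: the class names, the event fields (`origin`, `source`, `target`), and the use of an in-process queue and a plain list in place of the Event Bus and the lineage database are all assumptions made for illustration.

```python
import queue


class EventBus:
    """Hypothetical stand-in for the Event Bus: an in-process FIFO queue."""

    def __init__(self):
        self._events = queue.Queue()

    def publish(self, event):
        self._events.put(event)

    def drain(self):
        """Yield queued events until the bus is empty."""
        while True:
            try:
                yield self._events.get_nowait()
            except queue.Empty:
                return


class LineageWriteService:
    """Consumes events from the bus and stores them in the database."""

    def __init__(self, bus, database):
        self._bus = bus
        self._database = database

    def consume(self):
        written = 0
        for event in self._bus.drain():
            self._database.append(event)
            written += 1
        return written


class LineageReadService:
    """Retrieves stored lineage links, for example to draw a diagram."""

    def __init__(self, database):
        self._database = database

    def links_for(self, obj):
        return [e for e in self._database if obj in (e["source"], e["target"])]


bus = EventBus()
lineage_db = []  # assumption: a list stands in for the lineage database

# Events from two of the sources named in the text (field names are invented).
bus.publish({"origin": "MDE", "source": "raw.orders", "target": "staging.orders"})
bus.publish({"origin": "QLI", "source": "staging.orders", "target": "reports.sales"})

writer = LineageWriteService(bus, lineage_db)
reader = LineageReadService(lineage_db)

written = writer.consume()
links = reader.links_for("reports.sales")
print(written, links)
```

Separating the write path from the read path means that ingestion and diagram rendering touch the shared store independently, which is the property the two-service split in the text relies on.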

../../_images/lineageV3_01.png

Lineage V3 is disabled by default. On an existing Alation instance that already has lineage data, enabling Lineage V3 requires migrating that data from Lineage V2 to Lineage V3. On new installations of Alation where no lineage data exists yet, Lineage V3 can be enabled using alation_conf.

Comparing Lineage V2 and Lineage V3

Lineage V2 and Lineage V3 are both frameworks for processing lineage data and representing it in the Alation Catalog, but they use different application architectures. Lineage V2 is the older framework: it was introduced in Alation release V R6 (5.10.x) and became the default in release 2021.2. Lineage V3 is available from release 2021.4.

Both V2 and V3 support lineage diagrams, Dataflow objects, Lineage Impact Analysis reports, and the Lineage API. (You will see the Lineage APIs referred to as Lineage V2 APIs, but they apply equally to Lineage V2 and Lineage V3.) The main differentiator is manual creation of lineage data, which is available only with Lineage V3.

Lineage V3 aims to address the three main challenges of the Lineage V2 framework:

  • Scalability: Lineage V3 can ingest millions of lineage events and is expected to perform much better than Lineage V2.

  • Flexibility: Lineage V3 can potentially be extended to new object types and serves as a foundation for manual creation of lineage.

  • Cloud Readiness: Lineage V3 can potentially be used as a component in the containerized application architecture, which is the basis for Alation cloud deployments.

Lineage API Documentation

Lineage API documentation is available:

  • On the Developer Portal: Lineage V2

  • From release 2020.3, as an OAS 3.0 specification at <AlationInstanceURL>/openapi/lineage/

    Note

    In releases prior to 2021.2, to view OAS 3.0 specifications of Alation APIs, enable the Swagger integration on your Alation instance: Enable Integration with Swagger.

    From release 2021.2, Swagger integration is enabled by default and does not require any feature flags to be set.
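As a small convenience, the location of the lineage OAS 3.0 specification can be derived from an instance URL. The sketch below only builds the URL from the path given above; the helper name `lineage_spec_url` and the host `alation.example.com` are placeholders, and the sketch makes no assumption about how the page is authenticated or served.

```python
from urllib.parse import urljoin


def lineage_spec_url(instance_url):
    """Return the lineage OAS 3.0 specification URL for an Alation instance."""
    # Normalize to exactly one trailing slash so urljoin appends the path
    # instead of replacing the last path segment of the base URL.
    base = instance_url.rstrip("/") + "/"
    return urljoin(base, "openapi/lineage/")


# Placeholder host; substitute the URL of your own Alation instance.
print(lineage_spec_url("https://alation.example.com"))
```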

For a quick start guide to lineage APIs, see Lineage - General API Quick Start Guide.

For a quick start guide to dataflow object APIs, see Lineage - Dataflow Quick Start.

For frequently asked questions about lineage and dataflow objects, see Lineage & Dataflow - Frequently Asked Questions.