Enable Lineage V3

Applies to customer-managed instances of Alation

Applies from version 2022.1.2 to 2022.4

Note

From version 2022.4, Lineage V3 is enabled by default on all new Alation instances.

In versions 2022.1.2 to 2022.4, Lineage V3 must be enabled if users want the ability to create lineage manually. Lineage V3 is enabled through the migration of lineage data from Lineage V2 to Lineage V3 and becomes enabled automatically after the migration completes. If the migration fails, your instance remains on Lineage V2.

Important

Alation recommends upgrading your Alation instance to version 2022.1.2 or newer before performing the migration from Lineage V2 to V3.

Warning

Migration from Lineage V2 to V3 may be a time-consuming and resource-intensive process and requires a restart of several important Alation services. Depending on the size of the existing lineage data, the migration may take from a few minutes to a few hours. We recommend performing it during off-peak hours, when there is little user activity and when other resource-intensive processes, such as MDE, QLI, or search indexing, are not running.

The migration process requires backend access to the Alation server. If you have an HA pair configuration, perform the migration on the primary instance.

There are two situations in which you may be performing this migration:

  • You are running the migration script for the first time. If you have no Lineage V3 data and this is your first attempt to migrate lineage data from Lineage V2 to V3, start with Migrate Lineage from V2 to V3.

  • You have run the migration script and it failed. If you are retrying the migration after a failed attempt, start with Reset the Migration Status Pointer and after that rerun the migration.

Reset the Migration Status Pointer

If you previously attempted to migrate the lineage data from Lineage V2 to V3 and the attempt failed, you can rerun the migration from the very beginning. Start by resetting the migration status pointer to zero. The migration status pointer is a record in the internal Alation database that stores the number of lineage links that were successfully migrated before the migration failed.

To reset the pointer:

  1. Use SSH to connect to the Alation host.

  2. Enter the Alation shell:

    sudo /etc/init.d/alation shell
    
  3. Enter the Alation Postgres shell:

    alation_psql
    
  4. Run the query given below to reset the migration pointer to zero.

    update message_queue_pointer set pointer=0 where job_name='Lineage_V2_To_V3_Migration';
    
  5. Run the next query to ensure that the message_queue_pointer was updated. The result should be zero.

    select * from message_queue_pointer where job_name='Lineage_V2_To_V3_Migration';
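
    For reference, the output should look similar to the sample below. The exact columns depend on your Alation version (this sample assumes the table contains only the job_name and pointer columns used in the queries above); what matters is that pointer is 0:

    rosemeta=# select * from message_queue_pointer where job_name='Lineage_V2_To_V3_Migration';

              job_name            | pointer
    ------------------------------+---------
     Lineage_V2_To_V3_Migration   |       0
    (1 row)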
    
  6. Exit the Postgres shell.

    \q
    
  7. Stay in the Alation shell and perform the lineage migration steps below to rerun the migration from the beginning.

If you continue to have difficulties completing the migration, contact Alation Support for assistance.

Migrate Lineage from V2 to V3

Use the steps below to migrate lineage data from Lineage V2 to Lineage V3 and to activate the Lineage V3 functionality:

  1. Before migrating, in your Alation catalog, find data sources and BI sources that have lineage data. Save the URLs of table or BI report pages with lineage. After running the migration script, you can use these data objects to validate that the lineage data was migrated successfully and that the lineage charts display without issues.

  2. Use SSH to connect to the Alation server.

  3. Enter the Alation shell.

    sudo /etc/init.d/alation shell
    
  4. Set the lineage logger level to debug. This logger setting will create more detailed logs, which will be helpful if troubleshooting is necessary.

    4.1. Change the logger level.

    alation_conf lineage-service.logger.level -s debug
    

    4.2. Restart the lineage service for the changes to take effect.

    alation_supervisor restart lineage
    
  5. Ensure that the lineage service is running.

    alation_supervisor status lineage
    

    This command is expected to return the status Running, for example:

    $ alation_supervisor status lineage
    lineage              RUNNING   pid 1184, uptime 5 days, 19:10:35
    

    The lineage service should be in the Running state even though Lineage V3 has not been enabled yet. This is the expected state of the system.

  6. Ensure that the Event Bus component is running.

    alation_supervisor status event-bus:*
    

    This command is expected to return the status Running, for example:

    $ alation_supervisor status event-bus:*
    event-bus:kafka-server       RUNNING   pid 1128, uptime 5 days, 19:11:05
    event-bus:zookeeper-server   RUNNING   pid 1127, uptime 5 days, 19:11:05
    
  7. Ensure that the Event Bus is receiving messages.

    7.1. Enter the Django shell.

    alation_django_shell
    

    7.2. Use the following commands:

    from alation_event_bus_utils import check_event_bus
    check_event_bus()
    

    If you see any errors, do not proceed with the migration. Contact Alation Support to help resolve this issue.

    7.3. Exit the Django shell.

    exit
    
  8. Check the size of the existing lineage data. Use this number to estimate the time required for the migration: it takes about an hour to migrate 1 GB of lineage data on an instance where the migration is not competing for resources with multiple other processes. At that rate, for example, the 1358 MB in the sample output below would take a bit under an hour and a half.

    8.1. Enter the Postgres shell.

    alation_psql
    

    8.2. Run the following query:

    select pg_size_pretty(pg_total_relation_size('object_lineage_link'));
    

    Sample output:

    rosemeta=# select pg_size_pretty(pg_total_relation_size('object_lineage_link'));
    
    pg_size_pretty
    ----------------
    1358 MB
    (1 row)
    
  9. Still in the Postgres shell, run the next query to get the unique link count from the object_lineage_link table. This number shows how many lineage links will need to be migrated. Take note of it: you will use it in Step 17 to validate the completeness of the migration.

    select count(*) from (select distinct source, source_otype, target, target_otype from object_lineage_link) as links;
    

    Sample output:

    rosemeta=# select count(*) from (select distinct source, source_otype, target, target_otype from object_lineage_link) as links;
    
    count
    -------
    427947
    (1 row)
    
  10. Exit the Postgres shell.

    \q
    
  11. (Optional) If your lineage data is very large, you can enable Event Bus consumer throttling; otherwise, skip this step. During a very large V2 to V3 migration, the lineage service can potentially use a lot of memory, which may affect other applications. Event Bus consumers can be throttled to allow small breaks between batches so that they use less memory. This results in a slightly slower migration. To enable Event Bus consumer throttling:

    11.1. In the Alation shell, enable throttling.

    alation_conf lineage-service.kafka.topics.consumer_throttling.enabled -s True
    

    11.2. By default, throttling begins when memory usage of the lineage service is at 90%. To change this value, use the command below:

    alation_conf lineage-service.kafka.topics.consumer_throttling.threshold -s <new threshold>
    

    Example:

    alation_conf lineage-service.kafka.topics.consumer_throttling.threshold -s 60
    

    11.3. By default, throttling allows for 1,000 milliseconds (1 second) to pass before the next batch is consumed. To change this value, use the command below:

    alation_conf lineage-service.kafka.topics.consumer_throttling.duration -s <new duration in milliseconds>
    

    Example:

    alation_conf lineage-service.kafka.topics.consumer_throttling.duration -s 2000
    

    11.4. Restart the service for the changes to take effect.

    alation_supervisor restart lineage
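
    11.5. (Optional) To confirm the throttling settings before starting the migration, read the keys back. Running alation_conf with a key and no -s flag prints the current value, as shown elsewhere in this guide:

    alation_conf lineage-service.kafka.topics.consumer_throttling.enabled
    alation_conf lineage-service.kafka.topics.consumer_throttling.threshold
    alation_conf lineage-service.kafka.topics.consumer_throttling.duration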
    
  12. Prepare to tail the migration logs. Starting in version 2022.4, the logs resulting from the lineage migration are written to the file celery-lineagepublishing_error.log. Email notifications of the migration’s progress will be sent to all admins of the instance. Alation recommends tailing this file during the migration: open a separate terminal window, enter the Alation shell, and tail the logs. If you’re using a screen multiplexer, tail in a separate screen:

    tail -f  /opt/alation/site/logs/celery-lineagepublishing_error.log
    

    In versions before 2022.4, the log files are written to the file celery-default_error.log. You can tail this file in a similar manner:

    tail -f /opt/alation/site/logs/celery-default_error.log
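
    If the log is busy, you can optionally filter for migration-related lines. The grep pattern below is only a suggestion, based on the task name shown in the sample output in step 14 (substitute celery-lineagepublishing_error.log on 2022.4 and newer):

    tail -f /opt/alation/site/logs/celery-default_error.log | grep -i migrate_lineage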
    
  13. In the terminal window that is connected to the Alation host and Alation shell, enter the Alation Django shell:

    alation_django_shell
    
  14. Run the Lineage V3 migration script.

    from rosemeta.tasks.migrations import migrate_lineage_v2_data_to_v3_database
    migrate_lineage_v2_data_to_v3_database.delay()
    

    This will kick off the migration job. Information about progress will be written to either celery-lineagepublishing_error.log (version 2022.4 and newer) or celery-default_error.log (versions before 2022.4). At the end of the migration, you should see a success message similar to the following:

    [2022-03-05 00:12:51,622: INFO/ForkPoolWorker-8] rosemeta.tasks.migrations.migrate_lineage_v2_data_to_v3_database[23e9a161-64cf-4da7-a1d0-ee1b1e783b8c]:
    Lineage v2 to v3 migration task completed successfully.
    You can check the job status in alation_django_shell by running....`j = Job.objects.filter(job_type=33).order_by('-ts_finished')` and then `j[0].__dict__`
    
  15. After the migration completes, still within the Django shell, check the migration status for the links and link nodes migration jobs.

    15.1. To check the job status for the links migration job, run the following commands:

    job = Job.objects.filter(job_type=33).order_by("-ts_finished")
    job[0].__dict__
    

    In the output, look for the lines that contain status and state. The migration has succeeded if you find:

    • status: 1 means SUCCEEDED (migration has completed successfully)

    • state: 3 means FINISHED (migration has finished)

    The output will look similar to the following:

    In [3]: job = Job.objects.filter(job_type=33).order_by("-ts_finished")
    
    In [4]: job[0].__dict__
    Out[4]:
    {'_state': <django.db.models.base.ModelState at 0x7f7c402c6438>,
    'id': 1695,
    'user_id': None,
    '_enum_job_type': 33,
    'job_type': 33,
    'external_service_aid': None,
    '_enum_external_service_otype': None,
    'external_service_otype': None,
    'ts_started': datetime.datetime(2021, 12, 9, 2, 7, 45, 937116, tzinfo=<UTC>),
    'ts_updated': datetime.datetime(2021, 12, 9, 2, 7, 48, 2729, tzinfo=<UTC>),
    'ts_finished': datetime.datetime(2021, 12, 9, 2, 7, 48, 2729, tzinfo=<UTC>),
    '_enum_status': 1,
    'status': 1,
    '_enum_state': 3,
    'state': 3,
    'state_message': 'Published 435 links',
    'persisting_message': ['Processed batch with 545 links, is_last_batch: True',
    'Published 435 links'],
    'details': None,
    'sync_subprocess_info': {'LINEAGE_V2_TO_V3_DATA_MIGRATION': '18685_797741740'},
    'async_subprocess_info': {},
    'disabled_task_name': None}
    
    If the result is different from status: 1 and state: 3, the migration ended with issues. See Troubleshoot the Lineage V3 Migration for more details.
    

    Note

    Job Status values:

    • N/A = 0

    • SUCCEEDED = 1

    • FAILED = 2

    • PARTIAL_SUCCESS = 3

    • SKIPPED = 4

    Job State values:

    • NOT_STARTED = 0

    • QUEUED = 1

    • STARTED = 2

    • FINISHED = 3
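
    If you prefer human-readable values, you can translate the numeric codes with a small ad hoc mapping in the Django shell. The dictionaries below are a local convenience built from the value lists above, not an Alation API:

    # Local lookup tables for the Job Status and Job State codes listed above
    STATUS = {0: "N/A", 1: "SUCCEEDED", 2: "FAILED", 3: "PARTIAL_SUCCESS", 4: "SKIPPED"}
    STATE = {0: "NOT_STARTED", 1: "QUEUED", 2: "STARTED", 3: "FINISHED"}
    # "job" is the queryset retrieved in step 15.1
    print(STATUS[job[0].status], STATE[job[0].state])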

    15.2. To check the job status for the link nodes migration job, run the following commands from within the Django shell:

    j = Job.objects.filter(job_type=34).order_by("-ts_finished")
    j[0].__dict__
    

    Check the job detail records. They should show the number of messages that the lineage service received. Each message is a Kafka message and contains a maximum of 100 links.

    d = j[0].job_details.all()
    d[0].__dict__
    d[1].__dict__
    

    The link nodes migration job is a subprocess of the links migration job. It migrates Alation objects that are part of lineage links. If links appear in the user interface but all of their nodes appear as temporary (temp), even though they are not known to be temporary or deleted in Lineage V2, check the status of this job again to see whether it has completed. Once it completes, all non-temporary objects should appear as such in lineage charts.

  16. Exit the Django shell:

    exit
    
  17. Validate the number of migrated lineage links.

    17.1. Enter the Postgres shell:

    alation_psql
    

    17.2. Switch to the lineage database:

    \c lineage
    

    17.3. Run the following query:

    select count(*) from edge where relation_type=1;
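
    Sample output (the count here reuses the illustrative number from Step 9; your count will differ):

    lineage=# select count(*) from edge where relation_type=1;

     count
    --------
     427947
    (1 row)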
    

    The count should be equal to the count of unique links you retrieved in Step 9 from the object_lineage_link table in Rosemeta. If the counts are not equal, there is an issue with the migration. Review the logs: see Check the Lineage Service Logs.

    17.4. Exit the Postgres shell:

    \q
    
  18. If you are running on a version older than 2022.4, restart the Celery component.

    alation_supervisor restart celery:*
    
  19. You can return to the default lineage logger level (info).

    19.1. Run the following command:

    alation_conf -s info lineage-service.logger.level
    

    19.2. Restart the lineage service:

    Note

    If you enabled Event Bus throttling before the migration, you can first disable it and then perform this restart. See the next step.

    alation_supervisor restart lineage
    
  20. If before the migration you enabled Event Bus throttling, you can now disable it.

    20.1. Disable the corresponding alation_conf flag:

    alation_conf lineage-service.kafka.topics.consumer_throttling.enabled -s False
    

    20.2. Restart the service:

    alation_supervisor restart lineage
    
  21. Exit the Alation shell:

    exit
    
  22. Log in to Alation and verify your lineage data in the Alation UI. There should be no changes to the content of the lineage charts. If lineage data does not appear or there are issues with the charts, for example, existing objects appear as temporary (temp) or the charts only show partial data, contact Alation Support.

Important

The migration script automatically enables the Lineage V3 service. No additional actions are required to enable it. Next, you can enable Manual Lineage Curation. See Enabling Manual Lineage Curation.

Disable Lineage V3

Alation does not recommend going back to Lineage V2 after enabling and using Lineage V3: disabling V3 will leave the lineage data in the catalog in an outdated state. Although not recommended, disabling Lineage V3 is still possible. For example, if while using Lineage V3 you observe CPU or memory consumption issues that you consider critical to instance health, you can fall back to Lineage V2.

Contact Alation Support to help you return to using Lineage V2.

Important

New lineage data created on Lineage V3 will not be auto-migrated to Lineage V2 when you disable Lineage V3. Lineage graphs that were created manually, automatically, or through the API on Lineage V3 will not be available after Lineage V3 is disabled. The lineage data will return to the state it was in before the migration to Lineage V3.

Enable Lineage V3 on New Installations

On new installations only, where there is no lineage data yet, you can enable Lineage V3 using alation_conf.

Warning

If your instance already has lineage data, do not use alation_conf to enable Lineage V3. Instead, perform lineage data migration from V2 to V3.

To enable Lineage V3 using alation_conf on a new Alation instance:

  1. Use SSH to connect to the Alation server.

  2. Enter the Alation shell using the following command:

    sudo /etc/init.d/alation shell
    
  3. Run the command given below to enable Lineage V3.

    alation_conf -s True lineage-service.enabled
    
  4. Restart the Celery component.

    alation_supervisor restart celery:*
    
  5. Ensure that the lineage service is running.

    alation_supervisor status lineage
    

    This command is expected to return the status Running, for example:

    (env) PROD [admin@ip-177-71-27-47 /]$ alation_supervisor status lineage
    lineage                  RUNNING   pid 1184, uptime 5 days, 19:10:35
    

    If the lineage service is in the Running state with the Lineage V3 feature flag enabled, all sources of lineage data, such as MDE, QLI, Compose, and the Lineage V2 APIs, are now using the lineage service.

  6. If the lineage service is in the Stopped state, restart it.

    alation_supervisor restart lineage
    

Troubleshoot the Lineage V3 Migration

If the lineage migration from V2 to V3 failed, check whether it reached the stage where Lineage V3 was already enabled, and contact Alation Support.

Check the Lineage V3 Feature State

You can check if Lineage V3 is enabled or disabled using the alation_conf feature flag lineage-service.enabled. To check the state of this flag, from the Alation shell, run:

alation_conf lineage-service.enabled

The values can be:

  • True: Lineage V3 enabled

  • False: Lineage V3 disabled
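
For example, to read the current value (the output format shown is illustrative and may vary between versions):

$ alation_conf lineage-service.enabled
lineage-service.enabled = True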

Alation does not recommend changing the value of this parameter manually.

Check the Lineage Service Logs

There are two log files in the logs directory at /opt/alation/site/logs/ that are associated with the lineage service:

  • lineage.log: Contains information about the service health and general logging statements for both read and write pipelines.

  • lineage_error.log: Logs all errors that occur in the lineage service, for example, errors related to writing to or reading from the Event Bus, and errors in lineage processing, request processing, or database access.

In addition to these two dedicated log files, some other log files are also relevant to the lineage service:

  • alation-info.log

  • alation-debug.log

  • celery-default.log

  • celery-lineagepublishing_error.log (Alation versions 2022.4 and later) or celery-default_error.log (on versions before 2022.4)

If you find issues with displaying lineage data after the migration, check the celery-lineagepublishing_error.log or celery-default_error.log for specific errors.
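
For a quick scan after the migration, standard shell tools are enough, for example (the grep pattern is only a suggestion):

cd /opt/alation/site/logs
# Show the most recent lineage service errors
tail -n 100 lineage_error.log
# Search the Celery log for errors (use celery-default_error.log on versions before 2022.4)
grep -i error celery-lineagepublishing_error.log | tail -n 50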

Change the Lineage Service Logging Level

The lineage service logs can be written in two modes: info or debug. The default is info. You can set it to debug in order to capture a more detailed log.

To change the log level:

  1. From the Alation shell, run the command below.

    alation_conf -s debug lineage-service.logger.level
    
  2. Restart the lineage service.

    alation_supervisor restart lineage
    

To return to the info log level:

  1. From the Alation shell, run the command below.

    alation_conf -s info lineage-service.logger.level
    
  2. Restart the lineage service.

    alation_supervisor restart lineage
    

Check the Lineage Service Status

To check the status of the Lineage service, from the Alation shell, run:

alation_supervisor status lineage

Check the Event Bus Status

To check the status of Event Bus, from the Alation shell, run:

alation_supervisor status event-bus:*

Both the Lineage service and Event Bus must be in the Running state for Lineage V3 to function correctly.