Reset ETL Checkpoint for the RDBMS Tables

Customer Managed Applies to customer-managed instances of Alation

Applies from version 2024.1

Problem

Titles and descriptions for RDBMS objects aren’t updated in Alation Analytics after the ETL process when changes have been made via data dictionary uploads.

Root Cause

The update timestamps (ts_updated) in the tables in the Alation internal database (Rosemeta) are not refreshed after changes to titles and descriptions have been made via data dictionary uploads, which causes no updates in Alation Analytics after the ETL process.

In the normal case, when users curate titles and descriptions for RDBMS objects through uploading data dictionaries in the Alation user interface, the system expects the titles and descriptions to be updated along with the update timestamps in the tables of the Alation internal database. The Alation Analytics ETL process then picks up these updated timestamps to refresh the data in the relevant tables in the Alation Analytics database.

Resolution

The issue was resolved in version 2024.1. Going forward, titles and descriptions updated via data dictionary uploads will be accurately reflected in Alation Analytics.

To update titles and descriptions for RDBMS objects that were previously omitted from Alation Analytics due to this issue, apply the Python script given below. The script doesn’t require downtime and can be run at any convenient time after updating Alation to 2024.1. The script resets the ETL checkpoint for the RDBMS model in Alation Analytics, configuring the next Alation Analytics ETL to refresh all data in the Alation Analytics RDBMS tables from the earliest record in the internal database. The following tables are impacted:

  • rdbms_datasources

  • rdbms_schemas

  • rdbms_tables

  • rdbms_columns

Script

#! /usr/bin/env python
import bootstrap_django_models
from django.db import connections
from django.apps import apps


def get_first_users_date_joined():
    result = None
    with connections["default"].cursor() as cursor:
        SQL = """
            select date_joined from auth_user order by date_joined asc limit 1;
        """
        cursor.execute(SQL)
        result = cursor.fetchone()
    return result[0]


def reset_checkpoint_for_model(model_name):
    table = apps.get_model("alation_analytics_v2", model_name)
    checkpoint = apps.get_model("alation_analytics_v2", "ETLCheckpoint")
    load_table = table._meta.db_table
    record = checkpoint.objects.filter(load_table_name=load_table)
    if record:
        record = record.latest("timestamp")
        first_user_joined_date = get_first_users_date_joined()
        if first_user_joined_date is not None:
            record.reset_checkpoint_to = str(first_user_joined_date)
            record.is_incremental_reset = True
            record.save()


if __name__ == "__main__":
    print("Started resetting ETL checkpoint for RDBMS models.")
    model_names = ["DimDatasources", "DimSchemas", "DimTables", "DimColumns"]
    for model_name in model_names:
        print("Processing model:", model_name)
        reset_checkpoint_for_model(model_name)
    print("Completed resetting ETL checkpoint for RDBMS models.")

Steps

To reset the ETL checkpoint for the RDBMS model:

  1. Copy and save the script to a temporary location on your Alation instance.

  2. Use SSH to connect to your Alation instance.

  3. Copy the script from the temporary directory to its designated directory on the Alation server. In the following command, we’re using a /tmp directory as an example.

    sudo cp /tmp/reset_checkpoint_for_rdbms_models.py /opt/alation/alation/opt/alation/django/alation_analytics_v2/one_off_scripts/
    
  4. Enter the Alation shell.

    sudo /etc/init.d/alation shell
    
  5. Change to the scripts directory.

    cd /opt/alation/django/alation_analytics_v2/one_off_scripts/
    
  6. Update ownership of the script.

    sudo chown alation:alation reset_checkpoint_for_rdbms_models.py
    
  7. Update the execution permissions of the script.

    sudo chmod a+x reset_checkpoint_for_rdbms_models.py
    
  8. Switch to the alation user.

    sudo su alation
    
  9. Ensure you are in the script directory /opt/alation/django/alation_analytics_v2/one_off_scripts/.

  10. Execute the script.

    python reset_checkpoint_for_rdbms_models.py
    
  11. The script completion will be indicated by a success message Completed resetting ETL checkpoint for RDBMS models. The next Alation Analytics ETL will start reloading the RDBMS tables in Alation Analytics from the earliest record. Depending on your table size, the reload process may span more than one ETL cycle. Going forward, the changes to titles and descriptions that are updated via data dictionary uploads will be updated in Alation Analytics accordingly.

  12. Exit from the alation user and the Alation shell by using exit twice.