Amazon Redshift Query Log Table

Automated daily Query Log ingestion of Amazon Redshift queries requires an ETL process that maintains a rolling Query Log containing the queries executed in the last three days. If the Query Log table is small, we recommend retaining queries for a longer period, such as the last seven days.

The Query Log must be a table, not a view, so that the Alation user can see queries run by other users. If the Alation user queried a view directly, it would see only its own queries; if instead a process with full access to everyone’s queries creates and updates a table, the Alation user can see all the queries in that table.

The following two queries can be used to maintain a rolling Query Log. The sample queries use public.alation_qlog as the Query Log table; however, the table can have any name and be placed in any schema, as long as the Alation DB account has SELECT, INSERT, and DELETE privileges on the table.
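
For example, if the Alation service account is named alation (a hypothetical name; substitute your actual account), the required grants would look like this:

-- 'alation' is a placeholder account name
GRANT USAGE ON SCHEMA public TO alation;
GRANT SELECT, INSERT, DELETE ON public.alation_qlog TO alation;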

  1. Delete queries older than the last three days.

    DELETE FROM PUBLIC.ALATION_QLOG
    WHERE START_TIME < (CURRENT_DATE - INTERVAL '3 days');
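
    If you adopt the seven-day retention suggested above, the same statement with a wider window would be:

    DELETE FROM PUBLIC.ALATION_QLOG
    WHERE START_TIME < (CURRENT_DATE - INTERVAL '7 days');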
    
  2. Append queries executed in the last 24 hours to the Query Log. The following queries must run as an admin user; otherwise the Query Log will not include queries from all users.
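
Before running them, you can confirm that the connected account is a superuser with a quick check (PG_USER and CURRENT_USER are available on Redshift):

-- usesuper should be true for the account running the queries below
SELECT usename, usesuper
FROM pg_user
WHERE usename = CURRENT_USER;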

Use the following query for Amazon Redshift Serverless or Amazon Redshift Provisioned (Recommended):

SELECT
  cast(sqt.text AS VARCHAR) as text,
  sqt.sequence as seq,
  CASE
    when sqh.status in ('failed', 'canceled') then 'ABORTED'
    else ''
  end as canceled,
  datediff (millisecond, sqh.start_time, sqh.end_time) / 1000.0 as seconds_taken,
  cast(pui.usename AS VARCHAR) as user_name,
  cast(ssh.database_name AS VARCHAR) as default_database,
  cast(sqh.session_id AS VARCHAR) || '/' || to_char(ssh.start_time,'YYYY-MM-DD HH:MI:SS.US') as session_id,
  sqt.start_time as start_time,
  sqh.transaction_id as transaction_id
  INTO
    public.alation_qlog
  FROM
    sys_query_history sqh
  JOIN
    sys_query_text sqt ON sqh.query_id=sqt.query_id
  JOIN
    pg_user_info pui ON pui.usesysid=sqh.user_id
  JOIN
    sys_session_history ssh ON ssh.session_id=sqh.session_id
  WHERE
    sqh.user_id > 1
    AND sqt.start_time >= (LOCALTIMESTAMP - interval '3 day');
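
After the first run, a quick sanity check (using the sample table name) confirms that the table was created and covers the expected window:

-- Row count plus the oldest and newest captured query timestamps
SELECT
  COUNT(*) AS query_count,
  MIN(start_time) AS oldest_query,
  MAX(start_time) AS newest_query
FROM public.alation_qlog;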

Use the following query only for Amazon Redshift Provisioned:

SELECT
    q.querytxt AS text,
    q.starttime AS start_time,
    CASE q.aborted
        WHEN 0 THEN ''
        ELSE 'ABORTED' END AS canceled,
    datediff (millisecond, q.starttime, q.endtime) / 1000.0 AS seconds_taken,
    s.user_name AS user_name,
    cast (s.process AS VARCHAR) || '/' ||
        to_char (s.starttime, 'YYYY-MM-DD HH:MI:SS.US') || '/' || s.user_name AS session_id,
    q.sequence AS seq,
    trim (s.db_name) AS default_database,
    trim (label) AS query_filename,
    q.xid AS transaction_id,
    q.qtype AS qtype
    INTO public.alation_qlog
  FROM
    (
      SELECT * FROM
        (
          SELECT
            q.userid,
            q.query,
            q.label,
            q.xid,
            q.pid,
            DATABASE,
            qt.text::text AS querytxt,
            q.starttime,
            q.endtime,
            q.aborted,
            'QUERY' AS qtype,
            qt.sequence AS "sequence"
          FROM
            stl_query q
            JOIN stl_querytext qt ON q.query = qt.query
            AND q.xid = qt.xid
            AND q.pid = qt.pid
        ) query_combined
      UNION ALL
      SELECT
        userid,
        0 AS query,
        label,
        xid,
        pid,
        '' AS DATABASE,
        text::text AS querytxt,
        starttime,
        endtime,
        0 AS aborted,
        'DDL' AS qtype,
        "sequence"
      FROM
        stl_ddltext
    ) AS q
    INNER JOIN stl_sessions s ON q.pid = s.process
    AND q.userid = s.userid
  WHERE
    s.user_name != 'rdsdb'
    AND s.starttime >= (LOCALTIMESTAMP - interval '3 day');
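
Note that the STL system tables this query reads from retain only a few days of log history, so a longer retention window cannot be backfilled from them. To see how far back the available history actually goes, you can run:

-- Earliest query still present in the STL source table
SELECT MIN(starttime) AS earliest_logged_query
FROM stl_query;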

Alternative SQL

Note

The query in this section does not automatically create the Query Log table.
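
If the table does not exist yet (for example, because the SELECT ... INTO form above was never run), create it first. The following is a minimal sketch with column types inferred from the SELECT list below; the VARCHAR lengths are assumptions and may need adjusting. The column order must match the SELECT list, since the INSERT below does not name columns:

CREATE TABLE public.alation_qlog (
  text VARCHAR(4000),              -- query text segment; length is an assumption
  seq INTEGER,                     -- text segment sequence number
  canceled VARCHAR(7),             -- 'ABORTED' or empty string
  seconds_taken DOUBLE PRECISION,  -- query duration in seconds
  user_name VARCHAR(128),
  default_database VARCHAR(128),
  session_id VARCHAR(256),
  start_time TIMESTAMP,
  transaction_id BIGINT
);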

INSERT INTO
  public.alation_qlog (
    SELECT
      cast(sqt.text AS VARCHAR) as text,
      sqt.sequence as seq,
      CASE
        when sqh.status in ('failed', 'canceled') then 'ABORTED'
        else ''
      end as canceled,
      datediff (millisecond, sqh.start_time, sqh.end_time) / 1000.0 as seconds_taken,
      cast(pui.usename AS VARCHAR) as user_name,
      cast(ssh.database_name AS VARCHAR) as default_database,
      cast(sqh.session_id AS VARCHAR) || '/' || to_char(ssh.start_time,'YYYY-MM-DD HH:MI:SS.US') as session_id,
      sqt.start_time as start_time,
      sqh.transaction_id as transaction_id
    FROM
      sys_query_history sqh
    JOIN
      sys_query_text sqt ON sqh.query_id=sqt.query_id
    JOIN
      pg_user_info pui ON pui.usesysid=sqh.user_id
    JOIN
      sys_session_history ssh ON ssh.session_id=sqh.session_id
    WHERE
      sqh.user_id > 1
      AND sqt.start_time >= (LOCALTIMESTAMP - interval '3 day')
  );