PostgreSQL Connector: Install and Configure

Prerequisites

Network Connectivity

Open an outbound TCP connection on port 5432 from Alation to the PostgreSQL server.

Create Service Account

Sample SQL to create an account:

CREATE USER alation WITH PASSWORD '[password]';

Permissions for Metadata Extraction

GRANT USAGE ON SCHEMA pg_catalog TO alation;
GRANT SELECT ON ALL TABLES IN SCHEMA pg_catalog TO alation;
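
To double-check the grants, you can run the following from any SQL client; both values should return true. This is a minimal sketch, assuming the account is named alation as in the sample above.

-- Both values should be true for the alation service account
SELECT
  has_schema_privilege('alation', 'pg_catalog', 'USAGE')          AS catalog_usage,
  has_table_privilege('alation', 'pg_catalog.pg_class', 'SELECT') AS catalog_select;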

Permissions for Table Profiling

GRANT USAGE ON SCHEMA [schema] TO alation;
GRANT SELECT ON ALL TABLES IN SCHEMA [schema] TO alation;
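
A worked example for a schema named public (an illustrative name; substitute your own schema), with an optional statement to keep the grant current for tables created later:

GRANT USAGE ON SCHEMA public TO alation;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO alation;

-- Optional: also cover tables created in this schema later. Note that default
-- privileges apply only to objects created by the role running this statement.
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO alation;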

JDBC URI

When building the URI, include the following components:

  • Hostname or IP of the instance

  • Port number

  • Database name

URI format:

postgresql://<Host>:<Port>/<Database_Name>

Example:

postgresql://ifac-pgsql.ceyrldata.us-west-2.amazonaws.com:5432/postgres

Configuration in Alation

STEP 1: Install the Connector

Alation On-Premise

Important

Installation of OCF connectors requires Alation Connector Manager to be installed as a prerequisite.

To install an OCF connector:

  1. If Alation Connector Manager is not yet installed on your instance, install it: Install Alation Connector Manager.

  2. Ensure that the OCF connector Zip file is available on your local machine.

  3. Install the connector on the Connectors Dashboard page using the steps in Manage Connectors.

Alation Cloud Service

Note

On Alation Cloud Service instances, Alation Connector Manager is available by default.

Depending on your network configuration, you may need to use Alation Agent to connect to databases.

Connection via Alation Agent
  1. Ensure that Alation Agent is enabled on your Alation instance. If necessary, create a Support ticket with Alation for an Alation representative to enable the Alation Agent feature on your instance and to receive the Alation Agent installer.

  2. Install the Alation Agent.

  3. Install the OCF connector on Alation Agent.

Connection Without Agent

To install an OCF connector:

  1. Ensure that the OCF connector Zip file is available on your local machine.

  2. Install the connector on the Connectors Dashboard page using the steps in Manage Connectors.

STEP 2: Create and Configure a New PostgreSQL Data Source

In Alation, add a new data source:

  1. Log in to Alation as a Server Admin.

  2. Expand the Apps menu on the right of the main toolbar and select Sources.

  3. On the Sources page, click +Add on the top right of the page and in the list that opens, click Data Source. This will open the Add a Data Source wizard.

  4. On the first screen of the wizard, specify a name for your data source, assign additional Data Source Admins, if necessary, and click the Continue Setup button on the bottom. The Add a Data Source screen will open.

  5. On the Add a Data Source screen, the only field you need to populate is Database Type. From the Database Type dropdown, select the connector name. You will then be navigated to the Settings page of your new data source.

The name of this connector is PostgreSQL OCF Connector.

../../../_images/PostgreSQLOCF_01.png

Access

On the Access tab, set the data source visibility using these options:

  • Public Data Source—The data source will be visible to all users of the catalog.

  • Private Data Source—The data source will be visible to the users allowed access to the data source by Data Source Admins.

You can add new Data Source Admin users in the Data Source Admins section.

General Settings

Note

This section describes configuring settings for credentials and connection information stored in the Alation database. If your organization has configured Azure KeyVault or AWS Secrets Manager to hold such information, the user interface for the General Settings page will change to include the following icons to the right of most options:

../../../_images/VaultOrDB.png

By default, the database icon is selected, as shown. In the vault case, you enter the ID of the secret instead of the actual credential information. See Configure Secrets for OCF Connector Settings for details.

Perform the configuration on the General Settings tab:

  1. Specify Application Settings:

    BI Connection Info: Not applicable.

  2. Click Save.

  3. Specify Connector Settings:

    Data Source Connection

    JDBC URI: Provide the JDBC URI constructed in the JDBC URI section above.

    Username: Provide the service account username.

    Password: Provide the service account password.

    Enable Kerberos Authentication: Select or clear the Enable Kerberos Authentication checkbox to enable or disable Kerberos authentication. If the checkbox is selected, upload the krb5.conf file using the upload link below it.

    Keytab: Not applicable.

    SSL Certificate: Select the Enable SSL checkbox to enable SSL connection for PostgreSQL EC2 and PostgreSQL Enterprise. If the Enable SSL checkbox is selected, upload the SSL certificate using the upload link below it.

    Use pem file for SSL: Select the Use pem file for SSL checkbox to enable SSL connection for PostgreSQL RDS. If the checkbox is selected, upload the pem file using the upload link below it.

    Truststore Password: Provide the password for the SSL certificate. The password is deleted if the data source connection is deleted.

    Logging Information

    Log Level: Select the log level for connector log generation. The available log levels are based on the Log4j framework.

  4. Click Save.

  5. Obfuscate Literals: Enable this toggle to hide literal values in the queries shown on catalog pages that are ingested via QLI or executed in Compose. This toggle is disabled by default.

  6. Under Test Connection, click Test to validate network connectivity.

    ../../../_images/PostgreSQLOCF_02.png

Add-On OCF Connector for dbt

The dbt connector can be used as an add-on with another OCF connector to extract and catalog descriptions and lineage from dbt models, sources, and columns in dbt Core or dbt Cloud. See Add-On OCF Connector for dbt on how to use this connector.

Metadata Extraction

You can configure metadata extraction (MDE) for an OCF data source on the Metadata Extraction tab of the Settings page. Refer to Configure Metadata Extraction for OCF Data Sources for information about the available configuration options.

The default queries that the connector uses to extract metadata can be found in Extraction Queries for PostgreSQL. You can customize these queries to adjust the extraction to your needs.

Compose

For details about configuring the Compose tab of the Settings page, refer to Configure Compose for OCF Data Sources.

../../../_images/PostgreSQLOCF_03.png

Note

OAuth connection is not supported for this data source.

Data Sampling

Sampling and profiling are supported. For details, see Configure Sampling and Profiling for OCF Data Sources.

Query Log Ingestion

You can either create a table for Alation to pull the query logs from or use a custom query to retrieve the logs from the PostgreSQL data source.

QLI Setup for PostgreSQL EC2 and PostgreSQL Enterprise

Before running QLI for PostgreSQL EC2 or PostgreSQL Enterprise, complete the setup described in PostgreSQL QLI Setup.

QLI Setup for PostgreSQL RDS

Perform the following steps to configure the prerequisites for PostgreSQL RDS QLI:

  1. In the AWS console, go to Amazon RDS > Parameter groups > Create parameter group.

  2. Provide the values in the following fields and click Create.

    • Parameter group family

    • Type

    • Group Name

    • Description

    ../../../_images/PostgreSQLOCF_05.png
  3. Go to Parameter groups and select the newly created parameter group.

    ../../../_images/PostgreSQLOCF_06.png
  4. Click Edit parameters and update the following parameters with the values below:

    Parameter                     Value

    log_destination               csvlog
    log_filename                  postgresql.log.%Y-%m-%d
    log_hostname                  1
    log_min_duration_statement    0
    log_rotation_age              1440
    log_statement                 all
    rds.log_retention_period      4320

  5. Go to Databases > RDS Instance.

  6. Associate the DB parameter group with the RDS instance. Click Modify > Additional Configuration > DB parameter group and select the DB parameter group.

    Note

    The major version of the RDS instance and the version selected for the DB parameter group must match. If they do not match, the DB parameter group will not be listed in this dropdown.

    ../../../_images/PostgreSQLOCF_07.png
  7. Restart the database.
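
After the restart, you can run a quick check from a SQL client to confirm the new values are in effect. This is a minimal sketch; rds.log_retention_period appears only on RDS.

SELECT name, setting
FROM pg_settings
WHERE name IN ('log_destination', 'log_filename', 'log_hostname',
               'log_min_duration_statement', 'log_rotation_age',
               'log_statement', 'rds.log_retention_period');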

Configuration in Alation

In Compose:

  1. Run the query given below:

    CREATE EXTENSION log_fdw;

    CREATE SERVER log_server FOREIGN DATA WRAPPER log_fdw;

    CREATE OR REPLACE PROCEDURE public.p_getcurrentlog()
    LANGUAGE plpgsql
    AS $BODY$
    DECLARE
      log_file_date      varchar;
      log_file_prefix    varchar := 'postgresql.log.';
      log_file_suffix    varchar := '.csv';
      full_log_file      varchar;
      current_log_server varchar := 'log_server';
      current_log_table  varchar := 'postgres_logs';
    BEGIN
      -- Create a foreign table over the previous day's CSV log file
      RAISE LOG '****Starting p_getcurrentlog Procedure';
      EXECUTE FORMAT('select cast(current_date - 1 AS varchar)') INTO log_file_date;
      RAISE LOG 'Processing log file date: %', log_file_date;
      EXECUTE FORMAT('select %L || %L || %L', log_file_prefix, log_file_date, log_file_suffix)
        INTO full_log_file;
      -- Drop and recreate the foreign table that exposes the log file
      EXECUTE FORMAT('DROP FOREIGN TABLE IF EXISTS %I CASCADE', current_log_table);
      EXECUTE FORMAT('SELECT create_foreign_table_for_log_file(%L, %L, %L)',
        current_log_table, current_log_server, full_log_file);
      RAISE LOG '****Ending p_getcurrentlog Procedure';
    END;
    $BODY$;
  2. Run the next query to list the available CSV log files:

    SELECT * FROM list_postgres_log_files() ORDER BY 1;
    
  3. Execute the query below to call the procedure:

    CALL public.p_getcurrentlog();
    
    ../../../_images/PostgreSQLOCF_08.png
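
Once the procedure has run, the postgres_logs foreign table points at the previous day's CSV log file. As a sanity check, you can query it directly; the columns shown are standard csvlog columns.

SELECT log_time, user_name, database_name, message
FROM postgres_logs
ORDER BY log_time DESC
LIMIT 10;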

Connector Settings

Table-Based QLI

Use the query format below to create a new table and populate it with the query history from the PostgreSQL data source. Replace <Log_Table_Name> with the table that holds the raw query logs, for example, the postgres_logs foreign table created above.

CREATE TABLE <Table_Name> AS
SELECT a.*
  FROM
  (
    SELECT
      user_name AS userName,
      session_start_time AS startTime,
      message AS queryString,
      process_id AS sessionId,
      session_start_time AS sessionStarttime,
      extract(second from log_time::timestamp::time) AS seconds,
      log_time AS milliseconds,
      'N' AS cancelled,
      database_name AS defaultDatabase
    FROM
      <Log_Table_Name>
  ) a;

In the Table Name field, provide the name of the table in which the query logs are available. The table name must be provided in the following format: database.table or database.view.
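
A condensed sketch of the same query with concrete names filled in, assuming the raw logs are in the postgres_logs foreign table created earlier and the new table is public.alation_qli (both names illustrative). You would then reference the new table in the Table Name field using the required format.

CREATE TABLE public.alation_qli AS
SELECT a.*
  FROM
  (
    SELECT
      user_name AS userName,
      session_start_time AS startTime,
      message AS queryString,
      process_id AS sessionId,
      session_start_time AS sessionStarttime,
      extract(second from log_time::timestamp::time) AS seconds,
      log_time AS milliseconds,
      'N' AS cancelled,
      database_name AS defaultDatabase
    FROM postgres_logs
  ) a;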

Custom Query-Based QLI

If you cannot create the table or view for QLI, for example, due to access restrictions, you can use a custom QLI query to extract query history into Alation. If you opt for custom query-based QLI, Alation will query the system table storing query history, or the table you have created to enable QLI, every time you manually run QLI or when the QLI job runs on schedule.

For custom query-based QLI to succeed, ensure that the service account has enough permissions to select from the QLI table.
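
For example, assuming the service account is named alation and the QLI table is the illustrative public.alation_qli table from the sketch above:

GRANT SELECT ON public.alation_qli TO alation;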

The template for the QLI query is given below. You can customize it by adding, removing, or changing the filter, but the columns and their aliases must remain as is since the connector expects this query structure.

Note

When using the QLI query template, do not substitute the STARTTIME and ENDTIME parameters in the WHERE filter. These parameters are not actual column names and should stay as is. They are expected by the connector and will be substituted with the start and end date of the QLI range selected in the user interface when QLI is run manually or on schedule.

To configure query-based QLI:

  1. Go to the Query Log Ingestion tab of the Settings page of your OCF data source.

  2. Under Connector Settings > Query Extraction, in the Custom QLI Query field, provide the QLI query.

  3. Click Save.

QLI Query Template
SELECT
  user_name AS userName,
  session_start_time AS startTime,
  message AS queryString,
  process_id AS sessionId,
  session_start_time AS sessionStarttime,
  extract(second from log_time::timestamp::time) AS seconds,
  'N' AS cancelled,
  database_name AS defaultDatabases
FROM <Table_Name>
WHERE
  session_start_time >= TO_DATE(STARTTIME, 'YYYY-MM-DD HH24:MI:SS')
  AND session_start_time < TO_DATE(ENDTIME, 'YYYY-MM-DD HH24:MI:SS');
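
For illustration, here is the template filled in with a hypothetical log table (postgres_logs) and one extra filter that excludes the service account's own queries. STARTTIME and ENDTIME are left untouched, as required.

SELECT
  user_name AS userName,
  session_start_time AS startTime,
  message AS queryString,
  process_id AS sessionId,
  session_start_time AS sessionStarttime,
  extract(second from log_time::timestamp::time) AS seconds,
  'N' AS cancelled,
  database_name AS defaultDatabases
FROM postgres_logs
WHERE
  session_start_time >= TO_DATE(STARTTIME, 'YYYY-MM-DD HH24:MI:SS')
  AND session_start_time < TO_DATE(ENDTIME, 'YYYY-MM-DD HH24:MI:SS')
  AND user_name <> 'alation';  -- illustrative filter; adjust or remove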

Perform QLI

You can either perform QLI manually on demand or enable automated QLI:

  1. To perform manual QLI, under the Automated and Manual Query Log Ingestion section of the Query Log Ingestion tab, ensure that the Enable Automated Query Log Ingestion toggle is disabled.

    Note

    Metadata extraction must be completed before running QLI.

  2. Click Preview to get a sample of the query history data to be ingested.

  3. Click the Import button to perform QLI on demand.

  4. To schedule QLI, enable the Enable Automated Query Log Ingestion toggle.

  5. Set a schedule under Automated Query Log Ingestion Time by specifying values in the week, day, and time fields. The next QLI job will run on the schedule you have specified.

    Note

    The hourly schedule for automated QLI is not supported.

Troubleshooting

Refer to Troubleshooting.