Greenplum OCF Connector: Install and Configure

Alation Cloud Service Applies to Alation Cloud Service instances of Alation

Customer Managed Applies to customer-managed instances of Alation

Prerequisites

Ports

Open the outbound TCP port 5432 to the Greenplum server.

Service Account

On the Greenplum database, create a service account for Alation and grant it the permissions for metadata extraction, sampling and profiling, and query log ingestion.

Example

CREATE ROLE <service_account_name> LOGIN password 'passwd';
GRANT CONNECT ON DATABASE <database_name> TO <service_account_name>;

Metadata Extraction

Grant the service account the USAGE permission on all schemas that you want extracted and ALL PRIVILEGES ON ALL TABLES in these schemas.

Example

GRANT USAGE ON SCHEMA <database.schema> TO <service_account_name>;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA <database.schema> TO <service_account_name>;

Note

Use fully qualified schema names.

Sampling and Profiling

The permissions granted for metadata extraction are enough for sampling and profiling.

Query Log Ingestion

To perform query log ingestion (QLI), the service account requires access to the gpperfmon database and the SELECT permissions on the table or view created for QLI. The details on QLI configuration are available in Table-Based QLI Using gpperfmon below.

JDBC URI

Format

The JDBC URI must use the following format:

postgresql://<hostname_or_ip>:<port>/<service_name>

where <service_name> stands for the database that you are adding to the catalog.

No additional parameters are required in the URI string when using an LDAP account and/or SSL.

Example

postgresql://10.13.92.29:5432/test_database

Configuration in Alation

STEP 1: Install the Connector

Alation On-Premise

Important

Installation of OCF connectors requires Alation Connector Manager to be installed as a prerequisite.

To install an OCF connector:

  1. If this has not been done on your instance, install the Alation Connector Manager: Install Alation Connector Manager.

  2. Ensure that the OCF connector Zip file is available on your local machine.

  3. Install the connector on the Connectors Dashboard page using the steps in Manage Connectors.

Alation Cloud Service

Note

On Alation Cloud Service instances, Alation Connector Manager is available by default.

Depending on your network configuration, you may need to use Alation Agent to connect to databases.

Connection via Alation Agent
  1. Ensure that Alation Agent is enabled on your Alation instance. If necessary, create a Support ticket with Alation for an Alation representative to enable the Alation Agent feature on your instance and to receive the Alation Agent installer.

  2. Install the Alation Agent.

  3. Install the OCF connector on Alation Agent.

Connection Without Agent

To install an OCF connector:

  1. Ensure that the OCF connector Zip file is available on your local machine.

  2. Install the connector on the Connectors Dashboard page using the steps in Manage Connectors.

STEP 2: Create and Configure a New Data Source

In Alation, add a new data source:

  1. Log in to Alation as a Server Admin.

  2. Expand the Apps menu on the right of the main toolbar and select Sources.

  3. On the Sources page, click +Add on the top right of the page and in the list that opens, click Data Source. This will open the Add a Data Source wizard.

  4. On the first screen of the wizard, specify a name for your data source, assign additional Data Source Admins, if necessary, and click the Continue Setup button on the bottom. The Add a Data Source screen will open.

  5. On the Add a Data Source screen, the only field you should populate is Database Type. From the Database Type dropdown, select the connector name. After that you will be navigated to the Settings page of your new data source.

    Note

    Agent-based connectors will have the Agent name appended to the connector name.

The name of this connector is Greenplum OCF connector.

Access

On the Access tab, set the data source visibility using these options:

  • Public Data Source—The data source will be visible to all users of the catalog.

  • Private Data Source—The data source will be visible to the users allowed access to the data source by Data Source Admins.

You can add new Data Source Admin users in the Data Source Admins section.

General Settings

Note

This section describes configuring settings for credentials and connection information stored in the Alation database. If your organization has configured Azure KeyVault or AWS Secrets Manager to hold such information, the user interface for the General Settings page will change to include the following icons to the right of most options:

../../../_images/VaultOrDB.png

By default, the database icon is selected, as shown. In the vault case, instead of the actual credential information, you enter the ID of the secret. See Configure Secrets for OCF Connector Settings for details.

Application Settings

Specify Application Settings if applicable. Click Save to save the changes after providing the information.

Parameter

Description

BI Connection Info

This parameter is used to generate lineage between the current data source and another source in the catalog, for example a BI source that retrieves data from the underlying database. The parameter accepts host and port information of the corresponding BI data source connection.

Use the following format: host:port

You can provide multiple values as a comma-separated list:

10.13.71.216:1541,sever.com:1542

Find more details in BI Connection Info.

Disable Automatic Lineage Generation

Select this checkbox to disable automatic lineage generation from QLI, MDE, and Compose queries. By default, automatic lineage generation is enabled.

Connector Settings

Data Source Connection

Populate the data source connection information and save the values by clicking Save in this section.

Parameter

Description

JDBC URI

Specify the JDBC URI in the required format.

Username

Specify the service account username.

Password

Specify the service account password.

Enable SSL

Enable or disable SSL authentication by selecting or clearing the Enable SSL checkbox.

If the Enable SSL checkbox is enabled, upload the SSL certificate using the upload link below.

Truststore password

Specify the password for the SSL certificate.

The password will be deleted if the data source connection is deleted.

Logging Configuration

Select the logging level for the connector logs and save the values by clicking Save in this section. The available log levels are based on the Log4j framework.

Parameter

Description

Log level

Select the log level to generate logs. The available options are INFO, DEBUG, WARN, TRACE, ERROR, FATAL, ALL.

Obfuscate Literals

Obfuscate Literals—Enable this toggle to hide the details of the queries in the catalog page that are ingested via QLI or executed in Compose. This toggle is disabled by default.

Test Connection

After specifying the connector settings, under Test Connection, click Test to validate network connectivity.

Metadata Extraction

You can configure metadata extraction (MDE) for an OCF data source on the Metadata Extraction tab of the Settings page. Refer to Configure Metadata Extraction for OCF Data Sources for information about the available configuration options.

The default queries that the connector uses to extract metadata can be found in Extraction Queries for Greenplum. You can customize these queries to adjust the extraction to your needs.

Compose

For details about configuring the Compose tab of the Settings, refer to Configure Compose for OCF Data Sources.

Sampling and Profiling

Sampling and profiling is supported. For details, see Configure Sampling and Profiling for OCF Data Sources.

Query Log Ingestion

The Greenplum data source supports table-based or query-based QLI.

Before you can set up QLI in Alation, perform the configuration on the Greenplum database as is described below.

Note

The parameter min_query_time sets to log queries that run longer than this value. The default is 20 seconds, which is configurable at $MASTER_DATA_DIRECTORY/gpperfmon/conf/gpperfmon.conf. If you do not see many logs ingested, you may need to adjust this parameter.

Table-Based QLI Using gpperfmon

To use table-based QLI, create a view for QLI using the steps below.

  1. Create the gpperfmon database. For steps, refer to the corresponding Greenplum documentation, for example:

  2. Give <service_account_name> the permission to access gpperfmon.

    psql postgres
    GRANT CONNECT ON DATABASE gpperfmon TO <service_account_name>;
    \q
    
  3. In the public schema, create a QLI view using the example SQL below.

    CREATE VIEW public.alation_qli_view AS
      SELECT
        qh.ctime,
        qh.rows_out,
        CASE
          WHEN qh.tfinish IS NOT NULL AND qh.tstart IS NOT NULL
          THEN extract(epoch FROM qh.tfinish - qh.tstart)
        ELSE NULL
        END AS seconds,
        qh.tstart AS startTime,
        qh.cpu_elapsed,
        CASE
          WHEN qh.status = 'abort'
          THEN true
        ELSE null
        END AS cancelled,
        qh.ssid || cast(qh.tmid as varchar(10)) AS sessionId,
        qh.ccnt,
        qh.username AS userName,
        qh.application_name,
        qh.query_text AS queryString,
        qh.tsubmit AS sessionStartTime,
        qh.db AS defaultDatabases,
      FROM queries_history qh
      WHERE
        lower(username) NOT IN ('gpmon')
         AND query_text NOT ILIKE '%jdbc_savepoint%'
         AND query_text NOT ILIKE 'select current_schema()';
    
  4. On gpperfmon, give <service_account_name> permissions for the QLI view.

    psql gpperfmon
    GRANT USAGE ON SCHEMA public TO <service_account_name>;
    GRANT SELECT ON TABLE alation_qli_view TO <service_account_name>;
    \q
    

To configure table-based QLI in Alation:

  1. Go to the Query Log Ingestion tab of the Settings page of your OCF data source.

  2. Under Connector Settings > Query Extraction, in the Table Name field, specify the name of the QLI view where the query logs are available. Ensure that the service account has the permissions to access this view. The view name must be provided in the format database.schema.view_name.

  3. Click Save.

Custom Query-Based QLI Using queries_history in gpperfmon

If you cannot create the view for QLI, for example, due to access restrictions, you can use a custom QLI query to extract query history into Alation. If you opt for query-based QLI, Alation will query the system table storing query history every time you manually run QLI or when the QLI job runs on schedule.

For custom query-based QLI to succeed, ensure that the service account has enough permissions to query the queries_history table in gpperfmon.

The template for the QLI query is given below. You can customize it by adding, removing, or changing the filter, but the columns and their aliases must remain as is since Alation expects this query structure.

Note

When using the QLI query template, do not substitute the STARTTIME and ENDTIME parameters in the WHERE filter. These parameters are not actual column names and should stay as is. They are expected by the connector and will be substituted with the start and end date of the QLI range selected in the user interface when QLI is run manually or on schedule.

To configure query-based QLI:

  1. Go to the Query Log Ingestion tab of the Settings page of your OCF data source.

  2. Under Connector Settings > Query Extraction, in the Custom QLI Query field, specify the QLI query.

  3. Click Save.

QLI Query Template

SELECT
  qh.ctime,
  qh.rows_out,
  CASE
    WHEN qh.tfinish IS NOT NULL AND qh.tstart IS NOT NULL
    THEN extract(epoch FROM qh.tfinish - qh.tstart)
  ELSE NULL
  END AS seconds,
  qh.tstart AS startTime,
  qh.cpu_elapsed,
  CASE
    WHEN qh.status = 'abort'
    THEN true
  ELSE null
  END AS cancelled,
  qh.ssid || cast(qh.tmid as varchar(10)) AS sessionId,
  qh.ccnt,
  qh.username AS userName,
  qh.application_name,
  qh.query_text AS queryString,
  qh.tsubmit AS sessionStartTime,
  qh.db AS defaultDatabases
FROM queries_history qh
  WHERE
    lower(username) NOT IN ('gpmon')
    AND query_text NOT ILIKE '%jdbc_savepoint%'
    AND query_text NOT ILIKE 'select current_schema()'
    AND DATE(startTime) >= DATE(STARTTIME)
    AND DATE(startTime) <= DATE(ENDTIME)
    AND queryString IS NOT NULL
    AND queryString <> ''
ORDER BY
  sessionid,
  starttime;

Perform QLI

You can either perform QLI manually on demand or enable automated QLI:

  1. To perform manual QLI, under the Automated and Manual Query Log Ingestion section of the Query Log Ingestion tab, ensure that the Enable Automated Query Log Ingestion toggle is disabled.

    Note

    Metadata extraction must be completed first before running QLI.

  2. Click Preview to get a sample of the query history data to be ingested.

  3. Click the Import button to perform QLI on demand.

  4. To schedule QLI, enable the Enable Automated Query Log Ingestion toggle.

  5. Set a schedule under Automated Query Log Ingestion Time by specifying values in the week, day, and time fields. The next QLI job will run on the schedule you have specified.

    Note

    The hourly schedule for automated QLI is not supported.

Troubleshooting

Refer to Troubleshooting.