Monitor Component Health

Alation monitors the health of its components to determine whether they are running and working correctly. The status of these functional checks is available to Server Admins in the Monitor section of the Admin Settings page.

This feature must be enabled on the host server before it can be used.

Enabling Health Checks

Prerequisite: To enable Health Checks, you need sudo access to the host server.

  1. In a terminal on the host server, enter the Alation shell:

    sudo service alation shell
    
  2. Set the feature flag by running the following command:

    /opt/alation/ops/actions/alationadmin/enable_datadog
    

This configuration makes the Health Checks tab visible in the Alation UI.

Viewing Alation Health Status

To check the health of your Alation instance:

  1. Sign in to Alation as a Server Admin, and in the upper-right corner of the main toolbar, click the Admin Settings icon. The Admin Settings page will open.

  2. In the Monitor section, click Health Checks. The Health Checks tab will open.

../../_images/Screen_Shot_2018-11-16_at_4.06.33_PM.png

Alation monitors the performance of the following components:

Component      Check         Description
-------------  ------------  --------------------------------------------------------------
Postgres       Query Period  Checks the running time of queries (threshold: 60 min). A
                             query that runs longer than 60 minutes may indicate a
                             problem. If the threshold is exceeded, the check reports
                             a warning.
Postgres       Connection    Checks whether the connection to PostgreSQL succeeds.
TaskServer     Alive         Checks whether the connection to TaskServer is alive.
Redis          Connection    Checks whether a connection to Redis can be established.
Connector      Response      Checks whether the Compose Connector is responding to
                             requests.
Elasticsearch  Shards        Checks whether all shards are active in Elasticsearch.
Elasticsearch  Connection    Checks whether the agent can connect to Elasticsearch to
                             collect metrics.
Mongo          Response      Checks whether Mongo is connected and responds to requests.
                             Applies to versions earlier than V R5 (5.9.x).

For more details on Alation components, see Runbook for Administrators.
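
As an illustration of the Query Period rule described above, the following shell sketch classifies a query's running time against the 60-minute threshold. The function name classify_query_period and the variable QUERY_PERIOD_THRESHOLD_MIN are hypothetical and are not part of Alation. This is not how Alation implements the check, only a way to picture the rule.

```shell
#!/bin/sh
# Hypothetical illustration only: not an Alation command or API.
# Threshold taken from the Query Period description (60 minutes).
QUERY_PERIOD_THRESHOLD_MIN=60

# classify_query_period MINUTES
# Prints "Warning" when the running time exceeds the threshold,
# otherwise prints "Success".
classify_query_period() {
    if [ "$1" -gt "$QUERY_PERIOD_THRESHOLD_MIN" ]; then
        echo "Warning"
    else
        echo "Success"
    fi
}

classify_query_period 45   # prints "Success"
classify_query_period 75   # prints "Warning"
```

A query running for exactly 60 minutes is treated as healthy in this sketch, because the warning applies only when the threshold is exceeded.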

For each check, the Health Checks tab displays one of three statuses, together with a clarifying status message:

  • Success: The component is performing correctly.

    ../../_images/MonitorSuccess.png
  • Warning: The component is running with issues. Refer to the warning message for details.

    ../../_images/MonitorWarning.png
  • Failure: The component is experiencing errors. Refer to the error message for details and troubleshooting clues.

    ../../_images/MonitorFailure.png