AWS S3 Connector: Overview

Available from release 2022.3.5

Overview

The AWS S3 OCF Connector is developed by Alation and is available on demand. To receive the S3 connector package, create a ticket with Alation Support.

Use this connector to catalog AWS S3 as a file system source in Alation. The connector supports the following activities:

  • Metadata Extraction: The connector catalogs AWS S3 buckets and their contents, such as the files and folders inside each bucket. End users can discover, search, browse, and curate these objects as files and folders from the Alation user interface.

  • Schema Extraction: The connector extracts and catalogs the columns (column headers) of semi-structured files. This is currently supported for the Parquet, CSV, PSV, and TSV file formats. Stewards and users can search and curate the cataloged columns of each file. This is a time-intensive operation because it involves reading individual files.

  • Sampling: End users can sample a file on demand to view randomly selected rows of its data, which gives deeper insight into the file's content and format.
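For the delimited formats named under Schema Extraction, reading column headers comes down to parsing the first row with the right delimiter. The sketch below illustrates that idea under stated assumptions: the file contents have already been downloaded as text, and the function name `extract_columns` and the extension-to-delimiter mapping are hypothetical, not the connector's implementation. Parquet is left out because it stores its schema in the file footer rather than in a header row.

```python
import csv
import io
import os

# Hypothetical extension-to-delimiter mapping for the delimited formats named
# above. Parquet keeps its schema in the file footer and is not handled here.
DELIMITERS = {".csv": ",", ".psv": "|", ".tsv": "\t"}


def extract_columns(key: str, text: str) -> list:
    """Return the column headers from the first row of a delimited file."""
    ext = os.path.splitext(key)[1].lower()
    if ext not in DELIMITERS:
        raise ValueError(f"unsupported format for header extraction: {key}")
    reader = csv.reader(io.StringIO(text), delimiter=DELIMITERS[ext])
    # Only the first row is consumed; the data rows are never parsed.
    header = next(reader, [])
    return [name.strip() for name in header]


# Usage: the object key and file contents below are made-up examples.
print(extract_columns("sales/2022/orders.psv", "order_id|customer|amount\n1|acme|9.50\n"))
# → ['order_id', 'customer', 'amount']
```

Selecting the delimiter from the object key's extension keeps the parsing logic identical across CSV, PSV, and TSV; only the separator changes.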

Team

The following administrators are required to install this connector:

  • Alation Server Admin:

    • Validates the availability of Alation Connector Manager or installs it

    • Installs the connector

    • Adds and configures the S3 file system source in Alation

  • AWS S3 user with administrator privileges, who:

    • Configures S3, including the Inventory and Lambda setup

    • Creates the user for the S3 OCF connector
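The permissions the connector user needs are not listed in this overview. As a hypothetical starting point only, the sketch below builds a read-only IAM policy for a single bucket using standard S3 actions (`s3:ListAllMyBuckets`, `s3:ListBucket`, `s3:GetBucketLocation`, `s3:GetObject`). The function name and bucket name are placeholders, and this is an assumption rather than the connector's documented requirement; permissions for STS authentication, Inventory, or Lambda are not covered.

```python
import json


def read_only_policy(bucket: str) -> dict:
    """Build a hypothetical read-only IAM policy document for one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing all buckets applies account-wide, so the resource is "*".
                "Effect": "Allow",
                "Action": ["s3:ListAllMyBuckets"],
                "Resource": "*",
            },
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Object-level reads apply to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# Usage: "example-bucket" is a placeholder name.
print(json.dumps(read_only_policy("example-bucket"), indent=2))
```

Splitting bucket-level and object-level actions into separate statements matters because S3 evaluates them against different resource ARNs (the bucket versus `bucket/*`).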

Scope

The table below shows what features are covered by the connector. For version support information, refer to Support Matrix.

Browser

AWS S3

Core Capabilities

  • Automated metadata extraction (MDE)

  • Column Extraction

  • Query-based MDE

  • Search

  • Catalog page curation

  • Catalog sets

  • Propagation of trust flags

  • Popularity

Sampling and Profiling

  • File sampling (Basic authentication)

  • File sampling (STS authentication)

  • Custom query-based table sampling

  • Custom query-based column profiling

  • Dynamic profiling

Authentication

  • Basic (AWS Access Key and Secret Key)

  • AWS STS Authentication

  • SSL

  • Kerberos

  • Keytab

  • LDAP

Technical Metadata

  • Files

  • Attributes/Columns: ✔ *

* File sampling is supported only for CSV and Parquet file formats.
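This overview does not describe how the connector selects its random rows. One common way to draw a uniform random sample from a file of unknown length in a single pass is reservoir sampling (Algorithm R), shown below as a swapped-in technique rather than the connector's method. The function names `is_sampleable` and `sample_csv_rows` are hypothetical, the file contents are assumed to be available as text, and only CSV is parsed; Parquet would need a columnar reader.

```python
import csv
import io
import os
import random

# Formats the footnote above lists as sampleable. Only CSV is parsed in this
# stdlib-only sketch; Parquet would need a columnar reader.
SAMPLEABLE = {".csv", ".parquet"}


def is_sampleable(key: str) -> bool:
    """Return True if the object's extension is one that supports file sampling."""
    return os.path.splitext(key)[1].lower() in SAMPLEABLE


def sample_csv_rows(text: str, k: int, seed=None) -> list:
    """Return up to k uniformly random data rows (header excluded) in one pass."""
    rng = random.Random(seed)
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row
    reservoir = []
    for i, row in enumerate(reader):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Row i replaces a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir


# Usage with made-up data: a header plus 100 rows, sampling 5 of them.
data = "id,value\n" + "".join(f"{n},{n * n}\n" for n in range(100))
if is_sampleable("metrics/daily.csv"):
    print(sample_csv_rows(data, 5, seed=42))
```

Reservoir sampling keeps at most k rows in memory and never needs the total row count up front, which suits large objects that are streamed rather than loaded whole.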