Amazon S3 Connector: Overview

Available from release 2022.3.5

Overview

The OCF connector for Amazon S3 is developed by Alation.

To download the Amazon S3 OCF connector package, go to Customer Portal > Connectors > Alation Connector Hub. Only Alation users with access to the Customer Portal can access the Alation Connector Hub. If you don’t have access to the Customer Portal, contact Alation Support.

Use this connector to catalog Amazon S3 as a file system source in Alation. The connector supports the following activities:

  • Metadata Extraction: The connector catalogs S3 objects, such as buckets and their contents (the files and folders inside a bucket). It enables end users to discover, search, browse, and curate S3 objects as files and folders from the Alation user interface.

  • Schema Extraction: The connector extracts and catalogs columns or column headers for semi-structured file formats. Schema extraction is currently supported for the Parquet, CSV, PSV, and TSV file formats. Stewards and users can search and curate the cataloged columns for each file. This is a time-intensive operation because it involves reading individual files.

  • Sampling: Users can initiate on-demand file sampling to view randomly sampled rows of data in a file, which gives deeper insight into the file’s data and format.
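
To make the schema extraction and sampling activities concrete, the following is a minimal Python sketch of reading column headers from a delimited file and drawing a random sample of its rows. It is an illustration only, not the connector's implementation: the names `DELIMITERS`, `extract_columns`, and `sample_rows` are hypothetical, the extension-to-delimiter mapping is an assumption, and the file content is passed in as text rather than read from S3.

```python
import csv
import io
import os
import random

# Assumed mapping from file extension to delimiter (illustrative only).
DELIMITERS = {".csv": ",", ".psv": "|", ".tsv": "\t"}


def _reader(name, text):
    """Build a csv reader for the file, choosing the delimiter by extension."""
    suffix = os.path.splitext(name)[1].lower()
    if suffix not in DELIMITERS:
        raise ValueError(f"unsupported file format: {suffix!r}")
    return csv.reader(io.StringIO(text), delimiter=DELIMITERS[suffix])


def extract_columns(name, text):
    """Return the column headers, taken from the first row of the file."""
    return next(_reader(name, text), [])


def sample_rows(name, text, k, seed=None):
    """Return up to k randomly chosen data rows, excluding the header.

    Uses reservoir sampling, so the file is read once and only k rows
    are held in memory at a time.
    """
    reader = _reader(name, text)
    next(reader, None)  # skip the header row
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(reader):
        if i < k:
            sample.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


if __name__ == "__main__":
    content = "id|amount\n1|9.5\n2|12.0\n3|7.25\n"
    print(extract_columns("orders.psv", content))
    print(sample_rows("orders.psv", content, k=2, seed=1))
```

Both functions read the file content itself, which is consistent with the note above that schema extraction is time-intensive: every file has to be opened to find its columns.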

Team

The following administrators are required to install this connector:

  • Alation Server Admin:

    • Validates the availability of Alation Connector Manager or installs it

    • Installs the connector

    • Adds and configures the S3 file system source in Alation

  • Amazon S3 user with the following administrator privileges:

    • S3 configuration such as Inventory and Lambda setup

    • User creation for the S3 OCF connector

Scope

The table below shows which features the connector covers. For version support information, refer to Support Matrix.

Feature

Amazon S3

Core Capabilities

Automated metadata extraction (MDE)

Column Extraction

Query-based MDE

Search

Catalog page curation

Catalog sets

Propagation of trust flags

Popularity

Sampling and Profiling

File sampling (Basic authentication)

File sampling (STS authentication)

Custom query-based table sampling

Custom query-based column profiling

Dynamic profiling

Authentication

Basic (AWS Access Key and Secret Key)

AWS STS Authentication

SSL

Kerberos

Keytab

LDAP

Technical Metadata

Files

Attributes/Columns

✔ *

* File sampling is supported only for CSV and Parquet file formats.