Amazon S3 Connector: Overview
Available from release 2022.3.5
Overview
The OCF connector for Amazon S3 is developed by Alation.
To download the Amazon S3 OCF connector package, go to Customer Portal > Connectors > Alation Connector Hub. Only Alation users with access to the Customer Portal can access the Alation Connector Hub. If you don’t have access to the Customer Portal, contact Alation Support.
Use this connector to catalog Amazon S3 as a file system source in Alation. The connector supports the following activities:
- Metadata Extraction: The connector catalogs S3 objects such as buckets and their contents (the files and folders inside a bucket). It enables end users to discover, search, browse, and curate S3 objects as files and folders from the Alation user interface.
- Schema Extraction: The connector extracts and catalogs columns or column headers for semi-structured file formats. This is currently supported for the Parquet, CSV, PSV, and TSV file formats. Stewards and users can search and curate the cataloged columns for each file. This is a time-intensive operation because it involves reading individual files.
- Sampling: On-demand file sampling, initiated by the end user, shows randomly sampled rows of data from a file, which provides deeper insight into the file’s data and format.
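To make schema extraction and sampling more concrete, the following is a minimal Python sketch of the general idea for delimited text files. It is an illustration only and not the connector's actual implementation: the `DELIMITERS` mapping and the `extract_columns` and `sample_rows` functions are hypothetical names introduced here, and the use of reservoir sampling is an assumption about one reasonable way to pick random rows. Parquet is left out because reading it requires a third-party library.

```python
import csv
import io
import random

# Hypothetical mapping from file extension to field delimiter.
# This is an assumption for the sketch, not connector configuration.
DELIMITERS = {".csv": ",", ".psv": "|", ".tsv": "\t"}


def extract_columns(text, extension):
    """Return the column headers found in the first row of a delimited file."""
    reader = csv.reader(io.StringIO(text), delimiter=DELIMITERS[extension])
    return next(reader, [])  # an empty file yields no columns


def sample_rows(text, extension, k, seed=None):
    """Return up to k randomly chosen data rows using reservoir sampling."""
    rng = random.Random(seed)
    reader = csv.reader(io.StringIO(text), delimiter=DELIMITERS[extension])
    next(reader, None)  # skip the header row
    reservoir = []
    for i, row in enumerate(reader):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


psv = "id|name|city\n1|Ada|London\n2|Linus|Helsinki\n3|Grace|New York\n"
print(extract_columns(psv, ".psv"))        # ['id', 'name', 'city']
print(len(sample_rows(psv, ".psv", k=2)))  # 2
```

The sketch reads every row once, which mirrors why extraction that opens individual files is time-intensive on large buckets.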
Team
The following administrators are required to install this connector:
- Alation Server Admin:
  - Validates the availability of Alation Connector Manager or installs it
  - Installs the connector
  - Adds and configures the S3 file system source in Alation
- Amazon S3 user with the following administrator privileges:
  - S3 configuration such as Inventory and Lambda setup
  - User creation for the S3 OCF connector
Scope
The table below shows which features the connector supports. For version support information, refer to Support Matrix.
Feature | Amazon S3 |
---|---|
Core Capabilities | |
Automated metadata extraction (MDE) | ✔ |
Column Extraction | ✔ |
Query-based MDE | ㄨ |
Search | ✔ |
Catalog page curation | ✔ |
Catalog sets | ㄨ |
Propagation of trust flags | ㄨ |
Popularity | ㄨ |
Sampling and Profiling | |
File sampling (Basic authentication) | ✔ |
File sampling (STS authentication) | ㄨ |
Custom query-based table sampling | ㄨ |
Custom query-based column profiling | ㄨ |
Dynamic profiling | ㄨ |
Authentication | |
Basic (AWS Access Key and Secret Key) | ✔ |
AWS STS Authentication | ✔ |
SSL | ㄨ |
Kerberos | ㄨ |
Keytab | ㄨ |
LDAP | ㄨ |
Technical Metadata | |
Files | ✔ |
Attributes/Columns | ✔ * |
* File sampling is supported only for CSV and Parquet file formats.