AWS S3 Connector: Overview
Available from release 2022.3.5
Overview
The AWS S3 OCF Connector is developed by Alation and is available on demand. To receive the S3 connector package, create a ticket with Alation Support.
Use this connector to catalog AWS S3 as a file system source in Alation. The connector supports the following activities:
Metadata Extraction: The connector catalogs AWS S3 objects such as buckets and their contents (the files and folders inside a bucket). It enables end users to discover, search, browse, and curate AWS S3 objects as files and folders from the Alation user interface.
Schema Extraction: The connector extracts and catalogs columns or column headers for semi-structured file formats. This is currently supported for the Parquet, CSV, PSV, and TSV file formats. Stewards and users can search and curate the cataloged columns for each file. This is a time-intensive operation because it involves reading individual files.
Sampling: End users can initiate on-demand file sampling to view randomly sampled rows of data in a file, which provides deeper insight into the file’s data and format.
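To make the schema extraction and sampling activities more concrete, the following is a minimal Python sketch of the general idea. It is not the connector's implementation. The function names `read_delimited` and `sample_rows`, the extension-to-delimiter mapping, and the use of reservoir sampling are all assumptions made for illustration. Parquet is left out because reading it needs a third-party library.

```python
import csv
import io
import os
import random

# Assumed mapping from file extension to delimiter (illustrative only).
DELIMITERS = {".csv": ",", ".psv": "|", ".tsv": "\t"}


def read_delimited(filename, text):
    """Return (column headers, iterator over the remaining rows)."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in DELIMITERS:
        raise ValueError(f"unsupported file format: {filename}")
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=DELIMITERS[ext])
    columns = next(reader, [])  # the first row is treated as the header
    return columns, reader


def sample_rows(rows, k, seed=None):
    """Pick up to k rows uniformly at random in one pass (reservoir sampling)."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir


psv = "id|name|city\n1|Ada|London\n2|Linus|Helsinki\n3|Grace|Arlington\n"
columns, rows = read_delimited("people.psv", psv)
print(columns)  # ['id', 'name', 'city']
print(sample_rows(rows, 2, seed=0))
```

Reservoir sampling is used here because it reads the file once and keeps only `k` rows in memory, which suits large files. The actual sampling strategy of the connector is not documented in this section.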
Team
The following administrators are required to install this connector:
Alation Server Admin:
Validates the availability of Alation Connector Manager or installs it
Installs the connector
Adds and configures the S3 file system source in Alation
AWS S3 user with the following administrator privileges:
S3 configuration such as Inventory and Lambda setup
User creation for the S3 OCF connector
Scope
The table below shows which features the connector covers. For version support information, refer to the Support Matrix.
| Browser | AWS S3 |
|---|---|
| Core Capabilities | |
| Automated metadata extraction (MDE) | ✔ |
| Column Extraction | ✔ |
| Query-based MDE | ㄨ |
| Search | ✔ |
| Catalog page curation | ✔ |
| Catalog sets | ㄨ |
| Propagation of trust flags | ㄨ |
| Popularity | ㄨ |
| Sampling and Profiling | |
| File sampling (Basic authentication) | ✔ |
| File sampling (STS authentication) | ㄨ |
| Custom query-based table sampling | ㄨ |
| Custom query-based column profiling | ㄨ |
| Dynamic profiling | ㄨ |
| Authentication | |
| Basic (AWS Access Key and Secret Key) | ✔ |
| AWS STS Authentication | ✔ |
| SSL | ㄨ |
| Kerberos | ㄨ |
| Keytab | ㄨ |
| LDAP | ㄨ |
| Technical Metadata | |
| Files | ✔ |
| Attributes/Columns | ✔ * |

* File sampling is supported only for CSV and Parquet file formats.