Populating a Virtual Data Source From a CSV File¶
Applies to Alation Cloud Service and customer-managed instances of Alation.
This article describes how to import metadata from a CSV source file into a VDS created for a relational database or a Hive database.
For information on the Virtual Data Source API, refer to Upload a Virtual Data Source on Alation’s Developer Portal.
The VDS capability currently covers basic technical metadata. The fields available for import are:
Name
Table type (for tables)
Column type (for columns)
Importing Schemas, Tables, Columns, and Indices¶
You can import metadata in the CSV format to populate a Virtual Data Source. The CSV file must meet several requirements. Prepare your CSV source file observing the format described below.
Requirements for Head and Keys¶
The file must begin with a head that declares the key and the properties. Each subsequent line must consist of a specific key along with the values for the properties that apply, with empty values for properties that do not apply.
Head¶
Sample file head: key, table_type, column_type
Note that table_type can be VIEW or TABLE.
Key¶
Alation determines the data object type based on the number of dots in the key:

schema is a schema.
schema.table is a table.
schema.table.column is a column.
schema.table.index is recognized as an index if the key is accompanied by the property index_type.
If your database type supports multipart schemas (as is the case for SQL Server, Amazon Redshift, and Netezza), then the key structure must be as follows:

dbname.schema is a schema.
dbname.schema.table is a table.
dbname.schema.table.column is a column.
dbname.schema.table.index is an index if the key is accompanied by the property index_type.
For more details on key format, see Keys with Special Characters on Alation’s Developer Portal.
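For example, a head and keys for a hypothetical multipart-schema source might look as follows (dbname is an illustrative database name, not a required value):

Sample CSV:
"key","table_type","column_type"
"dbname.test_schema","",""
"dbname.test_schema.table1","TABLE",""
"dbname.test_schema.table1.column1","","varchar(10)"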
Sample CSV file
"key","table_type","column_type"
"test_schema","",""
"test_schema.table1","TABLE",""
"test_schema.table2","VIEW",""
"schema.table1.column1","","varchar(10)"
"schema.table2.column1","","varchar(10)"
Adding Table and View Properties¶
Along with the key, you can also add the following properties, all of them optional.
| Property | Description | Example |
|---|---|---|
| table_type | Specifies the type of the table. This property is optional but recommended and can be set to VIEW or TABLE. If not specified, it defaults to TABLE. | TABLE |
| data_location | A URI or file path to the location of the underlying data, such as an HDFS URL for a Hive table. Use this property only when the table_type is TABLE. | hdfs:///user/hive/warehouse/table_a |
| db_owner | Name of the database account that owns this table. | alation |
| definition_sql | The CREATE TABLE statement that was used to create the table. Use this property only when the table_type is TABLE. | CREATE TABLE schema_a.table_a(column1 int); |
| view_sql | The CREATE VIEW statement that was used to create the view. Use this property only when the table_type is VIEW. | CREATE VIEW view_name AS SELECT column1, column2 FROM table_name WHERE condition; |
| view_sql_expanded | The CREATE VIEW statement with fully qualified object references, specific to the Hive database type. Note that the database names should be escaped with backquotes. Use this property only when the table_type is VIEW in a Hive database. | CREATE VIEW view_name AS SELECT test_table.`column1`, test_table.`column2` FROM schema.`test_table` WHERE condition; |
| constraint_text | Constraint statements enforced by the database. Use this property only when the table_type is TABLE. | column_a UNIQUE |
| ts_created | Timestamp at which the table or view was created. Ensure the timestamp is in UTC and follows the format YYYY-MM-DDThh:mm:ssZ. | 2018-03-13T22:09:33Z |
| ts_last_altered | Timestamp of the last ALTER statement executed against this table. Ensure the timestamp is in UTC and follows the format YYYY-MM-DDThh:mm:ssZ. | 2018-03-13T22:09:33Z |
| partitioning_attributes | An array of columns used to partition the table. | ['column1', 'column2'] |
| bucket_attributes | An array of columns used to bucket the table. | ['column1', 'column2'] |
| sort_attributes | An array of columns used to sort the table. | ['column1', 'column2'] |
| synonyms | An array of other names that can be used to refer to this table. Each synonym is represented as a JSON object comprising a schema_name and a table_name. | [{'schema_name': 'schema_a', 'table_name': 'table_a'}, {'schema_name': 'schema_b', 'table_name': 'table_b'}] |
| skews_info | A JSON object mapping the skewed column names to an array of their respective skewed column values that appear often. | {'column1': ['column1_value1', 'column1_value2'], 'column2': ['column2_value1', 'column2_value2']} |
| table_comment | A comment field that stores a description of the table, ingested from the source system. | Created by DB |
Note

You can include newlines in the text values of fields by adding a newline wherever needed. For example, a newline can be inserted after AS:

CREATE VIEW view_name AS
SELECT column1, column2 FROM table_name WHERE condition;

Note that for non-primitive data types, single quotes are used to enclose the strings inside the value. For example:

['column1', 'column2']
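As an illustration, a quoted CSV field can carry such a newline directly. This is a minimal sketch; the view definition is a placeholder:

Sample CSV:
"key","table_type","view_sql"
"test_schema.table2","VIEW","CREATE VIEW view_name AS
SELECT column1, column2 FROM table_name WHERE condition;"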
Adding Column Properties¶
For columns, you can specify the following properties, all of them optional:
| Property | Description | Example |
|---|---|---|
| column_type | This property is optional but recommended and can be any string. | map<int, string> |
| position | Position of the column in the table that contains it. The value must be a positive integer. If unspecified, it defaults to the order in which the column was uploaded. | 5 |
| column_comment | A comment field that stores a description of the column, ingested from the source system. | <has a default value> |
| nullable | Field to indicate whether the column can be nullable. Set this to true if the column is nullable. | true |
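A minimal sample combining these column properties; the key, comment, and values are illustrative:

Sample CSV:
"key","column_type","position","column_comment","nullable"
"test_schema.table1.column1","varchar(10)","1","Created by DB","true"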
Adding Indices¶
For indices, you can specify the following properties.
| Property | If Required | Description |
|---|---|---|
| index_type | Yes. The presence of this field distinguishes an index object from a column. | This property is an enum of supported index types, such as PRIMARY or SECONDARY. Example: PRIMARY |
| column_names | Yes | An array of column names on which the index is defined. If the index is composite, this array has multiple column names. Example: ["column1"] |
| data_structure | No | The underlying data structure used by the index, such as BTREE or HASH. Default: NONE |
| index_type_detail | No | A string containing custom detailed information about the index. Example: MULTI_COLUMN_STATISTICS |
| is_ascending | No | Set this boolean to true if the index is created in ascending order. This is not valid for a composite index. |
| filter_condition | No | Filter condition used when creating an index for part of the rows in the table. This is not valid for a composite index. Example: ([filteredIndexCol]>(0)) |
| is_foreign_key | No | Set this boolean to true if the index is a foreign key. When this is true, the fields foreign_key_table_name and foreign_key_column_names are required. |
| foreign_key_table_name | Yes, if is_foreign_key is set to true | The key of the parent table object that the foreign key index refers to. Make sure this table is already part of the database metadata. Example: 7.schema_a.table_a |
| foreign_key_column_names | Yes, if is_foreign_key is set to true | An array of column names on the parent table object that the foreign key index refers to. Example: ['column1'] |
Sample CSV
"key","table_type","column_type","index_type","column_names"
"public","","","",""
"public.customers","VIEW","","",""
"public.parts","TABLE","","",""
"public.parts.id","","varchar(40)","",""
"public.parts.index","","","SECONDARY","['id']"
Encoding and Special Characters¶
If the data has Unicode characters, make sure the file is encoded in UTF-8. Alation will show a preview before you confirm the import.

If a property in the head does not apply to the key, leave it empty.

Alation does not support the symbols ' " [ ] in names loaded using CSV. If you must have these symbols in names, you will have to use a complex escaping sequence.
Adding, Updating, and Deleting Properties per Technical Metadata Object¶
Initially, you can upload only some of the properties for a key and later update the same key with new properties, or update or delete an existing property value. For example, you can do one upload to add view_sql to all VIEWs and another upload to add ts_created. This second upload will retain the previously added view_sql if you omit that column.
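For example, here is a sketch of two consecutive uploads against the same key (the view definition and timestamp are illustrative). The second upload omits the view_sql column, so the previously uploaded value is retained:

Sample CSV (first upload):
"key","table_type","view_sql"
"public.customers","VIEW","CREATE VIEW view_name AS SELECT column1, column2 FROM table_name WHERE condition;"

Sample CSV (second upload):
"key","table_type","ts_created"
"public.customers","VIEW","2018-03-13T22:09:33Z"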
Adding Properties¶
Let us look at examples of importing TABLE, COLUMN, and INDEX objects and their properties. Suppose the first import uploads only some of the properties, as illustrated in the sample CSV below:
Sample CSV:
"key","table_type","column_type","db_owner","definition_sql","constraint_text","ts_created","ts_last_altered","partitioning_attributes","synonyms","skews_info","table_comment","index_type","column_names","index_type_detail","is_ascending
"public","","","","","","","","","","","","","","",""
"public.parts","TABLE","","alation","create table schema_a.table_a(column1 int);","column_a UNIQUE","2018-03-13T22:09:33Z","2018-03-13T22:09:33Z","['column1', 'column2']","[{'schema_name': 'schema_a','table_name':'table_a'},{'schema_name': 'schema_b','table_name':'table_b'}]","{'column1': ['column1_value1', 'column1_value2'],'column2': ['column2_value1', 'column2_value2']}","Created by DB","","","","",""
"public.parts.id","","int","","","","","","","","","","","","","",""
"public.parts.index","","","","","","","","","","","","SECONDARY","['id']","BTREE","MULTI_COLUMN_STATISTICS","true"
You can also specify the elements of list properties, such as partitioning_attributes, bucket_attributes, sort_attributes, and synonyms, individually by using a numbered column for each element:
Sample CSV:
"key","table_type","bucket_attribute1","bucket_attribute2","sort_attribute1","partitioning_attribute1","synonym1","synonym2"
"public.parts","TABLE","bucket_column1","bucket_column2","sort_column1","partition_column1","{'schema_name':'schema_a','table_name':'table_a'}","{'schema_name': 'schema_b','table_name': 'table_b'}"
Updating Properties of an Existing Object¶
While creating the above TABLE object, we did not add the data_location property. We will next add that property and update the value of the existing property db_owner. For the COLUMN object, we can also add position and update column_type. Similarly, for an INDEX, we can add filter_condition and update the existing property data_structure.
Sample update CSV:
"key","table_type","db_owner","data_location","position","index_type","filter_condition","data_structure"
"public.parts","TABLE","new_owner","hdfs:///user/hive/warehouse/parts","","","",""
"public.parts.id","","","","5","","",""
"public.parts.index","","","","","SECONDARY","([filteredIndexCol]>(0))","HASH"
All the properties specified when the object was first added are retained.

index_type and is_foreign_key cannot be changed after the index is created.

foreign_key_table_name and foreign_key_column_names are required if is_foreign_key is part of the request. However, is_foreign_key can be skipped to retain the existing foreign key reference values.
Deleting Properties of an Existing Object¶
For the already uploaded properties of an object, you can delete a property by uploading the same object with the property value set to null (an empty value in the CSV).
However, the table_type, index_type, column_names, is_foreign_key, foreign_key_table_name, and foreign_key_column_names property values cannot be deleted.
Sample CSV
"key","table_type","db_owner","data_location","position","index_type","data_structure","filter_condition"
"public.parts","","","","","",""
"public.parts.id","","","","","","",""
"public.parts.index","","","","","SECONDARY","",""
This upload removes the data_location and db_owner properties for the TABLE object, removes the column_type and position properties for the COLUMN object, and removes data_structure and filter_condition for the INDEX object.
Uploading the CSV File¶
To import metadata from a CSV file:

1. Sign in to Alation, open the settings page of the Virtual Data Source you need to upload the metadata for, and open the Import Metadata tab.
2. Upload or drag and drop the source CSV file into the upload area.
3. Review the uploaded file. Any formatting errors will appear in the Error column of the preview table.
4. In the top left corner of the preview table, click Confirm to confirm the upload.
5. Refresh Job History. When the import completes, the corresponding Metadata Extraction (MDE) job will be listed in the Job History table.
6. Navigate to the catalog page of the Virtual Data Source to view the imported metadata.