HCatalog Create Table
CREATE TABLE Statement -
The CREATE TABLE statement is used to create a table. Below are the syntax options for creating a table in various ways -
Syntax -
CREATE [EXTERNAL] TABLE [IF NOT EXISTS]
[EXTERNAL] [db_name.]table_name
[LIKE reference_table_name]
[COMMENT 'table_comment']
[WITH SERDEPROPERTIES ('key1'='value1',
'key2'='value2', ...)]
[[ROW FORMAT row_format] [STORED AS file_format]]
[LOCATION 'hdfs_path']
[TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
[CACHED IN 'pool_name']
AS
select_statement
- CACHED IN clause - used to load any existing or future data files in the table directory or the partition subdirectories into memory using the HDFS caching mechanism.
- CREATE TABLE.. AS SELECT - This process involves creating a new table based on the column definitions from an existing table and transferring data from the source table to the destination table. It does not require the use of a separate INSERT statement.
- EXTERNAL and LOCATION - Optional. EXTERNAL clause used when user tries to create external table. An external table requires a LOCATION clause to specify the path to the HDFS directory to reads and writes files for the table.
- LIKE clause - creates an empty table with the same columns, comments, and other attributes as another existing table.
- PARTITIONED BY clause - divides the data files from one or more specified columns based on the values.
- WITH SERDEPROPERTIES clauses - takes a comma-separated list of key-value pairs.
Example -
Assume we have a requirement to create student information with the columns student number, name, class, grade for a school.
Column Name | Type |
---|---|
Std No | Int |
Std Name | String |
Std Class | Int |
Std Grade | String |
The following query creates the internal table named std_details_i using the above data -
./hcat e "CREATE TABLE std_details_i (std_no INT,
std_name STRING, std_class INT, std_grade STRING)
COMMENT 'Student Details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;"
After successful execution of the above statement, std_details_i table created.
The following query creates the external table named std_details_i using the above data -
./hcat e "CREATE EXTERNAL TABLE std_details_e (std_no INT,
std_name STRING, std_class INT, std_grade STRING)
COMMENT 'Student Details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS PARQUET LOCATION '/usr/hcatalog/std_db;"
After successful execution of the above statement, std_details_e table created.
LOAD DATA Statement-
After creating a table in SQL, data is typically inserted using the INSERT statement. However, in HCatalog, we use the LOAD DATA statement to insert data. When inserting data into HCatalog, it's advisable to use LOAD DATA for bulk record storage. There are two methods for loading data: one option is from the local file system, and the other is from the Hadoop file system.
Syntax -
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE]
INTO TABLE tablename
[PARTITION (partcol1=val1, partcol2=val2 ...)]
- LOCAL is an optional identifier to specify the local path.
- The optional OVERWRITE option allows for data in the table to be overwritten, and PARTITION is also optional.
Example -
Assume we have a requirement to load student information. It is a text file named student_information.txt in /home/user directory.
0001 Pawan 10 A 0002 Srinivas 10 A 0003 Sridhar 10 A 0004 Kumar 10 B
The following query loads the given text into the table -
./hcat e "LOAD DATA LOCAL INPATH
'/home/user/student_information.txt'
OVERWRITE INTO TABLE std_details_i;"
On successful download, the data inserted to the table and get the below response -
OK Time taken: 11.202 seconds