HCatalog Partitions

HCatalog helps organize tables by dividing them into parts called partitions. Partition breaks a table into sections based on the value of a specific column. This makes it easier and faster to retrieve data when running queries.

Partitions can be further divided into buckets, which provide an additional structure to the data, allowing for more efficient queries. Partitions are defined during table creation using the PARTITION BY clause, which requires a list of column definitions.

Adding a partition -

A partition can be added during table creation or added to an already created table using alter. Now let us see how the partition can be defined during creation.

Syntax: Adding a partition while table creation

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  [(col_name data_type [COMMENT 'col_comment'], ...)]
  [COMMENT 'table_comment']
  [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
  [WITH SERDEPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
  [[ROW FORMAT row_format] [STORED AS file_format]]
  [LOCATION 'hdfs_path']
  [TBLPROPERTIES ('key1'='value1', 'key2'='value2', ...)]
  [CACHED IN 'pool_name']

Example:

Assume we have a requirement to create std_details_i table with std_class as a partitioned parameter.

hcat -e "CREATE TABLE std_details_i 
	(std_no INT, std_name STRING,std_class INT, std_grade STRING)
	COMMENT ‘Student Information’
	PARTITIONED BY (std_class INT);
	ROW FORMAT DELIMITED
	FIELDS TERMINATED BY ‘\t’
	LINES TERMINATED BY ‘\n’
	STORED AS TEXTFILE;"

Once the above statement successfully executed, the std_db.std_details_i table created successfully with partition.

Adding a partition after table creation -

Std_details have already been created without declaring any partition column. The partition column can be added using the ALTER TABLE command as shown below -

Syntax -

ALTER TABLE name [PARTITION (partition_spec)]
  	SET { FILEFORMAT PARQUET | TEXTFILE | RCFILE | SEQUENCEFILE | AVRO
  	| LOCATION 'hdfs_path_of_directory'
  	| TBLPROPERTIES (table_properties)
  	| SERDEPROPERTIES (serde_properties) }

Example -

Below command to add the partition to the table std_details_i already created earlier.

hcat -e "ALTER TABLE std_details_i 
 ADD PARTITION (std_class='1');"

Once the above statement successfully executed, the partition added to std_db.std_details table. Similarly, multiple partitions for each class can be set by using ADD PARTITION.

Dropping a partition -

The following syntax is used to drop a partition that already exists on the table.

Syntax -

ALTER TABLE table_name DROP [IF EXISTS] 
PARTITION partition_spec, PARTITION partition_spec;

Example -

Below command to drop the partition already created -

hcat -e "ALTER TABLE std_details DROP [IF EXISTS]
    PARTITION (std_class='1');"

Once the above statement successfully executed, the partition deleted on std_db.std_details table.