Summary

In this topic, we covered the following:

A table is a set of logically related rows and columns. In other words, a table is a logical structure for storing data: a collection of rows that share the same columns, or attributes. The SQL identifier used to qualify a table name is called a SCHEMA.

The CREATE TABLE statement is used to create a table. A table definition must include the table name and the definitions of its columns. Tables also support features, such as partitioning, that are usually associated with higher-end data warehouse systems.
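
For instance, here is a minimal sketch of a partitioned table definition (the table and column names are hypothetical):

    -- A hypothetical sales table, partitioned by year.
    CREATE TABLE sales (
      id INT,
      amount DOUBLE,
      customer STRING
    )
    PARTITIONED BY (year INT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';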

Logically, each table has a structure based on the definition of its columns, partitions and other table properties. Physically, each table is associated with a directory in HDFS.

The table data consists of all the data files underneath that directory:

  • Internal tables are managed by Hive and use directories inside the designated work area (the warehouse directory).
  • External tables use arbitrary HDFS directories, where the data files are typically shared between different Hadoop components.

Internal Tables

Internal tables are the default table type, produced by a plain CREATE TABLE statement. They are also called managed tables.

Internal tables are stored under /user/hive/warehouse/<tablename>. Users can create data in internal tables by issuing INSERT or LOAD DATA statements.
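
For example, a hypothetical internal table can be populated either way (the table name and file path are illustrative; INSERT ... VALUES requires Hive 0.14 or later):

    -- Create a default (internal) table.
    CREATE TABLE employees (id INT, name STRING);
    -- Load an existing HDFS file into the table.
    LOAD DATA INPATH '/tmp/employees.csv' INTO TABLE employees;
    -- Or insert rows directly.
    INSERT INTO TABLE employees VALUES (1, 'Alice');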

To view the type of a table, use the command below:

hive> DESCRIBE FORMATTED table_name;

If the table is an internal table, the Table Type field will contain MANAGED_TABLE. The Location field displays the path of the table directory as an HDFS URI.
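
For an internal table, the relevant lines of the output look roughly like this (abridged, with a hypothetical host and table name):

    Table Type:          MANAGED_TABLE
    Location:            hdfs://namenode:8020/user/hive/warehouse/employees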

An ALTER TABLE statement that renames an internal table moves all of its data files into the new HDFS directory for the table.
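
For example (hypothetical names):

    -- Renaming the table also moves its files to /user/hive/warehouse/staff.
    ALTER TABLE employees RENAME TO staff;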

If you add or replace data files using HDFS operations, the table picks up the new data. When you issue a DROP TABLE statement, Hive physically removes all the data files from the table's directory.
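
As a sketch, assuming a table named employees in the default warehouse location (the file name is illustrative):

    # Add a data file directly with an HDFS operation (shell).
    hadoop fs -put more_employees.csv /user/hive/warehouse/employees/

    -- Dropping the table removes the directory and every file in it (Hive).
    DROP TABLE employees;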

External Tables

With external tables, the data files are typically shared between different Hadoop components. External tables are produced by the CREATE EXTERNAL TABLE statement, and the user must provide the data path explicitly through the LOCATION clause.
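
A minimal sketch, assuming the data files already live under a hypothetical HDFS directory:

    -- Create an external table over existing, shared data files.
    CREATE EXTERNAL TABLE web_logs (
      ip STRING,
      url STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/data/shared/web_logs';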

When an external table is dropped, its definition is removed from Hive, but the data files are not physically deleted from the path. Users can create data in external tables by issuing INSERT or LOAD DATA statements. To view the type of a table, use the command below:

hive> DESCRIBE FORMATTED table_name;

If the table is an external table, the Table Type field will contain EXTERNAL_TABLE. The Location field displays the path of the table directory as an HDFS URI. An ALTER TABLE statement that renames an external table leaves all data files in their original locations.
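
For an external table, the same command reports (abridged, hypothetical values):

    Table Type:          EXTERNAL_TABLE
    Location:            hdfs://namenode:8020/data/shared/web_logs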

If you add or replace data files using HDFS operations, the table picks up the new data. When you issue a DROP TABLE statement, Hive removes the connection to the associated data files but does not physically remove the underlying data. Users can continue to use the data files with other Hadoop components and HDFS operations.
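
For example (hypothetical paths), dropping the external table leaves its files untouched:

    -- Drop the external table; only the metadata is removed (Hive).
    DROP TABLE web_logs;

    # The underlying files are still present (shell).
    hadoop fs -ls /data/shared/web_logs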

Examples:

Switch a table from internal to external, or from external to internal, by using the ALTER TABLE statement:

    -- Switch a table from internal to external.
    ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
    -- Switch a table from external to internal.
    ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='FALSE');
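
After switching, the change can be verified by checking the Table Type field again:

    hive> DESCRIBE FORMATTED table_name;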