In this topic, we describe the LOAD DATA statement for loading data into a table, with a detailed example.
Loading data into a table is the first step toward querying it efficiently, and the LOAD DATA statement is the mechanism for doing so. LOAD DATA populates a table by moving a single data file, or all the data files in a directory, from an HDFS location into the table's data directory.
The user can specify the HDFS path of a single file, or the path of a directory to move all the files inside that directory. All files to be loaded must sit at the top level of that directory; nested directories are not allowed.
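As a sketch of the two forms described above (the HDFS paths and the table name student_grades are hypothetical):

```sql
-- Load a single data file from HDFS into the table.
LOAD DATA INPATH '/user/data/grades/part-00000' INTO TABLE student_grades;

-- Load every top-level file in an HDFS directory into the same table.
-- The directory must not contain nested subdirectories.
LOAD DATA INPATH '/user/data/grades/' INTO TABLE student_grades;
```

In both cases the files are moved, not copied: after the statement succeeds they reside in the table's data directory and are gone from the source location.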
The LOAD DATA statement imports files from HDFS only; it cannot import files from the local file system, and it does not support the LOCAL keyword of the Hive LOAD DATA statement.
LOAD DATA expects an HDFS path, not a full HDFS URL. Only limited error checking is performed: if the loaded files have the wrong file format, a different set of columns, or some other mismatch, the LOAD DATA statement itself raises no error.
Querying the table afterward could then produce a runtime error or unexpected results. The PARTITION clause is especially convenient for ingesting new data into a partitioned table.
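For instance, a daily ingest into one partition might look like the following sketch (the sales table and its year/month/day partition columns are assumed for illustration):

```sql
-- Append one day's data file into a specific partition of a table
-- partitioned by year, month, and day. The partition values in the
-- PARTITION clause select the target partition directory.
LOAD DATA INPATH '/user/data/sales/2021-06-01.csv'
INTO TABLE sales PARTITION (year=2021, month=6, day=1);
```

Because only the named partition is touched, data already loaded into other partitions is left unchanged.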
LOAD DATA INPATH 'file_or_directory_path'
[OVERWRITE] INTO TABLE table_name
[PARTITION (partcol1=val1, partcol2=val2 ...)]
The OVERWRITE clause replaces any existing data in the table; without it, the loaded files are added alongside the existing data. The PARTITION clause is optional.
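The two behaviors can be contrasted in a short sketch (the table my_table and the staging paths are hypothetical):

```sql
-- Append: the loaded file is added alongside the table's existing data files.
LOAD DATA INPATH '/user/staging/new_rows.txt' INTO TABLE my_table;

-- Overwrite: the table's existing data files are removed and replaced
-- by the loaded file.
LOAD DATA INPATH '/user/staging/full_reload.txt' OVERWRITE INTO TABLE my_table;
```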
Assume we have a requirement to store student information for a school, with the columns student number, name, class, and grade.
Assume this data is in the file std_details.txt in the location usr/data/. The command below loads the data into the std_details table from the file usr/data/std_details.txt:
hive> LOAD DATA INPATH 'usr/data/std_details.txt'
    > OVERWRITE INTO TABLE std_details;
After the statement executes successfully, the data appears in the std_details table.
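A quick query confirms the load; this is a sketch, and the column names std_no, name, class, and grade are assumed from the description above:

```sql
-- Inspect a few loaded rows and count the total.
SELECT std_no, name, class, grade FROM std_details LIMIT 5;
SELECT COUNT(*) FROM std_details;
```

If the file format or column layout did not match the table definition, this is the point where errors or garbage values would first show up, since LOAD DATA itself performs no validation.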