Summary -

Before proceeding with the Hive installation, Hadoop must be installed first. Refer to the Hadoop installation steps here

Step-1: Apache Hive installation

Below are the steps to install Apache Hive.

Download the Apache Hive

Download the most recent stable Hive release from the Apache download mirrors: https://hive.apache.org/downloads.html

The suggested mirror site is http://mirror.fibergrid.in/apache/hive/

To download with wget, use the command below

$ wget http://mirror.fibergrid.in/apache/hive/hive-x.y.z/hive-x.y.z.tar.gz

Or download the release from the web page and then locate the file in the Downloads directory.

$ cd Downloads
$ ls

The ls command will display the downloaded file apache-hive-x.y.z-bin.tar.gz

Installing the Apache Hive

  1. Verify and untar the downloaded archive file.

    The below command is used to verify and untar the downloaded file.

    $ tar -xzvf apache-hive-x.y.z-bin.tar.gz
    $ ls

    The ls command will display the downloaded archive and the extracted directory.

    apache-hive-x.y.z-bin
    apache-hive-x.y.z-bin.tar.gz
  2. Move the extracted directory to /usr/local/hive

    $ cd /home/user/Downloads
    $ mv apache-hive-x.y.z-bin /usr/local/hive
  3. Set the environment variable HIVE_HOME to point to the installation directory

    $ cd /usr/local/hive
    $ export HIVE_HOME=/usr/local/hive
  4. Add $HIVE_HOME/bin to your PATH

    $ export PATH=$HIVE_HOME/bin:$PATH
  5. Add the Hadoop and Hive libraries to CLASSPATH

    $ export CLASSPATH=$CLASSPATH:/usr/local/hadoop/lib/*:.
    $ export CLASSPATH=$CLASSPATH:/usr/local/hive/lib/*:.
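Taken together, steps 3 to 5 amount to appending the following to the shell profile (e.g. ~/.bashrc) so the settings persist across sessions; this is a sketch that assumes the /usr/local/hive and /usr/local/hadoop locations used above.

```shell
# Consolidated Hive environment setup (assumed paths: Hive in
# /usr/local/hive, Hadoop in /usr/local/hadoop, as in the steps above)
export HIVE_HOME=/usr/local/hive
export PATH="$HIVE_HOME/bin:$PATH"
# Add the Hadoop and Hive libraries (and the current directory) to CLASSPATH
export CLASSPATH="$CLASSPATH:/usr/local/hadoop/lib/*:."
export CLASSPATH="$CLASSPATH:/usr/local/hive/lib/*:."
```

After sourcing the profile, `which hive` should resolve to /usr/local/hive/bin/hive.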

Configuring the Apache Hive

To configure Hive to work with Hadoop, edit the hive-env.sh file in the $HIVE_HOME/conf directory. Go to the directory and copy the template file

$ cd $HIVE_HOME/conf
$ cp hive-env.sh.template hive-env.sh

Edit hive-env.sh to append the line below.

export HADOOP_HOME=/usr/local/hadoop

The Hive installation is now complete. Hive requires an external database server to configure the metastore.

Step-2: External Database Server installation

Download the latest version of the Apache Derby distribution from the Derby website at http://db.apache.org/derby/derby_downloads.html.

Below are the latest Apache Derby download files for each operating system at the time of writing.

Operating System        Download File
Windows                 db-derby-10.12.1.1-bin.zip
UNIX, Linux, and Mac    db-derby-10.12.1.1-bin.tar.gz

Installing Derby

Choose a directory where the user has write permissions to install the Derby software. The installation procedure is shown below, separately for Windows and for UNIX, Linux, and Mac.

Windows:

C:\> mkdir C:\Apache
C:\> copy db-derby-10.12.1.1-bin.zip C:\Apache
C:\> cd C:\Apache
C:\Apache> unzip db-derby-10.12.1.1-bin.zip

UNIX, Linux, and Mac:

$ mkdir /opt/Apache
$ cp db-derby-10.12.1.1-bin.tar.gz /opt/Apache
$ cd /opt/Apache
$ tar xzvf db-derby-10.12.1.1-bin.tar.gz

Set DERBY_INSTALL

Set the DERBY_INSTALL variable to the location where Derby was installed.

Windows:

C:\> set DERBY_INSTALL=C:\Apache\db-derby-10.12.1.1-bin

UNIX, Linux, and Mac:

$ export DERBY_INSTALL=/opt/Apache/db-derby-10.12.1.1-bin

Configure Embedded Derby

To use Derby in embedded mode, set CLASSPATH to include the jar files derby.jar and derbytools.jar.

  • derby.jar: contains the Derby engine and the Derby Embedded JDBC driver
  • derbytools.jar: optional
Windows:

C:\> set CLASSPATH=%DERBY_INSTALL%\lib\derby.jar;%DERBY_INSTALL%\lib\derbytools.jar;

UNIX, Linux, and Mac:

$ export CLASSPATH=$DERBY_INSTALL/lib/derby.jar:$DERBY_INSTALL/lib/derbytools.jar:
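The UNIX-side settings can be combined into two exports; this is a sketch that assumes the /opt/Apache install location from the previous step.

```shell
# Point DERBY_INSTALL at the unpacked Derby distribution (assumed path)
export DERBY_INSTALL=/opt/Apache/db-derby-10.12.1.1-bin
# derby.jar holds the Derby engine and embedded JDBC driver;
# derbytools.jar is optional tooling
export CLASSPATH="$DERBY_INSTALL/lib/derby.jar:$DERBY_INSTALL/lib/derbytools.jar:$CLASSPATH"
```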

Change directory into the DERBY_INSTALL/bin directory. For Derby embedded usage, the setEmbeddedCP.bat (Windows) and setEmbeddedCP (UNIX) scripts use the DERBY_INSTALL variable to set the CLASSPATH.

Windows:

C:\> cd %DERBY_INSTALL%\bin
C:\Apache\db-derby-10.12.1.1-bin\bin> setEmbeddedCP.bat

UNIX, Linux, and Mac:

$ cd $DERBY_INSTALL/bin
$ . setEmbeddedCP

Verify Derby

Echo CLASSPATH and double-check each entry in the classpath to verify that each jar file is where it is expected:

Windows:

C:\> echo %CLASSPATH%
C:\Apache\DB-DER~1.1-B\lib\derby.jar;C:\Apache\DB-DER~1.1-B\lib\derbytools.jar;

UNIX, Linux, and Mac:

$ echo $CLASSPATH
/opt/Apache/db-derby-10.12.1.1-bin/lib/derby.jar:/opt/Apache/db-derby-10.12.1.1-bin/lib/derbytools.jar:
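Note that the metastore configuration in Step-3 points Hive at a Derby network URL (jdbc:derby://hadoop1:1527/metastore_db), which requires the Derby Network Server to be running before Hive connects. One way to start it, assuming DERBY_INSTALL is set as above (the bind address and the host named in the JDBC URL are deployment-specific):

```
$ nohup $DERBY_INSTALL/bin/startNetworkServer -h 0.0.0.0 &
```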

Step-3: Configuring Hive metastore

The metastore needs to be configured to specify where the database is stored. This requires a change in the hive-site.xml file, which is in the $HIVE_HOME/conf directory. As a first step, copy the template file using the commands below:

$ cd $HIVE_HOME/conf
$ cp hive-default.xml.template hive-site.xml

Edit hive-site.xml and append the following lines between the <configuration> and </configuration> tags.

Path: /opt/hadoop/hive/conf/hive-site.xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby://hadoop1:1527/metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.ClientDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

JPOX properties can be specified in hive-site.xml; the settings required in the jpox.properties file are shown below.

Path: /opt/hadoop/hive/conf/jpox.properties

javax.jdo.PersistenceManagerFactoryClass=org.jpox.PersistenceManagerFactoryImpl
org.jpox.autoCreateSchema=false
org.jpox.validateTables=false
org.jpox.validateColumns=false
org.jpox.validateConstraints=false
org.jpox.storeManagerType=rdbms
org.jpox.autoCreateSchema=true
org.jpox.autoStartMechanismMode=checked
org.jpox.transactionIsolation=read_committed
javax.jdo.option.DetachAllOnCommit=true
javax.jdo.option.NontransactionalRead=true
javax.jdo.option.ConnectionDriverName=org.apache.derby.jdbc.ClientDriver
javax.jdo.option.ConnectionURL=jdbc:derby://hadoop1:1527/metastore_db;create=true
javax.jdo.option.ConnectionUserName=APP
javax.jdo.option.ConnectionPassword=mine

Step-4: Running and Verifying Hive

Hadoop must be in your PATH, or the Hadoop installation directory must be set using the command below.

export HADOOP_HOME=<hadoop-install-dir>

Create the /tmp and /user/hive/warehouse directories in HDFS and set them to chmod g+w before creating a table in Hive.

Commands to perform this setup:

$ $HADOOP_HOME/bin/hadoop fs -mkdir       /tmp
$ $HADOOP_HOME/bin/hadoop fs -mkdir       /user/hive/warehouse
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w   /tmp
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w   /user/hive/warehouse
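To confirm the directories exist with the expected group-write permissions, they can be listed back (assuming HADOOP_HOME is set as above):

```
$ $HADOOP_HOME/bin/hadoop fs -ls /tmp
$ $HADOOP_HOME/bin/hadoop fs -ls /user/hive
```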

The commands below are used to verify the Hive installation

$ cd $HIVE_HOME
$ bin/hive

After a successful start, the Hive prompt will be shown.

hive>

The metastore will not be created until the first query hits it, so trigger it with the query below.

hive> show tables;
OK
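
Beyond show tables, a quick smoke test is to create and then drop a throwaway table; the table name smoke_test here is arbitrary:

```
hive> create table smoke_test (id int, name string);
hive> show tables;
hive> drop table smoke_test;
```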

Because the metastore now lives on an external database server, you can run multiple Hive instances working on the same data simultaneously and remotely.