Hive Built-in Functions
Hive Built-in Functions are pre-defined operations that we can use directly in our Hive queries to:
- Perform mathematical calculations
- Manipulate strings (text)
- Work with dates and times
- Check conditions using logical operators
Mathematical Functions -
Hive provides several mathematical functions to help you perform calculations directly in your query.
| Function Name (Signature) | Description |
|---|---|
| round(double a) | returns the rounded BIGINT value of the double |
| floor(double a) | returns the maximum BIGINT value that is equal or less than the double |
| ceil(double a) | returns the minimum BIGINT value that is equal or greater than the double |
| rand(), rand(int seed) | returns a random number that changes from row to row |
Aggregate built-in functions -
The below are the list of aggregate functions:
| Aggregation Function Name (Signature) | Description |
|---|---|
| count(*), count(expr) | Returns the total number of retrieved rows, including rows containing NULL values; |
| sum(col), sum(DISTINCT col) | returns the sum of the elements in the group or the sum of the distinct values of the column |
| avg(col), avg(DISTINCT col) | returns the average of the elements in the group or the average of the distinct values of the column |
| min(col) | returns the minimum value of the column |
| max(col) | returns the maximum value of the column |
String Functions -
In Hive, String Functions help you work with text. You can change case, extract parts of text, or find out how long a string is.
| Function Name (Signature) | Description |
|---|---|
| concat(string A, string B,...) | returns the string resulting from concatenating B after A |
| substr(string A, int start) | returns the substring of A starting from start position till the end of string A |
| substr(string A, int start, int length) | returns the substring of A starting from start position with the given length |
| upper(string A) | returns the string resulting from converting all characters of A to upper case |
| ucase(string A) | Same as upper |
| lower(string A) | returns the string resulting from converting all characters of B to lower case |
| lcase(string A) | Same as lower |
| trim(string A) | returns the string resulting from trimming spaces from both ends of A |
| ltrim(string A) | returns the string resulting from trimming spaces from the beginning(left hand side) of A |
| rtrim(string A) | returns the string resulting from trimming spaces from the end(right hand side) of A |
| regexp_replace(string A, string B, string C) | returns the string resulting from replacing all substrings in B that match the Java regular expression syntax with C |
Date Functions -
Date and time handling is very common in data processing. Hive provides easy-to-use functions to work with dates, extract parts of dates, or calculate differences.
| Function Name (Signature) | Description |
|---|---|
| to_date(string timestamp) | Return the date part of a timestamp string: to_date("1970-01-01 00:00:00") = "1970-01-01" |
| year(string date) | Return the year part of a date or a timestamp string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970 |
| month(string date) | Return the month part of a date or a timestamp string: month("1970-11-01 00:00:00") = 11, month("1970-11-01") = 11 |
| day(string date) | Return the day part of a date or a timestamp string: day("1970-11-01 00:00:00") = 1, day("1970-11-01") = 1 |
| get_json_object(string json_string, string path) | "Extract json object from a json string based on json path specified, and return json string of the extracted json object. It will return null if the input json string is invalid." |
Complex Functions -
| Function Name (Signature) | Description |
|---|---|
| size(Map | returns the number of elements in the map type |
| size(Array | returns the number of elements in the array type |
| cast(<expr> as <type>) | "converts the results of the expression expr to <type>. A null is returned if the conversion does not succeed." |
| from_unixtime(int unixtime) | convert the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the format of "1970-01-01 00:00:00" |
Examples -
Scenario1: Round a number
SELECT ROUND(15.678);
Result: 16
Scenario2: Convert text to uppercase
SELECT UPPER('hadoop');
Result: HADOOP
Scenario3: Get first 4 letters of a string
SELECT SUBSTR('DataEngineering', 1, 4);
Result: Data
Scenario4: Get today’s date
SELECT CURRENT_DATE;
Result: 2025-07-04 (for example)
Scenario5: Find the difference between two dates
SELECT DATEDIFF('2025-12-31', '2025-01-01');
Result: 364