Hive Cheat Sheet

By | August 29, 2014

Hive CLI will soon be deprecated in favor of Beeline.

Check here for beeline commands.

Enter into hive shell by typing


$hive
hive>

Hive Shell Commands

Description Command
Autocomplete hive> Press Tab key
Display all 436 possibilities? (y or n)
If you enter y, you’ll get a long list of all the keywords
Navigation Keystrokes Use the up↑ and down↓ arrow keys to scroll through previous commands
Ctrl+A goes to the beginning of the line
Ctrl+E goes to the end of the line
Delete key will delete the character to the left of the cursor
Command History Hive saves the last 100,00 lines into a file $HOME/.hivehistory
Shell Execution type ! followed by the command and terminate the line with a semicolon (;)
hive> ! /bin/echo "Hello World";
"Hello World"
hive> ! pwd;
/home/me/hiveplay

(Note: Don’t invoke interactive commands that require user input. Shell “pipes” don’t work and neither do file “globs.”
For example, ! ls *.hql; will look for a file named *.hql;, rather than all files that end with the .hql extension.)
To Print Current DB in use set hive.cli.print.current.db=true; (or)
set hiveconf:hive.cli.print.current.db=true;
To remove current db name display in hive shell set hiveconf:hive.cli.print.current.db=false;
Specifying Metastore location for each user set hive.metastore.warehouse.dir=/user/myname/hive/warehouse;
System Namespace (provides read-write access to Java system properties) set system:user.name; (or)
set system:user.name=yourusername;
env Namespace (provides read-only access to environment variables) set env:HOME;
Hadoop dfs commands inside Hive shell Exclude hadoop keyword and end the command with semicolon(;) as below:
hive> dfs -ls / ;
(Note: This method of accessing hadoop commands is actually more efficient than using the hadoop dfs … equivalent at the bash shell, because the latter starts up a new JVM instance each time, whereas Hive just runs the same code in its current process.)
Execute hive queries from a file source /unix-path/to/file/withqueries.hql;
Print Column Headers SET hive.cli.print.header=true;
Get columns names of the table SHOW COLUMNS FROM mytable;
Load data from a local file to the hive table LOAD DATA LOCAL INPATH '/unix-path/myfile' INTO TABLE mytable;
Load data from hdfs file to the hive table LOAD DATA INPATH '/hdfs-path/myfile' INTO TABLE mytable;
Data Types Numeric Data Types:

  • TINYINT
    (1-byte signed integer, from -128 to 127)
  • SMALLINT
    (2-byte signed integer, from -32,768 to 32,767)
  • INT
    (4-byte signed integer, from -2,147,483,648 to 2,147,483,647)
  • BIGINT
    (8-byte signed integer, from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)
  • FLOAT
    (4-byte single precision floating point number)
  • DOUBLE
    (8-byte double precision floating point number)
  • DECIMAL (or) DECIMAL(precision, scale)
    (Precision of 38 digits. User definable precision and scale)

Date/Time Types:

  • TIMESTAMP
    (UTC time. Format ‘YYYY-MM-DD HH:MM:SS.fffffffff’ (9 decimal place precision) Ex: ‘2012-02-03 12:34:56.123456789’
  • DATE
    (Format: ‘YYYY-­MM-­DD’ The range of values supported for the Date type is be 0000-­01-­01 to 9999-­12-­31, dependent on support by the primitive Java Date type.)

String Types:

  • STRING
  • VARCHAR
    (Length specifier between 1 and 65355)
  • CHAR
    (Fixed-length. The maximum length is fixed at 255)

Misc Types:

  • BOOLEAN
  • BINARY

Complex Types:

  • arrays: ARRAY
  • maps: MAP
  • structs: STRUCT
  • union: UNIONTYPE



Hive One Shot commands

Description Command
To Print Current DB in use $hive --hiveconf hive.cli.print.current.db=true
Specify a file of commands for the CLI to run as it starts, before showing you the prompt $cat hiveproperties.txt
set hive.cli.print.current.db=true;
set system:user.name;

$ hive -i hiveproperties.txt
system:user.name=dc-user
hive (default)>

Adding the -e execute Hive queries $hive -e "SELECT * FROM mytable LIMIT 3";
Adding the -S for silent mode removes the OK and Time taken … lines, as well as other
inessential output
$hive -S -e "SELECT * FROM mytable LIMIT 3"
Useful trick for finding a property name that you can’t quite remember $hive -S -e "set" | grep warehouse_or_pattern
Comments in Hive Scripts. Hive scripts have the extension .hql $cat hivescript.hql
-- Comment line1
-- Comment line2
SELECT * FROM mytable LIMIT 3;
Executing Hive Queries from Files $hive -f /unix-path/to/file/hivescript.hql
Hive variables (The env namespace is useful as an alternative way to pass variable definitions to Hive) $YEAR=2012 hive -e "SELECT * FROM mytable WHERE year = ${env:YEAR}";

Comment below if you find this blog useful.

13 thoughts on “Hive Cheat Sheet

  1. Chandan Sharma

    Hello Puneetha, Nice work done by you…
    I am new to this field, learning BigData, I have done M.Tech from New Horizon College, Bangalore
    I saw your web page, u know lot of things, So if you can help me out with your knowledge I can also try to step into this field…
    Reply with comment or mail to me…

    Reply
  2. Deva

    Hi,

    Very good to see this cheat sheet. Very clean and simple. Keep posting.

    Thanks,
    Deva

    Reply
  3. manoj

    hello puneetha,
    i have one query if you can help me .
    when i run HQL from bash it returns entire log with map reduce process in console and also the error if we get it.But do u know how can i get SQL query as well in output console which executed by HQL file .
    i am able to redirect those process detail in some log files but unable to find which SQL failed.

    hope you got my query.

    Reply
    1. Gurumurthy CL

      cat $HOME/.hivehistory
      it works… Great command… try it again brother

      Reply
  4. Chandhana

    Awesome Work…Thanks a lot for this information…

    Reply
  5. chanaiah

    Hi Sir
    i cannot find ./hivehistory file in my hive home dir. How can i possibly find it.
    Can you please please help me.

    Reply
    1. puneetha Post author

      Search for .hivehistory in your machine with below command

      $ locate .hivehistory
      

      Ideally it should be there in your user directory Ex: /home/user1/.hivehistory

      Although you should be using beeline shell, instead of hive shell.

      Reply
  6. priyanka

    how to check whether hive.print.current.db is true or not. Can we check it?

    Reply
    1. puneetha Post author

      Syntax for printing any setting in Hive
      Ex: SET hive.print.current.db;
      This should print the value associated with the key/setting.

      Reply
  7. Sri

    Hi Puneetha, As others have already mentioned, this is a very good reference sheet on “hive”. I was wondering if you have created reference sheets on other topics. Appreciate your good work in sharing knowledge. Sri

    Reply
  8. Jeetendra Singh

    Hello Puneetha… went through the sheet and it is really concise and clear. I have a couple of questions and would appreciate if you could provide your valuable inputs.

    I am using orc format with partitioning . It results in too many files in the hive directory. I understand that the files produced is dependent on the data and if there is more data in that partition then more files will be created . I suspect that it will increase the response time. Pls suggest how should ai proceed.
    The second question I believe is regarding using environment variable like hive create table location. Can I set these variables in a configure file and refer to the variable. I would like to avoid using Hivevar or Conf

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *