- Testing with Dummy data – Check here
-
Beeline doesnt honor tabs, if you are using any editors, you can replace tabs with space to maintain the structure and still use beeline effectively.
Ex: CREATE TABLE IF NOT EXISTS default.test1 (id INT,name STRING); -- this will fail Hive will throw an error saying "Error: Error while compiling statement: FAILED: ParseException line ... cannot recognize input near 'idINT' ..." This should be fixed in the latest version of Hive 2.1.0 -
Include ‘IF NOT EXISTS’ in your create statements. Prevents an error from occurring if the table exists.
CREATE TABLE IF NOT EXISTS … CREATE VIEW IF NOT EXISTS … - Include ‘IF EXISTS’ in your drop statements
DROP TABLE IF EXISTS PURGE; DROP VIEW IF EXISTS ; - If multiple columns are concatenated by same separated use CONCAT_WS instead of CONCAT
CONCAT_WS('---', column1, column2, column3) - Joins
Place the bigger table first in the join querySELECT idOne, idTwo, value FROM bigTable JOIN smallTableOne ON (bigTable.idOne = smallTableOne.idTwo) ; - Analyze statement
If we are re-using the table, its good practice to run analyse at the end of insert all data to the table. Ex:-- Non-partitioned table ANALYZE TABLE ${TABLE_SAMPLE1} COMPUTE STATISTICS; -- Partitioned table ANALYZE TABLE ${TABLE_SAMPLE1} PARTITION (year, month) COMPUTE STATISTICS; - Set job name
SET mapreduce.job.name = My first job - Source1; - Set queue name
SET mapreduce.job.queuename=root.queuename1; - Set staging dir to save intermediate results
-- This is HDFS location SET hive.exec.stagingdir=/tmp/staging; - Set hive variables
Whenever we have a dynamic element in our query, its a good practice to set that as a variable
Ex: year of interest, month, table name, table location, etc.Always end the variable declaration with semicolon ‘;’
-- Comment goes here. Always comment one line above the variable, dont comment at the end of the line, beeline scipts ran as a file does not honor the comment at the end of the line SET hivevar:DB_NAME=default; SET hivevar:TABLE_SAMPLE1=test1; SET hivevar:HDFS_LOGICAL_NAME=hdfs://quickstart.cloudera; SET hivevar:LOCATION_SAMPLE1='${HDFS_LOGICAL_NAME}/user/cloudera/training/test1'; SET hivevar:YEAR_OF_ANALYSIS=2005; -- To verify the value of the variable, you can select that variable as part of testing SELECT "${DB_NAME}" , "${TABLE_SAMPLE1}" , "${LOCATION_SAMPLE1}" , "${YEAR_OF_ANALYSIS}"; -- Output: +----------+--------+------------------------------------------------------------+-------+--+ | _c0 | _c1 | _c2 | _c3 | +----------+--------+------------------------------------------------------------+-------+--+ | default | test1 | 'hdfs://quickstart.cloudera/user/cloudera/training/test1' | 2005 | +----------+--------+------------------------------------------------------------+-------+--+ Example: DROP TABLE IF EXISTS ${DB_NAME}.${TABLE_SAMPLE1} PURGE; CREATE TABLE IF NOT EXISTS ${DB_NAME}.${TABLE_SAMPLE1} ( id INT ,name STRING ,year_of_join STRING ) STORED AS TEXTFILE LOCATION ${LOCATION_SAMPLE1} ; !sh date INSERT INTO TABLE ${DB_NAME}.${TABLE_SAMPLE1} VALUES (1, 'puneetha' , 2016) ,(2, 'bhoomika' , 2005) ; !sh date !sh date -- Use the variables in below query SELECT * FROM ${DB_NAME}.${TABLE_SAMPLE1} WHERE year_of_join = ${YEAR_OF_ANALYSIS}; !sh date -- Output: +-----------+-------------+---------------------+--+ | test1.id | test1.name | test1.year_of_join | +-----------+-------------+---------------------+--+ | 2 | bhoomika | 2005 | +-----------+-------------+---------------------+--+ -
To initialize the variable, leave it blank, this way the variable will be recognized but replaced with nothing
-- With Variable values SET hivevar:HDFS_LOGICAL_NAME=hdfs://quickstart.cloudera; SET hivevar:LOCATION_SAMPLE1='${HDFS_LOGICAL_NAME}/user/cloudera/training/test1'; -- To verify the value of the variable, you can select that variable as part of testing SELECT "${LOCATION_SAMPLE1}"; Output: +----------------------------------------------------+--+ | _c0 | +----------------------------------------------------+--+ | 'hdfs://quickstart.cloudera/user/cloudera/training/test1' | +----------------------------------------------------+--+ -- With empty value, only to recognize variable name: SET hivevar:HDFS_LOGICAL_NAME=; SET hivevar:LOCATION_SAMPLE1='${HDFS_LOGICAL_NAME}/user/cloudera/training/test1'; -- To verify the value of the variable, you can select that variable as part of testing SELECT "${LOCATION_SAMPLE1}"; Output: +----------------------------------+--+ | _c0 | +----------------------------------+--+ | '/user/cloudera/training/test1' | +----------------------------------+--+ - Change a managed table to an external table with “TRUE” and vice versa with “FALSE”.
TBLPROPERTIES (“EXTERNAL”=”TRUE”) in release 0.6.0+Ex: ALTER TABLE ${DB_NAME}.${TABLE_SAMPLE1} SET TBLPROPERTIES ("EXTERNAL"="TRUE"); - Drop a specific partition
ALTER TABLE db_name.table_name DROP IF EXISTS PARTITION (year=2014, month=5) PURGE; - Print comments
!sh echo "This is my first query" - Save result set
!record -- Your query !record - Run SQL file
!run - Print date
!sh date - Print all configurations
--There will be too many configurations dumped to the screen. If you really intend to capture all the settings, save the result set as pointed in point 1 SET -v;