Hive – Timezone problem

By | November 15, 2017
Timezone problem – Any function which triggers mapreduce job, causes this problem, since it takes the local timezone of machine where it runs the mapper/reducer

In our case, lets say our servers are in German timezone i.e. CET


-- With original settings
SET system:user.country;
+-------------------------+--+
|           set           |
+-------------------------+--+
| system:user.country=GB  |
+-------------------------+--+

-- Original settings
SET mapreduce.map.java.opts=-Duser.timezone=CET;
SET mapreduce.reduce.java.opts=-Duser.timezone=CET;

SELECT
	FROM_UNIXTIME(time_field, 'yyyy-MM-dd HH:mm:ss z') as date_z_from_string
	,COLLECT_SET(FROM_UNIXTIME(time_field, 'yyyy-MM-dd HH:mm:ss z')) as date_z_collect_set
FROM (SELECT CAST('1333047109' AS BIGINT) AS time_field) t
GROUP BY time_field
;

-- This is not the output we want:
+---------------------------+-------------------------------+--+
|    date_z_from_string     |       date_z_collect_set      |
+---------------------------+-------------------------------+--+
| 2012-03-29 20:51:49 CEST  | ["2012-03-29 20:51:49 CEST"]  |
+---------------------------+-------------------------------+--+

Fixing the configuration to convert it to UTC always



-- Set this hive configuration, then it automatically converts it to UTC timezone, there wont be any code change needed.
SET mapreduce.map.java.opts=-Duser.timezone=UTC;
SET mapreduce.reduce.java.opts=-Duser.timezone=UTC;

SELECT
	FROM_UNIXTIME(time_field,'yyyy-MM-dd HH:mm:ss z') as date_z_from_string
	,COLLECT_SET(FROM_UNIXTIME(time_field,'yyyy-MM-dd HH:mm:ss z')) as date_z_from_array
FROM (SELECT CAST('1333047109' AS BIGINT) AS time_field) l
GROUP BY time_field
;

-- Expected output:
+--------------------------+------------------------------+--+
|    date_z_from_string    |      date_z_collect_set      |
+--------------------------+------------------------------+--+
| 2012-03-29 18:51:49 UTC  | ["2012-03-29 18:51:49 UTC"]  |
+--------------------------+------------------------------+--+

One thought on “Hive – Timezone problem

  1. Malu

    SET mapreduce.map.java.opts=-Duser.timezone=UTC;
    SET mapreduce.reduce.java.opts=-Duser.timezone=UTC;

    SELECT
    FROM_UNIXTIME(time_field,’yyyy-MM-dd HH:mm:ss z’) as date_z_from_string
    ,COLLECT_SET(FROM_UNIXTIME(time_field,’yyyy-MM-dd HH:mm:ss z’)) as date_z_from_array
    FROM (SELECT CAST(‘1333047109′ AS BIGINT) AS time_field) l
    GROUP BY time_field
    ;
    Output:
    — 2012-03-29 18:51:49 UTC

    SELECT
    FROM_UNIXTIME(1333047109,’yyyy-MM-dd HH:mm:ss z’) as date_z_from_string
    Output:
    — 2012-03-29 19:51:49 BST

    Why are they both returning different results despite setting the timezone to UTC?

    Also, how do I set the hive to return UTC and not local time?

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *