Timezone problem – Any function which triggers mapreduce job, causes this problem, since it takes the local timezone of machine where it runs the mapper/reducer
In our case, lets say our servers are in German timezone i.e. CET
-- With original settings SET system:user.country; +-------------------------+--+ | set | +-------------------------+--+ | system:user.country=GB | +-------------------------+--+ -- Original settings SET mapreduce.map.java.opts=-Duser.timezone=CET; SET mapreduce.reduce.java.opts=-Duser.timezone=CET; SELECT FROM_UNIXTIME(time_field, 'yyyy-MM-dd HH:mm:ss z') as date_z_from_string ,COLLECT_SET(FROM_UNIXTIME(time_field, 'yyyy-MM-dd HH:mm:ss z')) as date_z_collect_set FROM (SELECT CAST('1333047109' AS BIGINT) AS time_field) t GROUP BY time_field ; -- This is not the output we want: +---------------------------+-------------------------------+--+ | date_z_from_string | date_z_collect_set | +---------------------------+-------------------------------+--+ | 2012-03-29 20:51:49 CEST | ["2012-03-29 20:51:49 CEST"] | +---------------------------+-------------------------------+--+
Fixing the configuration to convert it to UTC always
-- Set this hive configuration, then it automatically converts it to UTC timezone, there wont be any code change needed. SET mapreduce.map.java.opts=-Duser.timezone=UTC; SET mapreduce.reduce.java.opts=-Duser.timezone=UTC; SELECT FROM_UNIXTIME(time_field,'yyyy-MM-dd HH:mm:ss z') as date_z_from_string ,COLLECT_SET(FROM_UNIXTIME(time_field,'yyyy-MM-dd HH:mm:ss z')) as date_z_from_array FROM (SELECT CAST('1333047109' AS BIGINT) AS time_field) l GROUP BY time_field ; -- Expected output: +--------------------------+------------------------------+--+ | date_z_from_string | date_z_collect_set | +--------------------------+------------------------------+--+ | 2012-03-29 18:51:49 UTC | ["2012-03-29 18:51:49 UTC"] | +--------------------------+------------------------------+--+
SET mapreduce.map.java.opts=-Duser.timezone=UTC;
SET mapreduce.reduce.java.opts=-Duser.timezone=UTC;
SELECT
FROM_UNIXTIME(time_field,’yyyy-MM-dd HH:mm:ss z’) as date_z_from_string
,COLLECT_SET(FROM_UNIXTIME(time_field,’yyyy-MM-dd HH:mm:ss z’)) as date_z_from_array
FROM (SELECT CAST(‘1333047109′ AS BIGINT) AS time_field) l
GROUP BY time_field
;
Output:
— 2012-03-29 18:51:49 UTC
SELECT
FROM_UNIXTIME(1333047109,’yyyy-MM-dd HH:mm:ss z’) as date_z_from_string
Output:
— 2012-03-29 19:51:49 BST
Why are they both returning different results despite setting the timezone to UTC?
Also, how do I set the hive to return UTC and not local time?