Hive UDFs – Simple and Generic UDFs

Hive UDFs are regular user-defined functions that operate row-wise and return one result per input row, like most built-in mathematical and string functions. Examples: SELECT LOWER(str) FROM table_name; SELECT CONCAT(column1, column2) AS x FROM table_name; There are two ways to write a UDF: Simple (extend the UDF class) or Generic (extend the GenericUDF class). In…
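The row-wise contract described above can be sketched in plain Java. This is only an illustration with no Hive dependency: the class name LowerUdfSketch and the applyToRows driver are hypothetical. In a real simple UDF the class would extend org.apache.hadoop.hive.ql.exec.UDF, and Hive would locate the evaluate method by reflection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java stand-in for a simple Hive UDF (assumption: no Hive classes on the classpath).
// A real simple UDF would declare: class LowerUdfSketch extends org.apache.hadoop.hive.ql.exec.UDF
class LowerUdfSketch {

    // Row-wise contract: one input value in, one output value out.
    public String evaluate(String input) {
        if (input == null) {
            return null; // pass NULL through unchanged
        }
        return input.toLowerCase(Locale.ROOT);
    }

    // Hypothetical driver that imitates a query engine calling evaluate() once per row.
    static List<String> applyToRows(List<String> rows) {
        LowerUdfSketch udf = new LowerUdfSketch();
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            out.add(udf.evaluate(row));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(applyToRows(List.of("Hello", "WORLD"))); // prints [hello, world]
    }
}
```

The output list always has exactly as many entries as the input list, which is what distinguishes this kind of function from aggregate or table-generating functions.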

Hive Beeline cheatsheet

Beeline Shell Commands (command: description, with an example where given)
!help: Prints a summary of command usage.
!quit: Exits the Beeline client.
!history: Displays the command history.
!run <sql_query_file>: Runs a SQL query from a file. Example: !run /user/dummy_local_user/myquery1.sql
set: Prints a list of configuration variables that are overridden by the user or Hive.
set -v: Prints all Hadoop and Hive configuration…

PIG UDF with testNG test case – concatenate two strings

PIG UDF class

package org.puneetha.pig.udf;

import java.io.IOException;

import org.apache.log4j.Logger;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

/**
 * @author Puneetha
 */
public final class ConcatStrPig extends EvalFunc<String> {
    // Name the logger after this class; getStackTrace()[0] would resolve to java.lang.Thread.
    private static final Logger logger = Logger.getLogger(ConcatStrPig.class);

    @Override
    public String exec(final Tuple input) throws IOException {
        logger.debug("Tuple=" + input.toString());
        String separator = " ";
        StringBuilder result = new…

Category: Pig

Hive UDF with testNG test case – concatenate two strings

Hive UDF class

package org.puneetha.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.UDFType;
import …;
import org.apache.log4j.Logger;
import org.apache.hadoop.hive.ql.exec.Description;

/**
 * @author Puneetha
 */
@Description(name = "udf_concat"
    , value = "_FUNC_(STRING, STRING) - RETURN_TYPE(STRING)\n"
        + "Description: Concatenate two strings, separated by spaces"
    , extended = "Example:\n"
        + " > SELECT udf_concat('hello','world') FROM src;\n"
        +…
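The behaviour stated in the description string ("Concatenate two strings, separated by spaces") can be sketched without any Hive or TestNG dependency. This is not the post's UDF, whose body is cut off above: ConcatSketch and concat are hypothetical names, and returning null when either argument is null is an assumption.

```java
// Dependency-free sketch of the concatenation logic (not the original UDF).
class ConcatSketch {

    // Joins two strings with a single space.
    // Assumption: a null argument yields null, mirroring common SQL NULL handling.
    static String concat(String left, String right) {
        if (left == null || right == null) {
            return null;
        }
        return new StringBuilder(left).append(" ").append(right).toString();
    }

    public static void main(String[] args) {
        // Minimal checks in the spirit of a unit test, using plain Java only.
        if (!"hello world".equals(concat("hello", "world"))) {
            throw new AssertionError("unexpected concat result");
        }
        if (concat(null, "world") != null) {
            throw new AssertionError("null input should give null");
        }
        System.out.println(concat("hello", "world")); // prints hello world
    }
}
```

In a TestNG test the two checks in main would typically become Assert.assertEquals and Assert.assertNull calls against the UDF's evaluate method.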

owncloud – Introduction

Do you have data that you want to keep in the cloud, with features similar to Dropbox and Google Drive, while still keeping full control over your private, sensitive data? That is where ownCloud comes in. What ownCloud gives us: Acts as a private cloud file storage system; Universal File Access; Share…

PIG – Commands

PIG Syntax Highlighting in vim

Category: Pig

Inverted Index – Mapreduce program

What is an inverted index? In computer science, an inverted index (also referred to as a postings file or inverted file) is an index data structure that stores a mapping from content, such as words or numbers, to its locations in a database file, or in a document or a set of documents. Input files…
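The mapping from words to the documents that contain them can be illustrated with a small in-memory sketch. This is plain Java rather than the MapReduce program the post goes on to describe. The class name InvertedIndexSketch, the file names, and the tokenisation rule (lower-case, split on non-word characters) are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory inverted index: word -> sorted set of document names containing it.
class InvertedIndexSketch {

    // documents maps a document name to its text content.
    static Map<String, Set<String>> build(Map<String, String> documents) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            // Assumed tokenisation: lower-case, then split on runs of non-word characters.
            String[] tokens = doc.getValue().toLowerCase(Locale.ROOT).split("\\W+");
            for (String token : tokens) {
                if (token.isEmpty()) {
                    continue; // split can yield an empty first token
                }
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("file1.txt", "hello world");   // hypothetical input files
        docs.put("file2.txt", "hello hadoop");
        System.out.println(build(docs));
        // prints {hadoop=[file2.txt], hello=[file1.txt, file2.txt], world=[file1.txt]}
    }
}
```

In a MapReduce setting the inner loop corresponds roughly to a map step that emits (word, document) pairs, and the grouping into sets corresponds to the reduce step.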