Install and uninstall JDK 7 in Linux

Reference: Note: This post is purely for my own notes, so I suggest you visit the reference site 🙂
Java JDK installation
Choose the JDK version from here. I have chosen JDK 7 from here.
Accept the license and download:
# wget --no-check-certificate --header "Cookie: oraclelicense=accept-securebackup-cookie"
(OR) Download the tar file manually from…
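A minimal sketch, not part of the original notes, for confirming which JDK the `java` launcher on your PATH actually runs after installing. It only reads the standard `java.version` and `java.home` system properties. The class name `JavaVersionCheck` is hypothetical, and the code avoids Java 8 features so it also compiles on JDK 7 itself.

```java
// Hypothetical post-install sanity check: reports which JVM is actually running.
public class JavaVersionCheck {

    // One-line summary of the running JVM's version and install directory.
    static String describeRuntime() {
        String version = System.getProperty("java.version");
        String home = System.getProperty("java.home");
        return "java.version=" + version + ", java.home=" + home;
    }

    // JDK 7 reports versions in the "1.7.0_NN" scheme, so a "1.7" prefix identifies it.
    static boolean isJdk7(String version) {
        return version != null && version.startsWith("1.7");
    }

    public static void main(String[] args) {
        System.out.println(describeRuntime());
        System.out.println("Running on JDK 7? " + isJdk7(System.getProperty("java.version")));
    }
}
```

Compile and run it with the freshly installed tools (`javac JavaVersionCheck.java` then `java JavaVersionCheck`); if `java.home` does not point into the directory you just extracted, another JDK is still first on your PATH.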

SCP (Secure Copy) commands

SCP is a network protocol, based on the BSD RCP protocol, that supports file transfers between hosts on a network. SCP uses Secure Shell (SSH) for data transfer and relies on the same mechanisms for authentication, which ensures the authenticity and confidentiality of the data in transit. A client can send (upload) files to…
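To make the upload/download direction concrete, here is a small sketch (my own illustration, not from the post) that builds `scp` argument lists for Java's `ProcessBuilder`. The class `ScpCommand`, its method names, and the user, host, and path values are hypothetical placeholders. It relies only on standard `scp` syntax: the remote side is written `user@host:path`, `-P` (upper case, unlike `ssh -p`) sets the port, and `-r` copies directories recursively. `String.join` needs Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that builds scp command lines; it does not run them by itself.
public class ScpCommand {

    // Upload: scp [-P port] [-r] <localPath> <user>@<host>:<remotePath>
    static List<String> upload(String localPath, String user, String host,
                               String remotePath, int port, boolean recursive) {
        List<String> cmd = base(port, recursive);
        cmd.add(localPath);
        cmd.add(user + "@" + host + ":" + remotePath);
        return cmd;
    }

    // Download: scp [-P port] [-r] <user>@<host>:<remotePath> <localPath>
    static List<String> download(String user, String host, String remotePath,
                                 String localPath, int port, boolean recursive) {
        List<String> cmd = base(port, recursive);
        cmd.add(user + "@" + host + ":" + remotePath);
        cmd.add(localPath);
        return cmd;
    }

    // Shared prefix; the port flag is only added when it differs from the SSH default 22.
    private static List<String> base(int port, boolean recursive) {
        List<String> cmd = new ArrayList<>();
        cmd.add("scp");
        if (port != 22) {
            cmd.add("-P");
            cmd.add(Integer.toString(port));
        }
        if (recursive) {
            cmd.add("-r");
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = upload("input.txt", "dummyuser", "remote.example.com", "/tmp/", 22, false);
        System.out.println(String.join(" ", cmd));
        // To actually execute it (needs scp installed and SSH access to the host):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```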

Step "database setup" – failed test connections – Cloudera setup

During the database setup phase, you ran “Test Connection” and got an ‘Unknown host:7432’ (or ‘wrongHostname:7432’) error. Either you changed the hostname during the process and the old hostname remained in the file /etc/cloudera-scm-server/, or Cloudera Manager left it blank. In either case, open the file /etc/cloudera-scm-server/ and check whether all the below…
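The two failure modes described here (a blank host, or a stale old hostname in front of :7432) can be checked mechanically. Below is a hedged sketch, not the post's procedure: since the excerpt truncates the file name and the list of entries, the property key `db.host` and the expected hostname `newHostname` are hypothetical placeholders, and the file contents are simulated with an in-memory `Properties` object.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical checker: the key name and expected host are assumptions, not Cloudera's.
public class DbHostCheck {

    // Returns a description of the problem, or null if the "host:port" value looks fine.
    static String checkHostPort(Properties props, String key, String expectedHost) {
        String value = props.getProperty(key);
        if (value == null) {
            return key + " is missing";
        }
        String trimmed = value.trim();
        int colon = trimmed.lastIndexOf(':');
        String host = colon >= 0 ? trimmed.substring(0, colon).trim() : trimmed;
        if (host.isEmpty()) {
            return key + " has a blank host (" + trimmed + ")";
        }
        if (!host.equalsIgnoreCase(expectedHost)) {
            return key + " points at " + host + ", expected " + expectedHost;
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real file under /etc/cloudera-scm-server/ (key is hypothetical).
        Properties props = new Properties();
        props.load(new StringReader("db.host=:7432\n"));
        // Normally the current hostname, e.g. InetAddress.getLocalHost().getHostName().
        String expected = args.length > 0 ? args[0] : "newHostname";
        String problem = checkHostPort(props, "db.host", expected);
        System.out.println(problem == null ? "OK" : "Problem: " + problem);
    }
}
```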

Custom partitioner in MapReduce – using new Hadoop API 2

This is an example of a custom partitioner for the classic wordcount program.
Driver Class:
We partition keys by their first letter, so we will have 27 partitions: one for each of the 26 letters plus one for all other characters. Below are the additional lines in the Driver class:
job.setNumReduceTasks(27);
job.setPartitionerClass(WordcountPartitioner.class);
package org.puneetha.customPartitioner;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import…
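The first-letter rule can be tried out without a cluster. This is a plain-Java sketch of that rule, not the post's `WordcountPartitioner`: in a real job the same logic would sit inside `getPartition(key, value, numPartitions)` of a class extending `org.apache.hadoop.mapreduce.Partitioner<Text, IntWritable>`. The class and method names here are hypothetical. Hadoop rejects partition numbers that are not below the number of reduce tasks ("Illegal partition"), so the sketch folds the index back into range with a modulo; with 27 reducers every index maps to its own partition.

```java
// Plain-Java sketch of first-letter partitioning (hypothetical names, no Hadoop classes).
public class FirstLetterPartitionSketch {

    static final int OTHER_PARTITION = 26;   // digits, punctuation, empty keys

    static int partitionFor(String word, int numPartitions) {
        int partition = OTHER_PARTITION;
        if (!word.isEmpty()) {
            char first = Character.toLowerCase(word.charAt(0));
            if (first >= 'a' && first <= 'z') {
                partition = first - 'a';      // 'a'..'z' -> 0..25
            }
        }
        // Keep the result below numPartitions so fewer reducers still get a legal index.
        return partition % numPartitions;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"apple", "Cat", "zebra", "42nd", ""}) {
            System.out.println("\"" + w + "\" -> partition " + partitionFor(w, 27));
        }
    }
}
```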

Pattern matching for files within a MapReduce program – given an HDFS path – using new API 2

Driver Class:
package org.puneetha.patternMatching;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class WordcountDriver extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf());
        /*
         * … Other Driver class code ……
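The imports above (`Pattern`, `Matcher`, `FileStatus`, `FileSystem`, `FileInputFormat`) suggest the driver lists a directory and keeps only matching files as input. Here is a hedged, Hadoop-free sketch of just the filtering step: plain strings stand in for `FileStatus.getPath().getName()` values that `FileSystem.listStatus(dir)` would return, and each match would then be passed to `FileInputFormat.addInputPath(job, path)`. The class name, file names, and regex are hypothetical examples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: keep only the file names that fully match a regex (hypothetical helper).
public class FileNameFilterSketch {

    static List<String> matching(List<String> fileNames, String regex) {
        Pattern pattern = Pattern.compile(regex);   // compile once, reuse for every name
        List<String> result = new ArrayList<>();
        for (String name : fileNames) {
            Matcher m = pattern.matcher(name);
            if (m.matches()) {                      // whole-name match; find() would allow substrings
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("input_2014.txt", "input_2015.txt", "_SUCCESS", "notes.log");
        System.out.println(matching(names, "input_\\d{4}\\.txt"));
    }
}
```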

Rename reducer output part file – using MapReduce code (with new Hadoop API 2)

Below is the code to rename the reducer output part files from “part-*” to “customName-*”. I am using the classic wordcount example (you can check out the basic implementation here).
Driver Class:
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);   // avoids creating empty default part files
MultipleOutputs.addNamedOutput(job, "text", TextOutputFormat.class, Text.class, IntWritable.class);   // for adding the new name…
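For reference, Hadoop names output files as `<base>-r-<5-digit partition>` for reducers (and `-m-` for map-only output), which is why the defaults look like `part-r-00000`. Assuming the reducer writes through `MultipleOutputs.write(namedOutput, key, value, baseOutputPath)` with a base path such as `"customName"`, the base part of that name changes. The helper below is a hypothetical string-formatting sketch of the convention only; it does not touch HDFS.

```java
// Sketch of Hadoop's output file naming convention (hypothetical helper, strings only).
public class PartFileNameSketch {

    // <baseName>-<m|r>-<partition padded to 5 digits>
    static String outputFileName(String baseName, boolean fromMapper, int partition) {
        return String.format("%s-%c-%05d", baseName, fromMapper ? 'm' : 'r', partition);
    }

    public static void main(String[] args) {
        System.out.println(outputFileName("part", false, 0));        // default reducer output name
        System.out.println(outputFileName("customName", false, 3));  // renamed reducer output
    }
}
```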

Wordcount MapReduce program – using Hadoop new API 2

Below is the classic wordcount example using the new API. If you are using Maven, you can use the pom.xml given here; change it according to the Hadoop distribution/version you are using.
Input Text:
$ vim input.txt
cat dog apple cat horse orange apple
$ hadoop fs -mkdir -p /user/dummyuser/wordcount/input
$ hadoop fs -put input.txt /user/dummyuser/wordcount/input/
Driver Class:
package…
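To know what output to expect before running the job, here is a single-process sketch of what wordcount computes on the sample input: the mapper conceptually emits (word, 1) for each whitespace-separated token and the reducer sums the ones per word. This is an illustration, not the post's Mapper/Reducer code; the class name is hypothetical, and a `TreeMap` is used so keys come out sorted, as a single reducer would see them.

```java
import java.util.Map;
import java.util.TreeMap;

// Local sketch of the wordcount result (no Hadoop): token -> number of occurrences.
public class WordCountSketch {

    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;                     // blank input splits into one empty token
            }
            Integer current = counts.get(token);
            counts.put(token, current == null ? 1 : current + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("cat dog apple cat horse orange apple"));
        // prints {apple=2, cat=2, dog=1, horse=1, orange=1}
    }
}
```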

cloudera-scm-server-db pg_ctl: server does not shut down

Problem:
# service cloudera-scm-server-db stop
waiting for server to shut down............................ failed
pg_ctl: server does not shut down
HINT: The "-m fast" option immediately disconnects sessions rather than waiting for session-initiated disconnection.
Solution:
# cd /var/lib/cloudera-scm-server-db/data
# rm
# service cloudera-scm-server-db status
pg_ctl: no server running
# service cloudera-scm-server status
cloudera-scm-server dead but pid file…
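The last status line reports a service that is dead while its pid file is still present. As an illustration only, not part of the original fix, the Java sketch below shows what a "stale pid file" means: the file exists, but the pid written in it no longer belongs to a running process. It needs Java 9 or later for `ProcessHandle`. The class name, and the choice to treat an empty or unparseable file as stale, are my assumptions; it takes the pid file path as an argument and does not name or delete any particular file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Sketch: a pid file is "stale" when it exists but its pid is not a live process (Java 9+).
public class StalePidFileCheck {

    static boolean isStale(Path pidFile) throws IOException {
        if (!Files.exists(pidFile)) {
            return false;                       // no pid file, nothing stale
        }
        List<String> lines = Files.readAllLines(pidFile, StandardCharsets.UTF_8);
        if (lines.isEmpty()) {
            return true;                        // empty file cannot point at a live process
        }
        long pid;
        try {
            pid = Long.parseLong(lines.get(0).trim());
        } catch (NumberFormatException e) {
            return true;                        // unreadable pid, treated as stale (assumption)
        }
        // ProcessHandle.of returns an empty Optional when no such process exists.
        return !ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StalePidFileCheck <pid-file>");
            return;
        }
        Path pidFile = Paths.get(args[0]);
        System.out.println(pidFile + (isStale(pidFile) ? " is stale" : " is not stale"));
    }
}
```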