Hadoop Cloudera Cluster Set up using Cloudera Manager

By | August 22, 2014
We will set up the cluster using Cloudera Manager.
Note: Cloudera cluster setup requires 64-bit machines.

For this example, let's say we have 3 nodes that will form the Hadoop cluster.
Note: You can add any number of hosts.
host1.example.com (or) host1 -> Cloudera Manager Node
host2.example.com (or) host2
host3.example.com (or) host3

Before setting up the cluster, we need to configure the nodes. Follow the steps below on all nodes.
Step 1:
Sync all the nodes with a time source using NTP (Network Time Protocol)
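
The step above does not list commands, so here is a minimal sketch of one way to do it. It assumes a RHEL/CentOS 6 style node that uses the ntp package and the SysV tools (yum, chkconfig, service) seen elsewhere in this post. The helper names run and setup_ntp and the DRY_RUN switch are hypothetical, added only for illustration.

```shell
#!/bin/sh
# Sketch: enable NTP time sync on a RHEL/CentOS 6 style node.
# Assumption: the ntp package and SysV tools (yum, chkconfig, service) are in use.
# DRY_RUN=1 (the default here) only prints the commands; run with DRY_RUN=0 as root to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

setup_ntp() {
    run yum install -y ntp      # install the NTP daemon
    run chkconfig ntpd on       # start ntpd on every boot
    run service ntpd start      # start ntpd now
}

setup_ntp
```

With the default DRY_RUN=1 the script only prints what it would do, which makes it safe to review before running it for real on each node.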

Step 2:
Set the hostname (on each node, using that node's own hostname):


#vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=host1.example.com 
GATEWAY=192.168.1.1

#hostname host1.example.com

Step 3:
Map the IP address of every node to its hostnames in /etc/hosts:

# vim /etc/hosts
192.168.1.2  host1.example.com  host1
192.168.1.3  host2.example.com  host2
192.168.1.4  host3.example.com  host3
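
Since the same entries must exist on every node, a quick check can catch a missed host. This is a minimal sketch; the function name check_hosts is hypothetical, and it only looks for the names in the file without testing that they resolve on the network.

```shell
#!/bin/sh
# Sketch: report which expected hostnames are missing from a hosts file.
# check_hosts FILE HOST... prints each missing host and returns non-zero if any is missing.
check_hosts() {
    hosts_file="$1"
    shift
    missing=0
    for h in "$@"; do
        # Match the name as a whole whitespace-separated field on a non-comment line.
        if ! awk -v name="$h" '
            /^[ \t]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
            END { exit found ? 0 : 1 }' "$hosts_file"; then
            echo "missing: $h"
            missing=1
        fi
    done
    return "$missing"
}
```

Example use on a node: check_hosts /etc/hosts host1.example.com host2.example.com host3.example.com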

Step 4:
Give one user sudo privileges; this account will be used later for SSH. Example: dummyuser


#vim /etc/sudoers
#Add the below line:
dummyuser ALL=(ALL) NOPASSWD:ALL

Step 5:
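Disable SELinux (setenforce 0 switches to permissive mode immediately; SELINUX=disabled takes effect after a reboot)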


#setenforce 0
#vim /etc/sysconfig/selinux
SELINUX=disabled
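
When repeating this on several nodes, the edit can be scripted instead of done in vim. A minimal sketch follows; the function name set_selinux_mode is hypothetical, and it assumes the file already contains a SELINUX= line (it changes nothing otherwise).

```shell
#!/bin/sh
# Sketch: set the SELINUX= line in an SELinux config file without opening an editor.
# set_selinux_mode FILE MODE  (MODE is one of: enforcing, permissive, disabled)
set_selinux_mode() {
    file="$1"
    mode="$2"
    case "$mode" in
        enforcing|permissive|disabled) ;;
        *) echo "invalid mode: $mode" >&2; return 1 ;;
    esac
    tmp="$(mktemp)" || return 1
    # Rewrite only the SELINUX= line; SELINUXTYPE= and comments are left alone.
    sed "s/^SELINUX=.*/SELINUX=$mode/" "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back through the original path so a symlink and its permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

Example use as root: set_selinux_mode /etc/sysconfig/selinux disabled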

Step 6:
Disable IPv6


#vim /etc/sysctl.conf
#Disable IPv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

#vim /etc/sysconfig/network-scripts/ifcfg-eth0
NETWORKING_IPV6=no
IPV6INIT=no

Step 7:
Check the firewall status with the command below:


#/etc/init.d/iptables status
iptables: Firewall is not running.

If the firewall is running, stop iptables as below:

# /etc/init.d/iptables save
# /etc/init.d/iptables stop
# chkconfig iptables off

Step 8:
If you do a lot of streaming, set the vm.overcommit_memory kernel parameter to 1:


#sysctl vm.overcommit_memory=1
#echo "vm.overcommit_memory = 1" >> /etc/sysctl.conf

Step 9:
Set vm.swappiness kernel parameter to 0:


#sysctl vm.swappiness=0
#echo "vm.swappiness = 0" >> /etc/sysctl.conf
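
Appending with >> adds another copy of the line every time the command is rerun. Here is a minimal sketch of an idempotent alternative for the kernel parameters in Steps 8 and 9; the function name set_sysctl_key is hypothetical. It edits only the file, so sysctl -p is still needed to load the values into the running kernel.

```shell
#!/bin/sh
# Sketch: set "key = value" in a sysctl.conf-style file exactly once.
# set_sysctl_key FILE KEY VALUE replaces an existing line for KEY or appends a new one.
set_sysctl_key() {
    file="$1"; key="$2"; value="$3"
    tmp="$(mktemp)" || return 1
    # Drop any existing uncommented assignment of KEY, keep everything else.
    awk -v key="$key" '
        { line = $0; sub(/^[ \t]+/, "", line); split(line, parts, /[ \t]*=/) }
        parts[1] == key { next }
        { print }' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Append the new assignment as the last line.
    echo "$key = $value" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

Example use as root: set_sysctl_key /etc/sysctl.conf vm.swappiness 0, followed by sysctl -p.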

Step 10: (Optional, only for the RHEL operating system)
Red Hat 6.x transparent hugepage bug workaround:


#echo 'echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag' >>/etc/rc.local

You can check the status with the command below (the rc.local entry takes effect at the next boot):


#cat /sys/kernel/mm/redhat_transparent_hugepage/defrag
always madvise [never]
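
In that output the word in square brackets is the setting currently in effect. A minimal sketch for pulling it out in a script follows; the function name active_setting is hypothetical.

```shell
#!/bin/sh
# Sketch: print the active value from a bracketed kernel setting such as
# "always madvise [never]" (the value in square brackets is the one in effect).
# active_setting FILE
active_setting() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}
```

Example use on RHEL 6: active_setting /sys/kernel/mm/redhat_transparent_hugepage/defrag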

Step 11:
Restart the network:

#/etc/init.d/network restart

or restart all systems as below:

#reboot

Our nodes are now ready to be added to the cluster.




Let's get started with the cluster setup now.
Note: Follow the steps below only on host1 (i.e. the Cloudera Manager node).
Step 1:
Download the Cloudera Manager installer binary to host1 from the URL below.
http://archive-primary.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin

#wget http://archive-primary.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin

Step 2:
Make the installer executable:

# chmod +x cloudera-manager-installer.bin

Run the installer:


# ./cloudera-manager-installer.bin

Click Next at each step and accept the license.
This installs the Cloudera Manager server on host1.

Step 3:
Open Cloudera Manager at http://host1.example.com:7180
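
The server may take a little while to start listening after the installer finishes. A minimal sketch for waiting on the port from a shell is below; the function names port_open and wait_for_port are hypothetical, and it assumes bash (for its /dev/tcp pseudo-device) and the coreutils timeout command are installed.

```shell
#!/bin/sh
# Sketch: wait until a TCP port accepts connections (e.g. Cloudera Manager on 7180).
# Assumption: bash (for its /dev/tcp pseudo-device) and coreutils timeout are installed.
# port_open HOST PORT: succeeds if a TCP connection can be opened within 3 seconds.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# wait_for_port HOST PORT [TRIES]: poll every 5 seconds, up to TRIES times (default 60).
wait_for_port() {
    tries="${3:-60}"
    while [ "$tries" -gt 0 ]; do
        port_open "$1" "$2" && return 0
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep 5
    done
    return 1
}
```

Example use: wait_for_port host1.example.com 7180 && echo "Cloudera Manager port is open"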

Step 4:
Specify the hosts for your CDH cluster installation (separated by commas or any other delimiter supported by Cloudera):
host1,host2,host3

Step 5:
Choose the installation method: Use Parcels (Recommended)
Select the version of CDH: CDH-5.1.0-1.cdh5.1.0.p0.53
Select the specific release of the Cloudera Manager Agent to install on your hosts: Matched release for this Cloudera Manager Server

Step 6:
Configure Java Encryption

Step 7:
Provide SSH login credentials: the username and password of the sudo user we created (i.e. dummyuser).

Step 8:
Continue with the default options that Cloudera provides on the following pages.

We now have a Hadoop cluster set up. Cloudera Manager is available at:
http://host1.example.com:7180
(Note: 7180 is the port for Cloudera Manager)

Comment below if you find this post useful 🙂

8 thoughts on “Hadoop Cloudera Cluster Set up using Cloudera Manager”

  1. Bharath

    Hi, thanks for the wonderful tutorial.

    I have a question: we already have an instance in Amazon Web Services.

    Now I want to use that instance to install Cloudera Manager.

    Can you help me out with this?

    Thanks in advance.

      1. Upendr

        Hi,
        Is it possible to configure already-launched AWS instances if we install through Cloudera Director?
        Note: I don't have AWS console credentials.
        Please help.

        Thanks in advance.

        Upendra.D

      2. matt

        Using installation path A (the installer .bin) is not recommended for production; it is for testing only,
        according to the Cloudera documentation.

  2. Ganapathy

    Hi,
    In Step 6 the details are not clear.
    For /etc/sysconfig/network-scripts/ifcfg-eth0: do we need to add NETWORKING_IPV6=no and
    IPV6INIT=no alongside the existing entries? What about the static or DHCP entry details? Since this example runs in a VM on a NAT connection, the IP will change when the machine restarts.

    Please add more details about the entries in the ifcfg-eth0 file.
