Note: We need a 64-bit machine for Cloudera cluster set up.
For this example, let's say we want to create a Hadoop cluster from 3 nodes.
Note: You can add any number of hosts.
host1.example.com (or) host1 -> Cloudera Manager Node
host2.example.com (or) host2
host3.example.com (or) host3
Before the cluster setup, we need to configure our nodes. Follow the steps below on all nodes.
Step 1:
Sync all the nodes with a time source using NTP (Network Time Protocol)
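Step 1 does not list any commands. On a RHEL/CentOS 6 style node (an assumption, based on the init scripts used in the later steps), it might look like the sketch below. The `run` helper and the `DRY_RUN` variable are illustrative names, not part of any tool. With the default `DRY_RUN=1` the script only prints what it would do.

```shell
#!/bin/bash
# Sketch: install and enable ntpd on a RHEL/CentOS 6 style node (assumed OS).
# DRY_RUN=1 (default) only prints the commands; run as root with DRY_RUN=0 to apply them.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_ntp() {
    run yum install -y ntp      # install the NTP daemon
    run service ntpd start      # start it now
    run chkconfig ntpd on       # start it on every boot
    run ntpq -p                 # list the peers the node is syncing with
}

setup_ntp
```

Once the printed commands look right for your system, run the script as root with `DRY_RUN=0` on every node.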
Step 2:
Set the hostname (on each node, with its corresponding hostname)
# vim /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=host1.example.com
GATEWAY=192.168.1.1
# hostname host1.example.com
Step 3:
Map the IP address of every node to its hostname:
# vim /etc/hosts
192.168.1.2 host1.example.com host1
192.168.1.3 host2.example.com host2
192.168.1.4 host3.example.com host3
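A missing or mistyped hosts entry tends to show up only later, when the nodes cannot find each other. A minimal sketch for checking the file on each node follows. `check_hosts` is a hypothetical helper name, and the hostnames are the ones used in this example.

```shell
#!/bin/bash
# Sketch: confirm that every cluster hostname has an entry in a hosts file.
# check_hosts is a hypothetical helper; pass the hosts file, then the names to look for.

check_hosts() {
    local hosts_file="$1"; shift
    local missing=0 name
    for name in "$@"; do
        # Look for the name as a whole field (after the IP) on a non-comment line.
        if awk -v n="$name" '!/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$hosts_file"; then
            echo "ok: $name"
        else
            echo "MISSING: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   check_hosts /etc/hosts host1.example.com host2.example.com host3.example.com
```

The function returns a non-zero status if any name is missing, so it can be used in a larger preparation script.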
Step 4:
Give one user sudo rights, to be used later for SSH. Ex: dummyuser
# vim /etc/sudoers
Add the below line:
dummyuser ALL=(ALL) NOPASSWD:ALL
Step 5:
Disable SELinux
# setenforce 0
# vim /etc/sysconfig/selinux
SELINUX=disabled
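If you prepare several nodes, editing the file by hand on each one gets tedious. Here is a minimal sketch of a non-interactive version. `disable_selinux_in` is a hypothetical helper name, and GNU sed is assumed. On RHEL-style systems /etc/sysconfig/selinux is normally a symlink to /etc/selinux/config, and `sed -i` would replace a symlink with a regular file, so the example targets /etc/selinux/config directly.

```shell
#!/bin/bash
# Sketch: set SELINUX=disabled in an SELinux config file without opening an editor.
# disable_selinux_in is a hypothetical helper; GNU sed is assumed for -i.

disable_selinux_in() {
    local conf="$1"
    if grep -q '^SELINUX=' "$conf"; then
        # Rewrite the existing setting in place (SELINUXTYPE= is left untouched).
        sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$conf"
    else
        echo 'SELINUX=disabled' >> "$conf"
    fi
}

# Example (as root):
#   disable_selinux_in /etc/selinux/config
#   getenforce    # shows the current mode
```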
Step 6:
Disable IPv6
# vim /etc/sysctl.conf
#Disable IPv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
# vim /etc/sysconfig/network-scripts/ifcfg-eth0
NETWORKING_IPV6=no
IPV6INIT=no
Step 7:
Check the firewall status with the below command
# /etc/init.d/iptables status
iptables: Firewall is not running.
If the firewall is running, then stop iptables as below:
# /etc/init.d/iptables save
# /etc/init.d/iptables stop
# chkconfig iptables off
Step 8:
If you are doing a lot of streaming, set the vm.overcommit_memory kernel parameter to "1".
# sysctl vm.overcommit_memory=1
# echo "vm.overcommit_memory = 1" >> /etc/sysctl.conf
Step 9:
Set the vm.swappiness kernel parameter to 0:
# sysctl vm.swappiness=0
# echo "vm.swappiness = 0" >> /etc/sysctl.conf
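Appending with echo adds another copy of the line every time the step is repeated. A minimal sketch of an idempotent alternative follows. `set_sysctl_conf` is a hypothetical helper name, and GNU sed is assumed. It updates the key if it is already in the file and appends it otherwise.

```shell
#!/bin/bash
# Sketch: persist a kernel parameter in a sysctl config file idempotently.
# set_sysctl_conf is a hypothetical helper; GNU sed is assumed for -i and -E.

set_sysctl_conf() {
    local key="$1" value="$2" conf="$3"
    local escaped
    # Escape the dots so they match literally in the regular expressions below.
    escaped=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -Eq "^[[:space:]]*${escaped}[[:space:]]*=" "$conf"; then
        sed -i -E "s|^[[:space:]]*${escaped}[[:space:]]*=.*|${key} = ${value}|" "$conf"
    else
        echo "${key} = ${value}" >> "$conf"
    fi
}

# Example (as root):
#   set_sysctl_conf vm.overcommit_memory 1 /etc/sysctl.conf
#   set_sysctl_conf vm.swappiness 0 /etc/sysctl.conf
#   sysctl -p    # reload /etc/sysctl.conf
```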
Step 10: (Only for RHEL Operating System)(optional)
Red Hat 6.x transparent hugepage bug workaround:
# echo 'echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag' >> /etc/rc.local
The line in /etc/rc.local takes effect on the next boot. You can then check the status with the below command:
# cat /sys/kernel/mm/redhat_transparent_hugepage/defrag
always madvise [never]
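The value in square brackets is the active setting. The sketch below pulls it out so it can be checked from a script. `thp_active` is a hypothetical helper name. The two paths are the RHEL 6 location and the location used by other kernels, and whichever one exists on the node is reported.

```shell
#!/bin/bash
# Sketch: print the active setting from a transparent hugepage control file.
# The kernel marks the active value with square brackets, e.g. "always madvise [never]".
# thp_active is a hypothetical helper name.

thp_active() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

# Report whichever control file exists on this node.
for f in /sys/kernel/mm/redhat_transparent_hugepage/defrag \
         /sys/kernel/mm/transparent_hugepage/defrag; do
    if [ -r "$f" ]; then
        echo "$f: $(thp_active "$f")"
    fi
done
```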
Step 11:
Restart the network:
#/etc/init.d/network restart
or restart all systems as below:
#reboot
Our nodes are now ready to be added to the cluster.
Let's get started with the cluster setup now.
Note: Follow the steps below only on host1 (i.e. the Cloudera Manager node)
Step 1:
Download the Cloudera Manager bin file from the below URL on host1.
http://archive-primary.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
#wget http://archive-primary.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
Step 2:
Make the installer executable
# chmod +x cloudera-manager-installer.bin
Run the binary file
# ./cloudera-manager-installer.bin
Choose Next at every step and accept the license.
It will install the Cloudera Manager server on host1.
Step 3:
Open Cloudera Manager with the URL –> http://host1.example.com:7180
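The web UI may not answer right away after the installer finishes. A minimal sketch for waiting until port 7180 accepts connections follows. `port_open` and `wait_for_port` are hypothetical helper names, and the script relies on bash's /dev/tcp feature and the coreutils `timeout` command.

```shell
#!/bin/bash
# Sketch: wait until the Cloudera Manager web UI port accepts TCP connections.
# port_open and wait_for_port are hypothetical helpers; /dev/tcp is a bash feature.

port_open() {
    local host="$1" port="$2"
    # Try to open a TCP connection; give up after 3 seconds.
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

wait_for_port() {
    local host="$1" port="$2" tries="${3:-30}"
    local i
    for ((i = 1; i <= tries; i++)); do
        if port_open "$host" "$port"; then
            echo "up: ${host}:${port}"
            return 0
        fi
        sleep 2
    done
    echo "timed out: ${host}:${port}"
    return 1
}

# Example:
#   wait_for_port host1.example.com 7180
```

An open port only shows that something is listening. Open the URL in a browser to confirm that it is the Cloudera Manager login page.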
Step 4:
Specify the hosts for your CDH cluster installation. (Separated by commas or any other delimiter supported by Cloudera)
host1,host2,host3
Step 5:
Choose Method: Use Parcels (Recommended) More Options
Select the version of CDH: CDH-5.1.0-1.cdh5.1.0.p0.53
Select the specific release of the Cloudera Manager Agent you want to install on your hosts.: Matched release for this Cloudera Manager Server
Step 6:
Configure Java Encryption
Step 7:
Provide SSH login credentials. (The login credentials of the sudo user we created, i.e. dummyuser)
Provide the username and password.
Step 8:
Continue with the default options Cloudera provides on the following pages.
We now have a Hadoop cluster set up:
http://host1.example.com:7180
(Note: 7180 is the port for Cloudera Manager)
Comment below if you find this post useful 🙂
Hi, thanks for the wonderful tutorial.
I have a question: we already have an instance on Amazon Web Services.
Now I want to use that instance to install Cloudera Manager.
Can you help me out regarding this?
Thanks in advance.
@Bharath
The same installation steps apply to an Amazon instance as well.
Or you could use Cloudera Director, since you are aiming for the cloud.
Hi..
Is it possible to configure already launched instances in AWS if we install through Cloudera Director?
Note: I don't have the AWS console credentials with me.
Please help me out.
Thanks in Advance.
Upendra.D
Using installation Path A (the installer bin) is not recommended for production; it is for testing only,
according to the Cloudera documentation.
Hello, thank you for the reply.
I have an instance of Ubuntu 12.04, 64-bit,
and I am going to create a Hadoop cluster on Amazon EC2 using the Cloudera Manager bin.
I am struggling to create the cluster, so can you please help me solve this?
I am following this tutorial (http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v4-7-1/Cloudera-Manager-Installation-Guide/cmig_install_on_EC2.html)
Thanks in advance.
Hi,
In Step 6 the details are not clear.
# vim /etc/sysconfig/network-scripts/ifcfg-eth0 – in this file, do we need to add NETWORKING_IPV6=no and
IPV6INIT=no alongside the existing entries? What about the static or DHCP entry details? Since this example runs in a VM on a NAT connection, the IP will change when the machine restarts.
Please add more details about the entries of the ifcfg-eth0 file.
Excellent blog with detailed steps
Excellent and very useful document!