Kubernetes cluster on CentOS 7

Kubernetes is a system for managing containerized applications in a clustered environment. It provides basic mechanisms for deployment, maintenance and scaling of applications on public, private or hybrid setups. It also comes with self-healing features where containers can be auto-provisioned, restarted or even replicated.

Kubernetes Components:

Kubernetes works in a server-client setup, where a master provides centralized control for a number of minions. We will be deploying a Kubernetes master with three minions, as illustrated in the architecture diagram further below.

Kubernetes has several components:

  • etcd – A highly available key-value store for shared configuration and service discovery.
  • flannel – An etcd backed network fabric for containers.
  • kube-apiserver – Provides the API for Kubernetes orchestration.
  • kube-controller-manager – Enforces Kubernetes services.
  • kube-scheduler – Schedules containers on hosts.
  • kubelet – Processes a container manifest so the containers are launched according to how they are described.
  • kube-proxy – Provides network proxy services.
    Deployment on CentOS 7

    We will need 4 servers running CentOS 7.1 64-bit with a minimal install. All components are available directly from the CentOS extras repository, which is enabled by default. The following architecture diagram illustrates where the Kubernetes components should reside:

    [architecture diagram: kube7-arch]

    Prerequisites

    1. Disable the firewall (firewalld) on each node to avoid conflicts with the iptables rules that Docker manages:

    $ systemctl stop firewalld
    $ systemctl disable firewalld

    2. Install NTP and make sure it is enabled and running:

    $ yum -y install ntp
    $ systemctl start ntpd
    $ systemctl enable ntpd

    3. Add the virt7-docker-common-release repo on all nodes:

    vim /etc/yum.repos.d/virt7-docker-common-release.repo
    [virt7-docker-common-release]
    name=virt7-docker-common-release
    baseurl=http://cbs.centos.org/repos/virt7-docker-common-release/x86_64/os/
    gpgcheck=0

    Setting up the Kubernetes Master

    The following steps should be performed on the master.

    1. Install etcd and Kubernetes through yum:
    $ yum -y install etcd kubernetes

    2. Configure etcd to listen on all IP addresses inside /etc/etcd/etcd.conf. Ensure the following lines are uncommented, and assign the following values:

    ETCD_NAME=default
    ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
    ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"

    ETCD_ADVERTISE_CLIENT_URLS="http://localhost:2379"

    3. Configure Kubernetes API server inside /etc/kubernetes/apiserver. Ensure the following lines are uncommented, and assign the following values:

    KUBE_API_ADDRESS="--address=0.0.0.0"
    KUBE_API_PORT="--port=8080"
    KUBELET_PORT="--kubelet_port=10250"
    KUBE_ETCD_SERVERS="--etcd_servers=http://127.0.0.1:2379"
    KUBE_SERVICE_ADDRESSES="--service-cluster-ip-range=10.254.0.0/16"
    KUBE_ADMISSION_CONTROL="--admission_control=NamespaceLifecycle,NamespaceExists,LimitRanger,SecurityContextDeny,ResourceQuota"

    KUBE_API_ARGS=""

    4. Start and enable etcd, kube-apiserver, kube-controller-manager and kube-scheduler:

    $ for SERVICES in etcd kube-apiserver kube-controller-manager kube-scheduler; do
        systemctl restart $SERVICES
        systemctl enable $SERVICES
        systemctl status $SERVICES

    done

    5. Define the flannel network configuration in etcd. This configuration will be pulled by the flannel service on the minions:


    $ etcdctl mk /kube-centos/network/config "{ \"Network\": \"172.30.0.0/16\", \"SubnetLen\": 24, \"Backend\": { \"Type\": \"vxlan\" } }"
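    You can verify the key with etcdctl (the v2 client shipped alongside etcd); it should print back the JSON document you just stored:

    $ etcdctl get /kube-centos/network/config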

    6. At this point, we should notice that nodes’ status returns nothing because we haven’t started any of them yet:

    $ kubectl get nodes

    NAME             LABELS              STATUS

    Setting up Kubernetes Minions (Nodes)

    The following steps should be performed on minion1, minion2 and minion3 unless specified otherwise.

    1. Install flannel and Kubernetes using yum:

    $ yum -y install flannel kubernetes

    2. Configure etcd server for flannel service. Update the following line inside /etc/sysconfig/flanneld to connect to the respective master:

    FLANNEL_ETCD="http://192.168.50.130:2379"
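    Since the flannel configuration was written under /kube-centos/network on the master (step 5 of the master setup), the etcd key prefix in the same file most likely needs to point there as well. Depending on the flannel version, the variable is FLANNEL_ETCD_PREFIX or FLANNEL_ETCD_KEY:

    FLANNEL_ETCD_PREFIX="/kube-centos/network"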

    3. Configure the Kubernetes default config at /etc/kubernetes/config; ensure you update the KUBE_MASTER value to connect to the Kubernetes master API server:

    KUBE_MASTER="--master=http://192.168.50.130:8080"

    4. Configure kubelet service inside /etc/kubernetes/kubelet as below:

    minion1:

    KUBELET_ADDRESS="--address=0.0.0.0"
    KUBELET_PORT="--port=10250"
    # change the hostname to this host’s IP address
    KUBELET_HOSTNAME="--hostname_override=192.168.50.131"
    KUBELET_API_SERVER="--api_servers=http://192.168.50.130:8080"
    KUBELET_ARGS=""

    minion2:

    KUBELET_ADDRESS="--address=0.0.0.0"
    KUBELET_PORT="--port=10250"
    # change the hostname to this host’s IP address
    KUBELET_HOSTNAME="--hostname_override=192.168.50.132"
    KUBELET_API_SERVER="--api_servers=http://192.168.50.130:8080"
    KUBELET_ARGS=""

    minion3:

    KUBELET_ADDRESS="--address=0.0.0.0"
    KUBELET_PORT="--port=10250"
    # change the hostname to this host’s IP address
    KUBELET_HOSTNAME="--hostname_override=192.168.50.133"
    KUBELET_API_SERVER="--api_servers=http://192.168.50.130:8080"
    KUBELET_ARGS=""

    5. Start and enable kube-proxy, kubelet, docker and flanneld services:

    $ for SERVICES in kube-proxy kubelet docker flanneld; do
        systemctl restart $SERVICES
        systemctl enable $SERVICES
        systemctl status $SERVICES

    done

    6. On each minion, you should notice two new interfaces added, docker0 and the flannel interface (flannel.1 with the vxlan backend). You should get a different range of IP addresses on the flannel interface on each minion, similar to below:
    minion1:

[root@kube-minion1 ~]# ip a | grep flannel | grep inet
inet 172.30.79.0/32 scope global flannel.1
[root@kube-minion1 ~]#

minion2:

[root@kube-minion2 ~]# ip a | grep flannel | grep inet
inet 172.30.92.0/32 scope global flannel.1
[root@kube-minion2 ~]#

  7. Now log in to the Kubernetes master node and verify the minions’ status:
$ kubectl get nodes
NAME             LABELS                                  STATUS
192.168.50.131   kubernetes.io/hostname=192.168.50.131   Ready
192.168.50.132   kubernetes.io/hostname=192.168.50.132   Ready

192.168.50.133   kubernetes.io/hostname=192.168.50.133   Ready
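
You can also run a quick smoke test from the master at this point. A minimal sketch, assuming the public nginx image is reachable from the minions (depending on the kubectl version, kubectl run creates a replication controller or a deployment behind the scenes):

$ kubectl run nginx --image=nginx --replicas=3
$ kubectl get pods -o wide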

Executing a program for N seconds then exiting:

If you need to execute a given command for a specified number of seconds on a Linux system, GNU coreutils provides the timeout command (see the note after the examples); if you prefer, a small wrapper script like the one below does the same job.

#!/bin/bash
# Run a given command in the background for at most N seconds,
# then kill it if it is still running.

cmd=$1
seconds=$2

echo "Executing: ${cmd} for ${seconds} seconds"
$cmd &

cmdpid=$!
sleep "$seconds"

# /proc/<pid> exists only while the process is still alive
if [ -d /proc/$cmdpid ]
then
  echo "terminating program PID:$cmdpid"
  kill -9 "$cmdpid"
fi

If you save this into a file called run_command_for_seconds_exits.sh, you can use it with:

bash run_command_for_seconds_exits.sh "hdfs dfs -put 20170207020*.gz /data/TEST_1/" 50

The above will run the command "hdfs dfs -put 20170207020*.gz /data/TEST_1/" for 50 seconds, then it will exit.

Examples:
[admin@GVSGLBNN-2 TEST]# bash /data/scripts/run_command_for_seconds_exits.sh "mv /data/HdfsDownloader/retention/20170315/20170315*.gz ." 60

bash run_command_for_seconds_exits.sh "hdfs dfs -put 20170207020*.gz /data/TEST_1/" 50
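
Note that GNU coreutils, installed by default on CentOS 7, ships a timeout command that achieves the same result without a wrapper script, for example:

timeout 50 bash -c 'hdfs dfs -put 20170207020*.gz /data/TEST_1/'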

Deploying OpenStack in the lab

OpenStack is open source software for creating private and public clouds. It controls large pools of compute, storage, and networking resources throughout a datacenter, managed through a dashboard or via the OpenStack API.

This post describes installing the Liberty release on CentOS 7.2. The following are the steps.

1. If you are using a non-English locale, make sure your /etc/environment is populated:

LANG=en_US.utf-8
LC_ALL=en_US.utf-8

2. Network Settings

To have external network access to the server and instances, this is a good moment to configure your network settings properly. Assigning a static IP address to your network card and disabling NetworkManager are good ideas.

$ sudo systemctl disable firewalld
$ sudo systemctl stop firewalld
$ sudo systemctl disable NetworkManager
$ sudo systemctl stop NetworkManager
$ sudo systemctl enable network
$ sudo systemctl start network

3. Check the CentOS release; it should be CentOS 7.2:

[root@openstack-lab ~]# cat /etc/redhat-release

CentOS Linux release 7.2.1511 (Core)

[root@openstack-lab ~]#

4. SELinux should be disabled or set to permissive mode.


You can set SELinux to permissive mode at runtime with the following command:

$ setenforce 0

You can check the current mode with the following command:

[root@openstack-lab ~]# getenforce 
Permissive
[root@openstack-lab ~]# 
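
The setenforce command only changes the mode until the next reboot. To make the change persistent, you can also set it in /etc/selinux/config, for example:

SELINUX=permissive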

5. The hostname should resolve, either through DNS or via an entry in /etc/hosts.

In my case I have added an entry in /etc/hosts:


[root@openstack-lab ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.168.85  openstack-lab
[root@openstack-lab ~]# ping openstack-lab
PING openstack-lab (192.168.168.85) 56(84) bytes of data.
64 bytes from openstack-lab (192.168.168.85): icmp_seq=1 ttl=64 time=0.046 ms
64 bytes from openstack-lab (192.168.168.85): icmp_seq=2 ttl=64 time=0.061 ms
^C
--- openstack-lab ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.046/0.053/0.061/0.010 ms
[root@openstack-lab ~]# 

Now you can run the following commands to install OpenStack:

$ sudo yum install -y centos-release-openstack-liberty
$ sudo yum update -y
$ sudo yum install -y openstack-packstack
$ packstack --allinone

If everything goes fine, you will be able to access your OpenStack dashboard, as shown in the screenshot below.
[screenshot: OpenStack dashboard]

Docker Containers basics

What is Docker?

Docker is a containerization platform.
What does that mean? Well, you can package your application into a standardized unit for software development. Those standardized units are called containers. Containers can be shipped and run independently.

A Docker container wraps a piece of software in a complete filesystem that contains everything needed to run: code, runtime, system tools, system libraries – anything that can be installed on a server. This guarantees that the software will always run the same, regardless of its environment.

Open Docker Containers

What do you mean by Open Docker Containers?
Well, Docker containers are based on open standards. That means you can run Docker containers on all major Linux distributions, and even on Microsoft Windows. You can run Docker containers on any infrastructure, including but not limited to cloud providers such as AWS, GCE, SoftLayer and Azure, virtual machines, and on-premises data centers.

Docker Containers vs Virtual Machines

Let’s try to understand this. You need infrastructure for both virtual machines and Docker containers, and of course an operating system on top of it. For virtual machines you then need a hypervisor. For Docker containers you don’t require a hypervisor, only a binary called the Docker Engine. Virtual machines also need a dedicated guest OS, libraries and binaries, dedicated computing resources, and so on. Docker containers, on the other hand, run on top of the Docker Engine: no guest OS or dedicated libraries/binaries are required, and containers share the Linux kernel, RAM and storage of the host system.

To summarize:

Virtual machines include the application, the necessary binaries and libraries, and an entire guest Operating System – all of which can amount to tens of GBs.

Containers include the application and all of its dependencies, but share the kernel with other containers, running as isolated processes in user space on the host operating system. Docker containers are not tied to any specific infrastructure: they run on any computer, on any infrastructure, and in any cloud.
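
As a quick illustration (a minimal sketch, assuming Docker is already installed and using the public nginx image purely as an example), starting an isolated, portable service is a single command:

$ docker run -d --name web -p 8080:80 nginx
$ docker ps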


Kubernetes Basic Terminology

Kubernetes is an open source container management system that allows the deployment, orchestration, and scaling of container applications and micro-services across multiple hosts.

A single master host will manage the cluster and run several core Kubernetes services.

API Server – The REST API endpoint for managing most aspects of the Kubernetes cluster.
Replication Controller – Ensures the number of specified pod replicas are always running by starting or shutting down pods.

Scheduler – Finds a suitable host where new pods will reside.
etcd – A distributed key value store where Kubernetes stores information about itself, pods, services, etc.

Flannel – A network overlay that will allow containers to communicate across multiple hosts.
The minion hosts will run the following services to manage containers and their network.

Kubelet – Host level pod management; determines the state of pod containers based on the pod manifest received from the Kubernetes master.
Proxy – Manages the container network (IP addresses and ports) based on the network service manifests received from the Kubernetes master.

Docker – An API and framework built around Linux Containers (LXC) that allows for the easy management of containers and their images.

Flannel – A network overlay that will allow containers to communicate across multiple hosts.
Note: Flannel, or another network overlay service, is required to run on the minions when there is more than one minion host. This allows the containers which are typically on their own internal subnet to communicate across multiple hosts. As the Kubernetes master is not typically running containers, the Flannel service is not required to run on the master.
Pods

It’s the basic unit of Kubernetes workloads. A pod models an application-specific “logical host” in a containerized environment. In layman’s terms, it models a group of applications or services that used to run on the same server in the pre-container world. Containers inside a pod share the same network namespace and can share data volumes as well.
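
A minimal pod manifest (a hypothetical example, not taken from this setup) declares a single nginx container in a pod named web; any additional containers listed under the same spec would share the pod’s network namespace and volumes:

apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80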

Replication controllers (RC)
Pods are great for grouping multiple containers into logical application units, but they don’t offer replication or rescheduling in case of server failure.

This is where a replication controller, or RC, comes in handy. An RC ensures that a given number of pods for a service are always running across the cluster.
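
For example, assuming a replication controller named nginx already exists, scaling the number of its pods up or down is a single command:

$ kubectl scale rc nginx --replicas=5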

Services
Pods and replication controllers are great for deploying and distributing applications across a cluster, but pods have ephemeral IPs that change upon rescheduling or container restart.

A Kubernetes service provides a stable endpoint (fixed virtual IP + port binding to the host servers) for a group of pods managed by a replication controller.
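
Again assuming an nginx replication controller as an example, kubectl expose puts its pods behind a stable cluster IP and port:

$ kubectl expose rc nginx --port=80 --target-port=80
$ kubectl get services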

Kubernetes cluster
In its simplest form, a Kubernetes cluster is composed of two types of nodes:

1 Kubernetes master.
N Kubernetes nodes.

Kubernetes master

The Kubernetes master is the control unit of the entire cluster.

The main components of the master are:

Etcd: a globally available datastore that stores information about the cluster and the services and applications running on the cluster.
Kube API server: this is the main management hub of the Kubernetes cluster, and it exposes a RESTful interface.
Controller manager: handles the replication of applications managed by replication controllers.
Scheduler: tracks resource utilization across the cluster and assigns workloads accordingly.
Kubernetes node

The Kubernetes nodes are worker servers that are responsible for running pods.

The main components of a node are:

Docker: a daemon that runs application containers defined in pods.
Kubelet: a control unit for pods in a local system.
Kube-proxy: a network proxy that ensures correct routing for Kubernetes services.

Install the Go language and build the benchmark tool

Steps to install Go and build the etcd benchmark tool:

  1. Download go using “wget https://storage.googleapis.com/golang/go1.8.3.linux-amd64.tar.gz”
  2. Extract it in /usr/local
    sudo tar -C /usr/local -xzf go1.8.3.linux-amd64.tar.gz

  3. Set the following variables in your profile file (e.g. ~/.bashrc) and reload it:
    export GOROOT=/usr/local/go
    export GOPATH=$HOME/gowork
    export PATH=$PATH:$GOPATH/bin:/usr/local/go/bin
    source ~/.bashrc

  4. Create following directories
    mkdir -p $HOME/gowork
    mkdir -p $HOME/gowork/src

  5. Change into $GOPATH:
    cd $GOPATH

  6. Run
    go get github.com/coreos/etcd/tools/benchmark

then cd src/github.com/coreos/etcd/tools/benchmark

  7. Build the benchmark binary:
    go build -o benchmark
    This will create an executable named ‘benchmark’ in $GOPATH/src/github.com/coreos/etcd/tools/benchmark

You can use this tool to benchmark etcd.
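
A typical invocation looks like the sketch below, based on the etcd performance guide linked underneath; the endpoint address and the counts are placeholders you should adjust for your cluster:

benchmark --endpoints=http://127.0.0.1:2379 --conns=1 --clients=1 put --key-size=8 --sequential-keys --total=10000 --val-size=256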

Links for Benchmarking.

https://github.com/coreos/etcd/blob/master/Documentation/op-guide/performance.md

Deploying Docker on CentOS 7


Intro :

Docker is an application that makes it simple and easy to run application processes in a container, which are like virtual machines, only more portable, more resource-friendly, and more dependent on the host operating system.

There are two methods for installing Docker on CentOS 7. One method involves installing it on an existing installation of the operating system. The other involves spinning up a server with a tool called Docker Machine that auto-installs Docker on it. In this post, I will explain how to install and use it on an existing installation of CentOS 7.

Prerequisites:

Docker requires a 64-bit OS and version 3.10 or higher of the Linux kernel.

To check the current kernel version, use uname -r:

[root@openstck-controller ~]# uname -r
3.10.0-27.36.1.el7.x86_64
[root@openstck-controller ~]#

Install Docker Engine:

1. Log in to your machine as a user with sudo access, or as root.

  2. Update your machine:
    $ yum -y update

  3. Add the yum repo:

[root@openstck-controller ~]# tee /etc/yum.repos.d/docker.repo <<-'EOF' 
> [dockerrepo]
> name=Docker Repository
> baseurl=https://yum.dockerproject.org/repo/main/centos/7/
> enabled=1
> gpgcheck=1
> gpgkey=https://yum.dockerproject.org/gpg
> EOF
[dockerrepo]
name=Docker Repository
baseurl=https://yum.dockerproject.org/repo/main/centos/7/
enabled=1
gpgcheck=1
gpgkey=https://yum.dockerproject.org/gpg
[root@openstck-controller ~]# ls -ltr /etc/yum.repos.d/docker.repo 
-rw-r--r--. 1 root root 156 Oct 13 07:00 /etc/yum.repos.d/docker.repo

4.  Install the Docker package.

$ sudo yum install docker-engine

5. Enable the service.

$ sudo systemctl enable docker.service

6. Start the Docker daemon.

$ sudo systemctl start docker

7. Example output when enabling the Docker service:

[root@openstck-controller ~]# sudo systemctl enable docker.service
Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.
[root@openstck-controller ~]#

  8. Start the Docker daemon (example):

[root@openstck-controller ~]# sudo systemctl start docker

9. Check Status of Docker service

[root@openstck-controller ~]# sudo systemctl status docker | grep Active
Active: active (running) since Thu 2016-10-13 07:15:35 EDT; 7min ago
[root@openstck-controller ~]#

10. Verify docker is installed correctly by running a test image in a container.

 $ sudo docker run --rm hello-world

 Unable to find image 'hello-world:latest' locally
 latest: Pulling from library/hello-world
 c04b14da8d14: Pull complete
 Digest: sha256:0256e8a36e2070f7bf2d0b0763dbabdd67798512411de4cdcf9431a1feb60fd9
 Status: Downloaded newer image for hello-world:latest

 Hello from Docker!
 This message shows that your installation appears to be working correctly.

 To generate this message, Docker took the following steps:
  1. The Docker client contacted the Docker daemon.
  2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
  3. The Docker daemon created a new container from that image which runs the
     executable that produces the output you are currently reading.
  4. The Docker daemon streamed that output to the Docker client, which sent it
     to your terminal.

 To try something more ambitious, you can run an Ubuntu container with:
  $ docker run -it ubuntu bash

 Share images, automate workflows, and more with a free Docker Hub account:
  https://hub.docker.com

 For more examples and ideas, visit:
  https://docs.docker.com/engine/userguide/

 

Why is the docker group required?

The Docker daemon binds to a Unix socket instead of a TCP port. By default that Unix socket is owned by the user root, and other users can only access it with sudo. For this reason, the Docker daemon always runs as the root user.

To avoid having to use sudo when you use the docker command, create a Unix group called docker and add users to it. When the Docker daemon starts, it makes the Unix socket readable and writable by the docker group.
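
You can inspect the socket and its group ownership directly; once the docker group exists and the daemon has been restarted, the group column should read docker:

$ ls -l /var/run/docker.sock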

1. Create the docker group:

sudo groupadd docker

2. Create a user and add it to the docker group:

sudo usermod -aG docker docker_user
or

useradd -G docker docker_user

3. Log out and log back in.

[root@openstck-controller ~]# su docker_user
[docker_user@openstck-controller root]$ cd
[docker_user@openstck-controller ~]$ id
uid=1000(docker_user) gid=1000(docker_user) groups=1000(docker_user),980(docker) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[docker_user@openstck-controller ~]$

4. Verify that your user is in the docker group by running docker without sudo.

[docker_user@openstck-controller ~]$ docker run hello-world

References:
https://docs.docker.com/engine/installation/linux/centos/
https://docs.docker.com/engine/userguide/

Enable password authentication on cloud images

The cloud images bundled by various Linux distributions have password authentication disabled by default for security reasons. The only way to log in to an instance launched from one of these images is by specifying a security key at boot time and using that key to ssh in. Often you will want to enable password authentication, for example to log in through the VNC console for debugging. This post takes you through the steps involved in enabling password authentication.

  1. First, log in using the .pem key:
 shailendrakumar-2:~ shailendra.kumar$ ssh -i cloud-key-1.pem centos@192.168.168.238
Last login: Wed Oct 5 05:40:06 2016 from 192.168.109.242
-bash: warning: setlocale: LC_CTYPE: cannot change locale (UTF-8): No such file or directory
[centos@centos6fordevops ~]$

You should have created a key pair before booting any instance that uses it. On an OpenStack setup you can create key pairs under ‘Project -> Compute -> Access & Security -> Key Pairs’.
As soon as you create one, you will be prompted to download and save it to your disk. Later you can use the ‘key-pair-name.pem’ file to ssh into your instance.
Depending on the distribution of your image, there is a default user to log in as. For example, on an Ubuntu image the default user is ‘ubuntu’; for CentOS it is ‘centos’.
Replace the IP shown above with the floating IP / Elastic IP / public IP address of your instance.

  2. Sudo to root:
[centos@centos6fordevops ~]$ sudo su -
[root@centos6fordevops ~]# 

On all cloud images the default user has passwordless sudo enabled.

  3. Create a new user:
[root@centos6fordevops ~]# useradd -G wheel -U -m passwordtest
[root@centos6fordevops ~]# 

4. Set the password for the new user:

[root@centos6fordevops ~]# passwd passwordtest
Changing password for user passwordtest.
New password: 
BAD PASSWORD: it is based on a dictionary word
Retype new password: 
passwd: all authentication tokens updated successfully.
[root@centos6fordevops ~]# 

5. Open /etc/ssh/sshd_config and set the following parameters:

ChallengeResponseAuthentication yes
PasswordAuthentication yes
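
You can quickly confirm the effective values before restarting sshd:

grep -E '^(PasswordAuthentication|ChallengeResponseAuthentication)' /etc/ssh/sshd_config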

6. Additionally, if you want to enable root login, you can set the parameter below too:

PermitRootLogin yes

  7. Set a password for the root user:
[root@centos6fordevops ~]# passwd
Changing password for user root.
New password: 
BAD PASSWORD: it is based on a dictionary word
Retype new password: 
passwd: all authentication tokens updated successfully.

  8. Restart the sshd service:
    [root@centos6fordevops ~]# service sshd restart
    Stopping sshd: [OK]
    Starting sshd: [OK]
    [root@centos6fordevops ~]#

 

NIC Bonding: Network HA, Channel Bonding, Achieving Higher Data Rates

NIC channel bonding allows multiple network cards to act as one, providing increased bandwidth and redundancy. Bonding is a Linux kernel feature that aggregates multiple similar interfaces (such as eth0 and eth1) into a single virtual link such as bond0. The idea is pretty simple: get higher data rates as well as link failover. NIC (Network Interface Card) bonding is also known as network bonding.

Bonding Driver:

Linux allows binding multiple network interfaces into a single channel/NIC using a special kernel module called bonding.

The Linux bonding driver provides a method for aggregating multiple network interfaces into a single logical “bonded” interface. The behavior of the bonded interfaces depends upon the mode; generally speaking, modes provide either hot standby or load balancing services. Additionally, link integrity monitoring may be performed.

In this article we will learn how to configure NIC or network bonding on CentOS 7 & RHEL 7. In my case I have two interface cards (enp0s3 & enp0s8) that will form a bond interface (bond0).

Prerequisite :

If the bonding module is not loaded on your Linux box, use the command below to load it:

[root@controller-node ~]# modprobe bonding

To list the bonding module info, use the following command:

[root@openstack ~]# modinfo bonding

Output will be something like below

[screenshot: modinfo bonding output]

Step:1 Create Bond Interface File

Create a bond interface file (ifcfg-bond0) under the directory /etc/sysconfig/network-scripts/:

[root@controller-node network-scripts]# vi ifcfg-bond0
DEVICE=bond0
TYPE=Bond
NAME=bond0
BONDING_MASTER=yes
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.1.70
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
BONDING_OPTS="mode=5 miimon=100"

Save & exit the file.

Specify the IP address, netmask & bonding mode as per your requirement. In my example I am using mode=5, which provides fault tolerance and load balancing.

Step:2 Edit the NIC interface files

For ifcfg-enp0s3

[root@controller-node ~]# vi /etc/sysconfig/network-scripts/ifcfg-enp0s3
TYPE=Ethernet
BOOTPROTO=none
DEVICE=enp0s3
ONBOOT=yes
HWADDR="08:00:27:69:60:c9"
MASTER=bond0
SLAVE=yes

For ifcfg-enp0s8

[root@controller-node ~]# cat /etc/sysconfig/network-scripts/ifcfg-enp0s8
TYPE=Ethernet
BOOTPROTO=none
DEVICE=enp0s8
ONBOOT=yes
HWADDR="08:00:27:ea:71:8d"
MASTER=bond0
SLAVE=yes

Step:3 Restart the Network Service

The command below will restart the network service and bring the above changes into effect.

[root@controller-node ~]# systemctl restart network.service

Step:4 Test & Verify bond interface.

Use the ‘ifconfig’ and ‘ip addr’ commands to check the bond interface along with its slave interfaces.

[screenshot: ifconfig output for bond0 and its slave interfaces]

Use the following command to view bond interface settings such as the bonding mode and slave interfaces.

[root@controller-node ~]# cat /proc/net/bonding/bond0

[screenshot: /proc/net/bonding/bond0 output]

Step:5 Fault tolerance testing

To test fault tolerance, we can bring down one interface and check whether the server is still accessible.

[root@controller-node ~]# ifdown enp0s8
Device 'enp0s8' successfully disconnected.
[root@controller-node ~]#

[screenshot: bond0 status after bringing down enp0s8]

Different modes used in the bonding configuration (see the example after the list):

  • balance-rr or 0 — round-robin mode for fault tolerance and load balancing.
  • active-backup or 1 — Sets active-backup mode for fault tolerance.
  • balance-xor or 2 — Sets an XOR (exclusive-or) mode for fault tolerance and load balancing.
  • broadcast or 3 — Sets a broadcast mode for fault tolerance. All transmissions are sent on all slave interfaces.
  • 802.3ad or 4 — Sets an IEEE 802.3ad dynamic link aggregation mode. Creates aggregation groups that share the same speed & duplex settings.
  • balance-tlb or 5 — Sets a Transmit Load Balancing (TLB) mode for fault tolerance & load balancing.
  • balance-alb or 6 — Sets an Active Load Balancing (ALB) mode for fault tolerance & load balancing.
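
The mode can also be given by name instead of number in the bond interface file; for example, an active-backup equivalent of the BONDING_OPTS line used earlier would be:

BONDING_OPTS="mode=active-backup miimon=100"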

Hadoop Core Components : HDFS and MapReduce

Let’s talk about the components of Hadoop. Hadoop as a distribution provides only two core components: HDFS (Hadoop Distributed File System, the storage component) and MapReduce (a distributed batch processing framework, the processing component). A bunch of machines running HDFS and MapReduce is known as a Hadoop cluster.

As you add more nodes to a Hadoop cluster, the performance of your cluster increases, which means that Hadoop is horizontally scalable.

HDFS – Hadoop Distributed File System (Storage Component)

HDFS is a distributed file system which stores data in a distributed manner. Rather than storing a complete file, it divides the file into small blocks (of 64 or 128 MB) and distributes them across the cluster. Each block is replicated multiple times (3 times per the default replication factor) and stored on different nodes to ensure data availability in case of node failure. HDFS is normally installed on top of native file systems such as xfs, ext3 or ext4.

You can write files to and read files from HDFS, but you cannot update a file in place. Recent Hadoop releases added support for appending content to a file, which was not available in earlier releases.

Here are some examples of HDFS commands.

Get list of all HDFS directories under /data/

$ hdfs dfs -ls /data/

Create a directory on HDFS under /data directory

$ hdfs dfs -mkdir /data/hadoopdevopsconsulting

Copy file from current local directory to HDFS directory /data/hadoopdevopsconsulting

$ hdfs dfs -copyFromLocal ./readme.txt /data/hadoopdevopsconsulting

View content of file from HDFS directory /data/hadoopdevopsconsulting

$ hdfs dfs -cat /data/hadoopdevopsconsulting/readme.txt

Delete a file or directory from HDFS directory /data/hadoopdevopsconsulting

$ hdfs dfs -rm /data/hadoopdevopsconsulting/readme.txt

Examples:

[admin@hadoopdevopsconsulting ~]# hdfs dfs -ls /data/
Found 1 items
drwxr-xr-x - admin supergroup 0 2016-08-29 11:46 /data/ABC
[admin@hadoopdevopsconsulting ~]# hdfs dfs -mkdir /data/hadoopdevopsconsulting
[admin@hadoopdevopsconsulting ~]# hdfs dfs -ls /data/
Found 2 items
drwxr-xr-x - admin supergroup 0 2016-08-29 11:46 /data/ABC
drwxr-xr-x - admin supergroup 0 2016-08-29 11:54 /data/hadoopdevopsconsulting
[admin@hadoopdevopsconsulting ~]# hdfs dfs -copyFromLocal readme.txt /data/hadoopdevopsconsulting/
[admin@hadoopdevopsconsulting ~]# hdfs dfs -ls /data/hadoopdevopsconsulting/
Found 1 items
-rw-r--r-- 2 admin supergroup 57 2016-08-29 11:54 /data/hadoopdevopsconsulting/readme.txt
[admin@hadoopdevopsconsulting ~]# hdfs dfs -cat /data/hadoopdevopsconsulting/readme.txt
Hi This is hadoop command demo by hadoopdevopsconsulting
[admin@hadoopdevopsconsulting ~]# hdfs dfs -rm /data/hadoopdevopsconsulting/readme.txt
16/08/29 11:55:25 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /data/hadoopdevopsconsulting/readme.txt
[admin@hadoopdevopsconsulting ~]# hdfs dfs -ls /data/hadoopdevopsconsulting/
[admin@hadoopdevopsconsulting ~]#

[diagram: data block distribution in HDFS]

MapReduce:

MapReduce is a model for executing tasks on a distributed system. Using MapReduce, one can process a large file in parallel. The MapReduce framework executes tasks on different nodes (slaves), since the full file is distributed across the cluster in the form of blocks.

It has two phases, Map(Mapper Task) and Reduce (Reducer Task)

  • Each of these tasks runs on individual blocks of the data
  • First, the mapper task takes each line of input and generates intermediate key-value pairs
  • Each mapper task is executed on a single block of data
  • Then the reducer task takes the list of key-value pairs for the same key, processes the data and generates the final output
  • A phase called shuffle and sort takes place between the mapper and reducer tasks; it sends the data to the proper reducer tasks (see the shell analogy after this list)
  • The shuffle process groups the mapper output by key, mapping each key to the collection of its values
    • For example, (key1, val1) and (key1, val2) will be converted to (key1, [val1, val2])
  • The mapper tasks run in parallel, and so do the reducer tasks
  • The reducer tasks can start their work as soon as the mapper tasks are completed
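
As a rough shell analogy (not actual MapReduce code, just an illustration assuming a local text file input.txt), a word count shows the same flow: tr plays the mapper (emit one word per line), sort plays the shuffle/sort phase (group identical keys together) and uniq -c plays the reducer (aggregate per key):

cat input.txt | tr -s ' ' '\n' | sort | uniq -c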

[diagram: MapReduce flow]

Hadoop Core Concepts

Let’s talk about Hadoop core concepts.

  • Distributed system design
  • How data is distributed across multiple systems ?
  • Different components involved and how they communicate with each other?

Let’s take a deep dive and learn about the core concepts of Hadoop and its architecture.

  • The Hadoop architecture distributes data across the cluster nodes by splitting it into small blocks (64 MB or 128 MB depending on the configuration). Every node tries to work on the blocks stored locally (data locality), so most of the time no data transfer is required during processing.
  • Whenever Hadoop processes data, each node connects to other nodes as little as possible; most of the time a node deals only with the data that is stored locally. This concept is known as “Data Locality”.
  • Since the data is distributed across the nodes, each block of data is replicated (as per the configured replication factor) to different nodes to increase data availability. This helps Hadoop handle partial failures in the cluster (see the fsck example after this list).
  • Whenever a MapReduce job (typically consisting of two tasks, a Map task and a Reduce task) is executed, Map tasks are executed on individual data blocks on each node (in most cases) and leverage “Data Locality”. This is how multiple nodes process data in parallel, which makes processing faster (because of the parallel processing) than in traditional distributed systems.
  • If any node fails in between, the master will detect the failure and assign the same task to another node where a replica of the same data block is available.
  • If a failed node restarts, it automatically rejoins the cluster, and the master can then assign it new tasks as required.
  • During the execution of a job, if the master detects a task running slowly on one node compared to the others, it will allocate a redundant copy of the same task to another node to make the overall execution faster. This process is known as “Speculative Execution”.
  • Hadoop jobs are written in a high-level language like Java or Python, and the developer does not have to deal with network programming or component-level programming. You just focus on the core logic; everything else is taken care of by the Hadoop framework itself.
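
For example, you can see how the blocks of a file are distributed and replicated across DataNodes with hdfs fsck (the path below is just an illustrative example):

$ hdfs fsck /data/hadoopdevopsconsulting/readme.txt -files -blocks -locations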