How to Set Up Hadoop Cluster with HDFS High Availability

Course Name	Date
Big Data Hadoop Certification Training Course	Class Starts on 11th February,2023 11th February SAT&SUN (Weekend Batch)	View Details
Big Data Hadoop Certification Training Course	Class Starts on 8th April,2023 8th April SAT&SUN (Weekend Batch)	View Details

SacTiw says:
Dec 12, 2017 at 1:05 pm GMT
Normally a client would send a get/put file request to a particular “namenode” right? So once a failover has happened how would client get to know about it?
Assuming it is client responsibility to perform the retry on failure in that case is there a way client can first query for currently active namenode and then send a request to that one?
Reply
Barış says:
Nov 29, 2017 at 7:59 am GMT
It would be really good to show how to restart this system.
Thank you for sharing this valuable information.
Reply
- EdurekaSupport says:
  Jan 5, 2018 at 11:38 am GMT
  Thank you @Baris for appreciating our work. We will look into your suggestions as well. Cheers :)
  Reply
Hassan Asghar says:
Nov 4, 2017 at 1:41 pm GMT
my hadoop cluster is setup, and working fine:
i ran word count example:
can anybody provide me the following formulas to calculate some parameters:
Response Time:
Throughput:
Average I/o Rate:
Execution Time:
Thanks in advance
Reply
Den Kushnerik says:
Jan 9, 2017 at 9:42 am GMT
Hello. Its a very helpful instruction for me!
Do we need to format the ZKFC on Standby NameNode too?
According to this page: http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Initializing_HA_state_in_ZooKeeper we must do it one time: “…next step is to initialize required state in ZooKeeper. You can do so by running the following command from one of the NameNode hosts.”
Reply
aagnasoft says:
Dec 15, 2016 at 5:20 am GMT
Wow, It is a very helpful information. Thank you so much.
sap hana online training in usa
Reply
Sanjay says:
Nov 19, 2016 at 12:37 pm GMT
Normally when we setup a hadoop cluster (non HA), we need to configure yarn by modifying its yarn-site.xml . For HA, don’t we require any HA specific modification to yarn-site.xml ?
Reply
- Ashish Bakshi says:
  Nov 29, 2016 at 8:11 am GMT
  Thanks Sanjay for going through the blog.
  In this blog, we are modifying hdfs-site.xml because we are enabling HA feature only for NameNode. And yes you are absolutely correct, you can have HA for ResourceManager as well where you will have to modify the yarn-site.xml similarly. You can follow the Hadoop documentations to setup HA for ResouceManager which is given below:
  https://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
  Reply
Rakibul hassan Rakib says:
Sep 5, 2016 at 7:17 am GMT
I am just correcting your HA Architecture image
Reply
Rakibul hassan Rakib says:
Sep 3, 2016 at 5:10 am GMT
After killing active or standby namenode I am not getting web view of killing namenode. Is it possible to getting web view after killing namenode ?. But you have seen two namenode web view after killing one namenode. How it is possible? I am facing some problem in my namenode.
Thank you
Rakib
Reply
- Mani says:
  Sep 9, 2016 at 7:36 pm GMT
  Hey Rakib,
  If the namenode is manualy transitioned from active to standby you should be able to see the WEB UI of the namenode as it is still active. But if there is a failover in the active namenode and the it got a automatic transition to the standby namenode you can’t have the web ui because of the obvious reason that the namenode is down. Once you fix the dead namenode you can see the UI with STANDBY mentioned in the UI. Hope this helps
  Thanks,
  MK
  Reply
- EdurekaSupport says:
  Sep 15, 2016 at 6:55 am GMT
  Hey Rakibul, thanks for checking out the blog. Please follow the steps given below:
  -> Please Check your hdfs-site.xml configuration file and make sure that you have set up the automatic failover as per given in the blog.
  -> In case you are still facing the issue, change the directory for namenode, datanode, JN and zookeeper and give the permission 755 for these directories
  chmod 755 directory_path
  -> Format the Active Namenode and start the services as per given in the blog
  Hope this helps.
  Reply
anil kumar says:
Dec 10, 2015 at 5:50 am GMT
am inistaling high avalability like nn1 & nn2 and dn1 …. in that nn1 and nn2 both are standby mode only what i do now
Reply
- Mani says:
  Jun 9, 2016 at 10:39 am GMT
  Hope you got the solution by now anil. It might be the reason that you did not enable automatic failover property in hdfs-site.xml. According to what you are saying that your cluster is in manual failover mode. In this scenario you have to individually designate which name node should be active or standby.
  hdfs haadmin -transitionToActive nn1
  (nn1 – Active , nn2 – Standby)
  hdfs haadmin -transitionToStandby nn1
  (nn1 – Standby , nn2 – Standby)
  hdfs haadmin -transitionToActive nn2
  (nn1 – Standby , nn2 – Active)
  hdfs haadmin -transitionToStandby nn2
  (nn1 – Standby , nn2 – Standby)
  Check your name node service status using the command:
  hdfs haadmin -getServiceStatus
  If you by mistake make both of them active you might encounter scenario of split-brain where on both nodes edits will be in progress resulting in corrupted metadata.
  Hope this helps!
  Thanks,
  MK
  Reply
sureseh says:
Nov 8, 2015 at 9:00 am GMT
Getting below error when i follow the above configuration settings.
15/11/08 01:58:34 ERROR namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: Invalid configuration: a shared edits dir must not be specified if HA is not enabled.
and i dont find solution for this from google.
Can someone help
regards
suresh bk
Reply
- EdurekaSupport says:
  Nov 19, 2015 at 11:05 am GMT
  Hi Suresh bk
  Thank you for reaching out to us.
  You can connect with our 24/7 support team with all your queries and doubts regarding Hadoop once you enroll for the course.
  You can also get in touch with us by contacting our sales team on +91-8880862004 (India) or 1800 275 9730 (US toll free). You can mail us on [email protected].
  Reply

1 2 Next »

Virtual machine	IP address	Host name
Active NameNode	192.168.1.81	nn1.cluster.com or nn1
Standby NameNode	192.168.1.58	nn2.cluster.com or nn2
DataNode	192.168.1.82	dn1.cluster.com or dn1

Big Data

How to Set Up Hadoop Cluster with HDFS High Availability

HDFS 2.x High Availability Cluster Architecture

Introduction:

NameNode Availability:

HDFS HA Architecture:

Implementation of HA Architecture:

1. Using Quorum Journal Nodes:

Fencing of NameNode:

2. Using Shared Storage:

Automatic Failover:

Setting Up and Configuring High Availability Cluster in Hadoop:

Recommended videos for you

When not to use Hadoop

Power of Python With BigData

Hadoop Cluster With High Availability

Streaming With Apache Spark and Scala

Bulk Loading Into HBase With MapReduce

Pig Tutorial – Know Everything About Apache Pig Script

What Is Hadoop – All You Need To Know About Hadoop

Is Hadoop A Necessity For Data Science?

Big Data Tutorial – Get Started With Big Data And Hadoop

Tailored Big Data Solutions Using MapReduce Design Patterns

Apache Spark For Faster Batch Processing

What is Big Data and Why Learn Hadoop!!!

What is Apache Storm all about?

Introduction to Hadoop Administration

Logistic Regression In Data Science

Improve Customer Service With Big Data

Secure Your Hadoop Cluster With Kerberos

MapReduce Tutorial – All You Need To Know About MapReduce

Big Data Processing With Apache Spark

Python for Big Data Analytics

Recommended blogs for you

How to become an Apache Spark Developer?

Why do we need Hadoop for Data Science?

Install Puppet – Install Puppet in Four Simple Steps

Splunk Careers – Your Pathway To Hot Big Data Jobs

Brief Introduction to Oozie

Big Data Analytics Tools and Technologies with key Features

5 Reasons When to and When not to use Hadoop

Infographics: How Big is Big Data?

What is Big Data? – A Beginner’s Guide to the World of Big Data

Apache Pig UDF: Part 3 – Store Functions

How to Run Hive Scripts?

Career Advantages of Hadoop Certification

Spark MLlib – Machine Learning Library Of Apache Spark

Do You Need Java To Learn Hadoop?

PySpark CheatSheet: Spark RDD with Python

Rio Olympics 2016: Big Data powers the biggest sporting spectacle of the year!

Big Prospects for Big Data

Top Apache Spark Interview Questions You Should Prepare In 2023

Splunk vs. ELK vs. Sumo Logic: Which Works Best For You?

Implementing Hadoop & R Analytic Skills in Banking Domain

Join the discussion Cancel reply

Trending Courses in Big Data

Azure Data Engineer Associate Certification C ...

Big Data Hadoop Certification Training Course

PySpark Certification Training Course

Splunk Certification Training: Power User and ...

Apache Spark and Scala Certification Training ...

Apache Kafka Certification Training Course

Big Data Hadoop Administration Certification ...

ELK Stack Training & Certification

Apache Solr Certification Training

Comprehensive Hive Certification Training

Browse Categories

Subscribe to our Newsletter, and get personalized recommendations.

How to Set Up Hadoop Cluster with HDFS High Availability