Federation HDFS VS HDFS

Preface

The main purpose of HDFS Federation is to let HDFS scale from a single NameNode to multiple NameNodes. Previously, a cluster could deploy only one NameNode plus a helper node, the SecondaryNameNode. Note that the SecondaryNameNode is not the standby in an active/standby pair; it only assists the NameNode with certain work, such as checkpointing.

The following content comes from:

  • https://zhuanlan.zhihu.com/p/332615557

Limitations of a single Namenode’s HDFS architecture

Namespace limits

Since the Namenode keeps all metadata in memory, the number of objects (files + blocks) that a single Namenode can hold is limited by the heap size of the JVM it runs in. For example, if the JVM heap (-Xmx) is set to 50 GB in the NameNode configuration, about 200 million objects can be stored.
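As a rough sanity check on these figures, the per-object memory cost implied by a 50 GB heap and 200 million objects can be worked out directly. The sketch below only does that arithmetic. The function names are made up for illustration, the heap is assumed to be 50 GiB, and the real per-object overhead depends on the Hadoop version, path lengths, and the number of blocks per file.

```python
GIB = 1024 ** 3


def implied_bytes_per_object(heap_gib, object_count):
    """Average heap bytes available per namespace object (file or block)."""
    return heap_gib * GIB / object_count


def estimated_capacity(heap_gib, bytes_per_object):
    """Objects that fit in a heap of the given size at a given per-object cost."""
    return round(heap_gib * GIB / bytes_per_object)


per_object = implied_bytes_per_object(50, 200_000_000)
print(round(per_object))                    # about 268 bytes per object
print(estimated_capacity(500, per_object))  # capacity of a 10x larger heap
```

At the same per-object cost, a heap ten times larger holds about ten times as many objects, which is why enlarging the heap looks like the obvious fix for this limit.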

Performance bottleneck

Because there is only one Namenode (not to be confused with an HA deployment), the throughput of the entire HDFS file system is capped by the throughput of that single Namenode. When the NameNode is very busy, even an administrative command may fail. For example, hdfs fsck / takes a long time and sends a large number of requests to the NameNode.

Isolation

If HDFS has only one Namenode, the applications of different users cannot be isolated from each other. An insufficiently tested program running on HDFS can therefore affect (slow down) production programs running on the same HDFS. With HDFS Federation, different Namespaces can be used to isolate different user applications, so that programs in different Namespace Volumes do not affect each other.

Cluster Availability

In HDFS with only one Namenode, downtime of that Namenode makes the entire cluster unavailable. Clusters deployed with HA are, of course, outside this case.

Note that HDFS Federation does not solve the single-point-of-failure problem: each NameNode is still a single point of failure. You need to deploy a standby Namenode for each NameNode to limit the impact of a NameNode failure on the business. In other words, Federation by itself does not provide high availability.

Why is vertical scaling of the current Namenode not feasible?

Suppose the Namenode heap is expanded 10 times, from the original 50 GB to 500 GB. Scaling vertically this way brings the following problems:

  1. The first problem is startup time. The fsimage and edit log are read during startup, and with a 500 GB heap the fsimage will be very large. With a 50 GB heap, a Namenode already takes about half an hour or longer to start. How long would it take with 500 GB? The growth is not necessarily linear, and even a linear 10x increase would not be acceptable to customers.

  2. The second problem is Full GC. If an error occurs while the Namenode is in Full GC, the entire cluster goes down, and a Full GC on a heap this large takes very long, longer than most of the timeouts in the cluster.

  3. The third problem is that a very large JVM heap is difficult to debug, and optimizing Namenode memory usage is not cost-effective.

Why introduce Federation

Federation means a federation or alliance. Here it is a federation of NameNodes; that is, the cluster has multiple NameNodes.

Federation can quickly solve most single Namenode HDFS problems.

The entire core design of Federation took about 3.5 months to implement. Most of the changes are in Datanode, Config, and Tools, and the Namenode itself has very few changes, so that the original robustness of the Namenode will not be affected.

Federation has good backward compatibility, and the existing single Namenode deployment configuration can continue to work without any changes.

The Federation mechanism of HDFS

HDFS Federation uses multiple independent Namenodes/namespaces so that the HDFS naming service can scale horizontally. The Namenodes in HDFS Federation are federated: they are independent of each other and do not need to coordinate with each other. Each Namenode provides namespace and block management functions. The Datanodes are used by all Namenodes as common block storage.

Each datanode will register with all Namenodes in the cluster, and will periodically send heartbeat and block information reports, and process instructions from Namenode at the same time.

Two major HDFS modules to understand first

Namespace

The namespace consists of directories, files, and blocks. It supports all namespace-related file system operations, such as creating, deleting, modifying, and listing files and directories.

Block Storage Service, which has two parts

Block management (in the Namenode):

  • Provides Datanode cluster membership through registration, heartbeat detection, and similar functions.

  • Processes block reports and maintains block location information.

  • Supports block-related operations, such as creating, deleting, and modifying blocks and obtaining block locations.

  • Manages block replication: creates replicas, deletes redundant replicas, and so on.

Storage: the Datanodes store blocks on the local file system and provide read and write access to them.

Comparison and improvement between Federation HDFS and current HDFS

Structure changes

In the single NameNode architecture, HDFS has only one namespace, which uses all blocks. In Federation HDFS, there are multiple independent namespaces, and each namespace uses its own block pool.

In the single NameNode architecture, there is only one set of blocks in HDFS, while there are multiple sets of independent blocks in Federation HDFS. A block pool is a group of blocks belonging to the same namespace.

In a single NameNode architecture, HDFS consists of a Namenode and a set of datanodes. The Federation HDFS consists of multiple Namenodes and a set of datanodes, and each datanode will store blocks for multiple block pools.

Namenode

The Namenodes are independent of each other. Each manages its own part of the work and does not need to coordinate with the others. If one Namenode fails, the other Namenodes are not affected.

Block Pool

A block pool is a group of blocks belonging to a single namespace. Each Datanode stores blocks for all block pools.

A Datanode is a physical concept, while a block pool is a logical grouping of blocks.

Multiple blocks belonging to multiple block pools can be stored in the same datanode. Block pools allow a namespace to create a Block ID for a new block without notifying other namespaces.

When a Namenode fails, the Datanodes it uses continue to serve the other Namenodes. A block pool is created automatically when a Datanode establishes contact with a Namenode and starts a session. Each block has a unique identifier, the Extended Block ID = BlockPoolID + BlockID. This extended block ID is unique across HDFS clusters, which makes it possible to merge clusters in the future.

The data structures in the Datanode are all indexed by BlockPoolID (BPID); that is, both the BlockMap and the storage in the Datanode are indexed by BPID. In HDFS, all upgrades and rollbacks happen per Namenode/BlockPool, so different Namenodes/BlockPools in the same HDFS Federation are independent of each other.
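The identity and indexing scheme described above can be sketched as follows, assuming the extended block ID pairs a block pool ID with a per-pool block ID. The class names, the method names, and the short pool IDs BP-1 and BP-2 are hypothetical; this is a toy model, not the Datanode's actual data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtendedBlock:
    """Cluster-wide block identity: block pool ID plus per-pool block ID."""
    block_pool_id: str
    block_id: int


class DatanodeBlockMap:
    """Toy stand-in for a Datanode block map, indexed by block pool ID first."""

    def __init__(self):
        self._pools = {}  # block pool ID -> {block ID -> data}

    def put(self, block, data):
        self._pools.setdefault(block.block_pool_id, {})[block.block_id] = data

    def get(self, block):
        return self._pools[block.block_pool_id][block.block_id]

    def drop_pool(self, block_pool_id):
        # Removing one pool leaves every other pool untouched.
        self._pools.pop(block_pool_id, None)

    def pool_ids(self):
        return sorted(self._pools)


block_map = DatanodeBlockMap()
a = ExtendedBlock("BP-1", 1001)
b = ExtendedBlock("BP-2", 1001)  # same numeric block ID, different pool
block_map.put(a, b"from namespace 1")
block_map.put(b, b"from namespace 2")
print(a == b)                 # the two blocks are distinct
print(block_map.pool_ids())
block_map.drop_pool("BP-1")   # e.g. the owning namespace was removed
print(block_map.get(b))       # the other pool is unaffected
```

Because the pool ID is part of the key, two namespaces can hand out the same numeric block ID without coordinating, and one pool can be dropped without touching the others.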

Datanode improvements

In the Datanode, each Namenode has a corresponding thread. Each Datanode registers with every Namenode and periodically sends heartbeats and Datanode usage reports to all Namenodes.

The Datanode also sends each Namenode the block report for that Namenode's block pool. Since multiple Namenodes exist side by side, any Namenode can be dynamically added, removed, or upgraded at any time.
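The reporting pattern described here can be illustrated with a small simulation. All class and method names are hypothetical, and the sketch runs the reporting round sequentially instead of using one thread per Namenode as the real Datanode does.

```python
class Namenode:
    """Toy Namenode that owns exactly one block pool."""

    def __init__(self, name, block_pool_id):
        self.name = name
        self.block_pool_id = block_pool_id
        self.registered = set()     # Datanode IDs seen at registration
        self.heartbeats = {}        # Datanode ID -> heartbeat count
        self.block_locations = {}   # block ID -> set of Datanode IDs

    def register(self, datanode_id):
        self.registered.add(datanode_id)

    def heartbeat(self, datanode_id):
        self.heartbeats[datanode_id] = self.heartbeats.get(datanode_id, 0) + 1

    def block_report(self, datanode_id, block_ids):
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set()).add(datanode_id)


class Datanode:
    """Toy Datanode that serves every Namenode in the federation."""

    def __init__(self, datanode_id, namenodes):
        self.datanode_id = datanode_id
        self.namenodes = list(namenodes)
        self.blocks = {}  # block pool ID -> set of block IDs

    def store(self, block_pool_id, block_id):
        self.blocks.setdefault(block_pool_id, set()).add(block_id)

    def register_all(self):
        for namenode in self.namenodes:
            namenode.register(self.datanode_id)

    def tick(self):
        """One reporting round: heartbeat everyone, report each pool to its owner."""
        for namenode in self.namenodes:
            namenode.heartbeat(self.datanode_id)
            pool_blocks = self.blocks.get(namenode.block_pool_id, set())
            namenode.block_report(self.datanode_id, pool_blocks)


nn1 = Namenode("nn1", "BP-1")
nn2 = Namenode("nn2", "BP-2")
dn = Datanode("dn1", [nn1, nn2])
dn.register_all()
dn.store("BP-1", 1001)
dn.store("BP-2", 1001)
dn.store("BP-2", 1002)
dn.tick()
print(sorted(nn1.block_locations))  # only blocks from BP-1
print(sorted(nn2.block_locations))  # only blocks from BP-2
```

Each Namenode receives heartbeats from the Datanode but only ever sees the blocks of its own pool, so no Namenode needs to know about the others.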

Other improvements in Federation

  • Tools for monitoring and managing Namenode initialization and decommissioning.

  • Load balancing at either the Datanode level or the block pool level.

  • Datanode background daemons, such as the disk and directory scanners, adapted for Federation.

  • A Web UI that shows the usage status of each Namenode's block pool.

  • A cluster-wide Web UI that shows storage usage for the whole cluster. It lists all Namenodes with details such as Namenode-BlockPoolID, storage usage, missing blocks, and live and dead Datanodes.

  • Links to the Web UI of each Namenode.

  • Display of Datanode decommissioning status.

Multi-namespace management issues

Whether a cluster needs a single namespace or multiple namespaces, the core issue is how the data in those namespaces is shared and accessed. A globally unique namespace is one way to solve this. With multiple namespaces, a Client Side Mount Table can be used instead to share and access data; it is implemented through the new file system viewfs. For example, a Java application can then reach all of the data through a single view, even though the data is spread over several namespaces.

Namespace Volume (Namespace + Block Pool)
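The idea of a client-side mount table can be illustrated with a toy resolver that maps a path in the client's view to the namespace that actually holds it. This is not the viewfs implementation: the MountTable class, the host names, and the mount points are all hypothetical, and the sketch assumes mount points are not nested.

```python
class MountTable:
    """Toy client-side mount table: mount point -> target namespace URI."""

    def __init__(self, links):
        self._links = dict(links)

    def resolve(self, path):
        """Translate a path in the client's view into a namespace-specific URI."""
        for mount_point, target in self._links.items():
            # Match the mount point itself or anything below it, but not
            # a sibling that merely shares the prefix (e.g. /userx).
            if path == mount_point or path.startswith(mount_point + "/"):
                return target + path[len(mount_point):]
        raise FileNotFoundError(f"no mount point covers {path}")


table = MountTable({
    "/user": "hdfs://nn1.example.com:8020/user",
    "/data": "hdfs://nn2.example.com:8020/data",
})
print(table.resolve("/user/alice/report.csv"))
print(table.resolve("/data"))
```

The application works with one path tree, while each subtree is served by a different Namenode. The mapping lives entirely on the client, so the Namenodes still do not need to coordinate.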

A Namespace and its Block Pool together are called a Namespace Volume. A Namespace Volume is an independent, self-contained management unit. When a Namenode/Namespace is deleted, the corresponding Block Pool is deleted as well. During an upgrade, each Namespace Volume is also upgraded as a unit.

ClusterID

A ClusterID is added in HDFS Federation to identify all the nodes that belong to the cluster. It is either generated automatically or supplied manually when a Namenode is formatted, and the same ClusterID must then be used when formatting the other Namenodes in the cluster.

HDFS Federation is compatible with the old version of HDFS

This compatibility allows existing Namenode configurations to continue working without any changes.

The Balancer has a block pool policy option

The Balancer has been changed to work with multiple NameNodes in the cluster. It can be run with the following command, where the policy is either datanode (the default) or blockpool:

hdfs balancer [-policy <policy>]