HBase: NoSQL in Big Data

HBase Overview

HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system used to store massive amounts of structured, semi-structured, and unstructured data. The underlying data is stored as binary streams in data blocks on HDFS.

HBase application scenarios

  • Write-intensive applications: applications with a huge volume of writes every day but relatively few reads, such as WeChat message history or game logs.
  • Applications that need fast random access but no complex query conditions: HBase only supports rowkey-based queries. Single-record or small-range queries perform well; large-range queries may suffer because of the distributed layout, and queries such as SQL joins are not supported at all.
  • Applications that require very high performance and reliability: HBase itself has no single point of failure, so availability is very high. For applications with large and unpredictably growing data volumes, HBase supports online expansion; even if the data volume grows exponentially over a period of time, HBase can be scaled horizontally to keep up.
  • Structured and semi-structured data: thanks to HBase's dynamic columns and sparse storage, columns can be added dynamically within the same column family without defining all columns in advance, and empty column values occupy no storage space.

Comparison between HBase and MySQL


| Aspect | HBase | MySQL |
| --- | --- | --- |
| Type | NoSQL | RDBMS |
| Data | Structured, semi-structured, and unstructured data | Structured data |
| Distributed | Data is stored on HDFS, with built-in distributed storage | Stand-alone by default; a distributed effect requires MySQL replication or data sharding |
| Data consistency | Each piece of data appears in only one Region; redundancy is delegated to HDFS rather than handled at the Region level. Row-level transactions are supported: a put either succeeds or fails as a whole. When a RegionServer goes down, its Regions are reassigned to other RegionServers and recovered from the WAL | Master-slave replication may lag, which can cause data consistency issues |
| Persistence technology | LSM tree | Redo log |
| Query language | None (NoSQL); queries are inflexible and cannot filter across columns | SQL |

HBase data model

In HBase, a cell is identified by the row key, column family, column qualifier, and timestamp. It can therefore be regarded as a “four-dimensional coordinate”: [row key, column family, column qualifier, timestamp].

  • Row Key: The primary key of the table. Records in the table are sorted in ascending order by row key by default.
  • Timestamp: The timestamp of each data operation, which can be regarded as the version number of the data.
  • Column Family: The table is composed of one or more column families in the horizontal direction. A column family can contain any number of columns and supports dynamic expansion, with no need to predefine the number or type of columns. All columns are stored in binary form, and users must perform type conversion themselves. All members of a column family share the same prefix: for example, the columns “c:name” and “c:age” both belong to the column family c. The data of each column family is stored together in a storage unit called a Store.
  • Table & Region: As records accumulate and the table grows, it is gradually split into multiple parts called Regions. A Region is a horizontal slice of the table, and the Master assigns each Region to a RegionServer for management.
  • Cell: The unit in which the table stores data, uniquely determined by {row key, column (column family:qualifier), timestamp}. The data in a cell has no type and is stored in binary form.
    The HBase data model forms the logical view shown in the figure below.
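
To make the “four-dimensional coordinate” concrete, here is a toy sketch of the logical model as nested Go maps (the type names are illustrative only, not any HBase API):

package main
 
import "fmt"
 
// Illustrative model of the four-dimensional coordinate:
// row key -> column family -> column qualifier -> timestamp -> value.
type Versions map[int64][]byte          // timestamp -> value (versions)
type Row map[string]map[string]Versions // family -> qualifier -> versions
 
func main() {
    row := Row{
        "c": { // column family "c"
            "name": Versions{1700000000: []byte("alice")},
            "age":  Versions{1700000000: []byte("30")},
        },
    }
    // Reading a cell requires all four coordinates.
    fmt.Printf("%s\n", row["c"]["name"][1700000000]) // alice
}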

Row storage, column storage, and column-family storage

Row storage: row-based storage keeps an entire row stored contiguously. In a row-oriented table, even when only one specified column is needed, the whole row must first be read into memory and the target column filtered out afterwards, which causes excessive disk I/O, memory, and time overhead. Row storage therefore suits scenarios where a complete row is accessed each time.

Row storage
Column storage: column-based storage keeps each column's data stored contiguously. Because values of the same type are stored together, the compression ratio tends to be high, which also reduces disk I/O, memory, and time overhead. Columnar storage therefore suits workloads that operate on only one or a few columns; in the big data era, with many columns and rows, its advantages become even more pronounced. On the other hand, columnar storage is less efficient at fetching an entire row, since it needs multiple I/Os to read the various columns and then merge the results.
Column storage

Column-family storage: conceptually, column-family storage sits between row storage and column storage, and the schema design can push it toward either end. For example, if a table has only one column family and that family contains all of a user's columns, then, since the data of a column family is stored together in HBase, this design is equivalent to row storage. Conversely, if a table has a large number of column families, each holding a single column, the design is equivalent to column storage.
Column-family storage
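
This trade-off can be sketched with nested maps of the shape family → qualifier → value, the same shape the gohbase client later in this article uses; the family and column names here are illustrative:

package main
 
// Two illustrative schema designs, expressed as family -> qualifier -> value.
func main() {
    // Row-storage-like: a single column family holds all columns,
    // so each user's whole row is stored together.
    rowLike := map[string]map[string][]byte{
        "cf": {"name": []byte("alice"), "age": []byte("30"), "city": []byte("sh")},
    }
 
    // Column-storage-like: one column family per column,
    // so each column's data is stored separately.
    colLike := map[string]map[string][]byte{
        "name": {"v": []byte("alice")},
        "age":  {"v": []byte("30")},
        "city": {"v": []byte("sh")},
    }
    _, _ = rowLike, colLike
}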

HBase architecture design

HBase system architecture


HBase is composed of three types of servers in master-slave mode:

  • HRegionServer: Responsible for data read and write services; users access data by interacting with HRegionServers. An HRegionServer interacts with HDFS through an HDFS client.
  • HBase HMaster: Responsible for Region assignment and for table creation and deletion operations, balancing data across RegionServers during idle time, monitoring RegionServers, and handling RegionServer failover.
  • ZooKeeper: Responsible for maintaining cluster state (whether a server is online, synchronization between servers, master election, etc.).

The DataNode of HDFS is responsible for storing all data managed by Region Server, that is, all data in HBase is stored in the form of HDFS files.

HBase storage architecture

HBase data is stored on HRegionServers, which manage structures such as HRegion and HLog.

HBase stores massive amounts of data in HDFS, which is distributed, so the data of one HBase table is spread across multiple machines. How, then, does HBase split the data of a table? It partitions by RowKey, which amounts to cutting the table horizontally.

  • HRegion: A shard of the data table. When a Region grows past a certain threshold, it is “horizontally split” into two HRegions. HRegion is the basic unit of cluster load balancing: the HRegions of a table are usually distributed across multiple HRegionServers in the cluster, and one HRegionServer manages multiple HRegions, generally from different data tables.
  • Store:
    • MemStore: When HBase writes data, it first writes to the MemStore, which can be understood as a memory buffer.
    • StoreFile: When the MemStore exceeds a certain threshold, the in-memory data is written to disk, forming a StoreFile.
    • HFile: The storage format of KeyValue data in HBase; an HFile is a Hadoop binary-format file. A StoreFile is in fact a lightweight wrapper around an HFile, i.e. the bottom layer of a StoreFile is an HFile.
  • HLogFile: The storage format of the WAL (Write Ahead Log) in HBase, physically a Hadoop SequenceFile. The key of the SequenceFile is an HLogKey object, which records the ownership of the written data: besides the table and region names, it also includes a sequence number and a timestamp. The timestamp is the write time, and the sequence number starts at 0, or at the latest sequence number stored in the file system.

Storage data structure

A column family (Column Family) of HBase is essentially an LSM tree (Log-Structured Merge-Tree). The LSM tree is divided into a memory part and a disk part.

The memory part is a data structure that maintains an ordered collection of data. In general, any ordered-set structure such as a balanced binary tree, a red-black tree, or a skip list would do; for its concurrency performance, HBase chose the skip list.

The disk part is composed of independent files, each made up of data blocks. For database systems whose data lives on disk, disk seeks and data reads are very time-consuming operations (I/O cost). To avoid unnecessary I/O, some extra binary data can be stored on disk that indicates whether a given key might be stored in a given data block. This data structure is called a Bloom filter.

Skip list

A skip list is a randomized data structure that stores elements in an ordered, multi-level linked list. Its efficiency is comparable to that of a balanced tree: search, insertion, and deletion all complete in expected logarithmic time. Compared with a balanced tree, however, the skip list is much simpler and more intuitive to implement.


The skip list mainly consists of the following parts (see the sketch after this list):

  • Head: the entry node holding pointers into every level of the skip list.
  • Skip list nodes: hold the element values and the per-level forward pointers.
  • Levels: hold pointers to other elements. A higher-level pointer skips over at least as many elements as a lower-level one. To improve search efficiency, the program always starts from the highest level and descends as the candidate range narrows.
  • Tail: NULL pointers marking the end of the skip list.
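
Below is a minimal integer-keyed skip list in Go, sketching the coin-flip level assignment and the top-down search described above (HBase's actual MemStore uses Java's ConcurrentSkipListMap; this is only an illustration):

package main
 
import (
    "fmt"
    "math/rand"
)
 
const maxLevel = 16
 
type node struct {
    key  int
    next [maxLevel]*node // per-level forward pointers
}
 
// skipList keeps keys sorted; higher levels skip over more nodes,
// giving expected O(log n) search and insert.
type skipList struct {
    head  *node // sentinel head node
    level int   // highest level currently in use
}
 
func newSkipList() *skipList { return &skipList{head: &node{}, level: 1} }
 
// randomLevel flips coins: each extra level is kept with probability 1/2.
func randomLevel() int {
    lvl := 1
    for lvl < maxLevel && rand.Intn(2) == 0 {
        lvl++
    }
    return lvl
}
 
func (s *skipList) search(key int) bool {
    x := s.head
    // Start at the top level and drop down as the range narrows.
    for i := s.level - 1; i >= 0; i-- {
        for x.next[i] != nil && x.next[i].key < key {
            x = x.next[i]
        }
    }
    x = x.next[0]
    return x != nil && x.key == key
}
 
func (s *skipList) insert(key int) {
    var update [maxLevel]*node // last node visited on each level
    x := s.head
    for i := s.level - 1; i >= 0; i-- {
        for x.next[i] != nil && x.next[i].key < key {
            x = x.next[i]
        }
        update[i] = x
    }
    lvl := randomLevel()
    if lvl > s.level {
        for i := s.level; i < lvl; i++ {
            update[i] = s.head
        }
        s.level = lvl
    }
    n := &node{key: key}
    for i := 0; i < lvl; i++ { // splice the new node into each level
        n.next[i] = update[i].next[i]
        update[i].next[i] = n
    }
}
 
func main() {
    s := newSkipList()
    for _, k := range []int{3, 7, 1, 9} {
        s.insert(k)
    }
    fmt.Println(s.search(7), s.search(4)) // true false
}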

LSM tree

The LSM tree (Log-Structured Merge-Tree), like the B+ tree, is designed to store data efficiently on large-capacity disks. Compared with B+ trees, LSM trees offer better random write performance.

The structure of LSM tree

The structure of the LSM tree spans memory and disk, including memtable, immutable memtable, SSTable and other parts.

memtable

As the name suggests, the memtable is an in-memory data structure that holds recent update operations. When data is written to the memtable, it is first backed up to disk through the WAL, so that the data is not lost if memory loses power.

Write-ahead logging (WAL) is a set of technologies used in relational database systems to provide atomicity and durability (two of the ACID properties). In systems using WAL, all modifications must be written to the log file before being submitted.

Memtable can use data structures such as skip tables or search trees to organize data to keep the data orderly. When the memtable reaches a certain amount of data, the memtable will be converted into an immutable memtable, and a new memtable will be created to process the new data.

immutable memtable

As the name suggests, the immutable memtable is an unmodifiable in-memory data structure; it is the intermediate state in converting a memtable into an SSTable. Its purpose is to avoid blocking writes during the conversion: new writes are handled by a fresh memtable instead of waiting on a locked one.

SSTable

An SSTable (Sorted String Table) is an ordered collection of key-value pairs; it is how the LSM tree organizes data on disk. If an SSTable is large, an index on the keys can also be built to speed up queries. The following figure shows a simple SSTable structure:


The data in the memtable is eventually converted into SSTables and saved on disk. The SSTables are later merged in compactions, which are the focal point of the LSM tree structure.

The structure of the final LSM tree can be simply represented by the following figure:
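
To make the write path concrete, here is a toy Go sketch of the WAL → memtable → flush → sorted-run flow; all names are illustrative, and a real LSM tree would also handle compaction and reads across runs:

package main
 
import (
    "fmt"
    "sort"
)
 
type kv struct{ key, value string }
 
type lsm struct {
    wal      []string          // stand-in for the write-ahead log
    memtable map[string]string // mutable, in-memory
    sstables [][]kv            // flushed, sorted, immutable runs
    limit    int               // flush threshold
}
 
func (t *lsm) put(key, value string) {
    t.wal = append(t.wal, key+"="+value) // 1. log first, for durability
    t.memtable[key] = value              // 2. then apply in memory
    if len(t.memtable) >= t.limit {
        t.flush()
    }
}
 
// flush freezes the memtable and writes it out as a sorted run.
func (t *lsm) flush() {
    run := make([]kv, 0, len(t.memtable))
    for k, v := range t.memtable {
        run = append(run, kv{k, v})
    }
    sort.Slice(run, func(i, j int) bool { return run[i].key < run[j].key })
    t.sstables = append(t.sstables, run)
    t.memtable = map[string]string{} // a new memtable takes over
}
 
func main() {
    t := &lsm{memtable: map[string]string{}, limit: 2}
    t.put("b", "2")
    t.put("a", "1")         // triggers a flush
    fmt.Println(t.sstables) // [[{a 1} {b 2}]]
}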

Bloom filter

A Bloom filter is a fast lookup structure based on multiple hash functions. It is a space-efficient probabilistic data structure, typically used where one must quickly determine whether an element belongs to a set and strict 100% accuracy is not required.

The advantage of the Bloom filter is that it achieves high accuracy with very little space; its space efficiency and query time are far better than ordinary algorithms. The disadvantages are a certain false positive rate and the difficulty of deletion. Why is deleting elements not allowed? Deletion would require setting the corresponding k bit positions back to 0, but those bits may also belong to other elements.

It can be understood through the figure below: elements 1, 6, and 9 are hashed and marked in the bit array (the Bloom Filters visualization website has an animated demo). To decide whether an element exists, one only needs to check whether all of the array positions its hashes map to are set to 1.
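
A minimal Bloom filter sketch in Go follows (the FNV hashing and the m/k parameters are arbitrary assumptions; real implementations use packed bit arrays and tuned sizes):

package main
 
import (
    "fmt"
    "hash/fnv"
)
 
// bloom sets k bits per added key; a lookup answers either
// "definitely not present" or "possibly present".
type bloom struct {
    bits []bool
    k    int
}
 
func newBloom(m, k int) *bloom { return &bloom{bits: make([]bool, m), k: k} }
 
// hash derives the i-th hash by salting the key with the index.
func (b *bloom) hash(key string, i int) int {
    h := fnv.New32a()
    fmt.Fprintf(h, "%d:%s", i, key)
    return int(h.Sum32()) % len(b.bits)
}
 
func (b *bloom) add(key string) {
    for i := 0; i < b.k; i++ {
        b.bits[b.hash(key, i)] = true
    }
}
 
func (b *bloom) mayContain(key string) bool {
    for i := 0; i < b.k; i++ {
        if !b.bits[b.hash(key, i)] {
            return false // at least one bit unset: definitely absent
        }
    }
    return true // all bits set: possibly present (may be a false positive)
}
 
func main() {
    f := newBloom(1024, 3)
    f.add("row-1")
    fmt.Println(f.mayContain("row-1"), f.mayContain("row-2")) // true false (usually)
}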

HBase reading and writing process

Reading process

  • When the Client requests to read data, the request is first routed to the ZooKeeper cluster.
  • Through ZooKeeper, the corresponding HRegionServer is located, and then the corresponding HRegion.
  • The HRegionServer first checks the Region's MemStore; if the data is found in the MemStore, it is returned directly.
  • Otherwise, the HRegion looks in the corresponding StoreFiles for the specific data.

In the entire architecture, HMaster and HRegion Server can be on the same node, and multiple HMasters can exist, but only one HMaster is active.

The rowkey → HRegion mapping is cached on the client side to reduce the cost of subsequent lookups.

Writing process

  • The Client initiates a data insertion request. If the Client has the Rowkey → Region mapping cached, it finds the specific Region directly.
  • If not, the corresponding Region server is located through ZK and the request is forwarded to the specific Region.
    The HRegionServer first writes the data to the HLog to prevent data loss.
  • It then finds the corresponding HRegion and writes the data into the MemStore, where Row Keys are kept sorted. Once both the HLog and the MemStore have been written successfully, the write is considered successful and the Client is notified.
  • When the MemStore flush condition is reached, the data is written out to a StoreFile, forming an HFile.

HBase usage

RowKey Design

The row key in HBase is used to retrieve records in the table and supports the following three methods:

  • Access through a single row key: that is, perform a get operation based on a certain row key value;
  • Scan through the range of row key: that is, by setting startRowKey and endRowKey, scan within this range;
  • Full table scan: that is, directly scan all row records in the entire table.

Row keys are stored in lexicographic order. When designing row keys, make full use of this sorting feature: store data that is often read together adjacently, and keep data that is likely to be accessed soon close together.

RowKey Rules

  • Rowkey uniqueness principle: the design must guarantee uniqueness. Since rowkeys are stored in lexicographic order, design them so that data read together sorts into the same block and data likely to be accessed soon ends up adjacent.
  • Rowkey length principle: a rowkey is a binary stream. Many developers suggest a length between 10 and 100 bytes, but the shorter the better, ideally no more than 16 bytes:
    • The data persistence file HFile stores data as KeyValues. If the rowkey is 100 bytes long, then for 10 million rows the rowkeys alone occupy 100 × 10 million = 1 billion bytes, nearly 1 GB, which greatly hurts HFile storage efficiency.
      The MemStore caches part of the data in memory; if the rowkey field is too long, the effective memory utilization drops, the system caches less data, and retrieval efficiency falls. So the shorter the rowkey, the better.
    • Current operating systems are 64-bit with 8-byte memory alignment; keeping the rowkey at 16 bytes, an integer multiple of 8, makes the best use of this.
  • Rowkey hashing principle: use a hash of the key as the rowkey prefix. This is an important way to avoid hot data and data skew, as the sketch after this list shows.
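
A minimal Go sketch of hash-prefixing (“salting”) a rowkey; the MD5 choice and the 2-byte prefix are assumptions, and note that salting trades away range scans in the original key order:

package main
 
import (
    "crypto/md5"
    "fmt"
)
 
// saltedRowKey prefixes the key with a short hash so that sequential
// keys (e.g. timestamp-based ones) spread across Regions instead of
// piling up on one "hot" Region.
func saltedRowKey(key string) string {
    sum := md5.Sum([]byte(key))
    return fmt.Sprintf("%x-%s", sum[:2], key)
}
 
func main() {
    for _, k := range []string{"user1-20240101", "user1-20240102"} {
        fmt.Println(saltedRowKey(k)) // adjacent keys get scattered prefixes
    }
}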

Design of column families

Don’t define too many column families in one table. HBase currently cannot handle tables with more than two or three column families well: when one column family is flushed, its neighboring column families are also flushed because of the correlation effect, ultimately causing the system to generate more I/O.

Hotspot data

HBase tables will be divided into one or more Regions and hosted in RegionServer.

As the figure below shows, a Region has two important attributes, StartKey and EndKey, which indicate the range of rowkeys the Region maintains. When reading or writing data, if the rowkey falls within a Region's start-end key range, that Region is located and the data is read or written there.

By default, a newly created table has only one region, whose start-end key range is unbounded, so all data goes into that region. As data keeps arriving and the region grows past a certain threshold, HBase decides that the region should not take any more data, finds a midKey, and splits the region into two. This process is called a region split, and the midKey becomes the boundary between the two regions (how this middle value is chosen is not discussed here).

Now suppose rowkeys smaller than the midKey go into Region 1 and rowkeys greater than or equal to it go into Region 2. If rowkeys keep increasing monotonically, all new data is written to Region 2, while Region 1 sits neglected, half full. When Region 2 fills up, it splits again, continually producing neglected, half-full regions. These regions can, of course, still serve queries.

What are hot spots and data skew

  • Hotspot: occurs when a large number of clients directly access one node, or a very small number of nodes, in the cluster (the access may be reads, writes, or other operations).
    • Heavy access can push the single machine hosting the hot region beyond its capacity, degrading performance or even making the region unavailable. It also affects the other regions on the same RegionServer, since the host can no longer serve their requests, wasting resources. Design a good data access pattern so that the cluster is used fully and evenly.
  • Data skew: HBase data is concentrated on a small number of nodes.

Reading and writing HBase in Go

Client: https://github.com/tsuna/gohbase

package main
 
import (
    "context"
    "io"
    "os"
 
    "github.com/sirupsen/logrus"
    "github.com/tsuna/gohbase"
    "github.com/tsuna/gohbase/hrpc"
)
 
func init() {
 
    // Use Stdout as output instead of the default stderr
    logrus.SetOutput(os.Stdout)
    // Set log level
    logrus.SetLevel(logrus.DebugLevel)
 
}
 
type HbaseClient struct {
    client gohbase.Client
}
 
func main() {
 
    // ZooKeeper quorum address of the HBase cluster (left empty here)
    zkquorum := ""
    client := gohbase.NewClient(
        zkquorum,
        gohbase.ZookeeperRoot("/hbase"),
    )
    hbaseClient := HbaseClient{
        client: client,
    }
    logrus.Info(hbaseClient)
}
 
// PutsByRowKey inserts values (family -> qualifier -> value) under rowKey
func (hb *HbaseClient) PutsByRowKey(ctx context.Context, table, rowKey string, values map[string]map[string][]byte) (err error) {
    putRequest, err := hrpc.NewPutStr(ctx, table, rowKey, values)
    if err != nil {
        return err
    }
    _, err = hb.client.Put(putRequest)
    if err != nil {
        return err
    }
    return nil
}
 
// UpdateByRowKey overwrites cells under rowKey; in HBase a put is an upsert
func (hb *HbaseClient) UpdateByRowKey(ctx context.Context, table, rowKey string, values map[string]map[string][]byte) (err error) {
    putRequest, err := hrpc.NewPutStr(ctx, table, rowKey, values)
    if err != nil {
        return err
    }
    _, err = hb.client.Put(putRequest)
    if err != nil {
        return err
    }
    return
}
 
// GetsByRowKey fetches a whole row by rowKey
func (hb *HbaseClient) GetsByRowKey(ctx context.Context, table, rowKey string) (*hrpc.Result, error) {
    getRequest, err := hrpc.NewGetStr(ctx, table, rowKey)
    if err != nil {
        return nil, err
    }
    res, err := hb.client.Get(getRequest)
    if err != nil {
        return nil, err
    }
    return res, nil
}
 
// GetsByRowKeyCF fetches only the given column families/qualifiers of a row
func (hb *HbaseClient) GetsByRowKeyCF(ctx context.Context, table, rowKey string, families map[string][]string) (*hrpc.Result, error) {
    getRequest, err := hrpc.NewGetStr(ctx, table, rowKey, hrpc.Families(families))
    if err != nil {
        return nil, err
    }
    res, err := hb.client.Get(getRequest)
    if err != nil {
        return nil, err
    }
    return res, nil
}
 
// DeleteByRowKey deletes the given cells (family -> qualifier) under rowKey
func (hb *HbaseClient) DeleteByRowKey(ctx context.Context, table, rowKey string, value map[string]map[string][]byte) (err error) {
    delRequest, err := hrpc.NewDelStr(ctx, table, rowKey, value)
    if err != nil {
        return err
    }
 
    _, err = hb.client.Delete(delRequest)
    if err != nil {
        return err
    }
    return
}
 
// ScanByTable scans rows in the range [startRow, stopRow)
func (hb *HbaseClient) ScanByTable(ctx context.Context, table, startRow, stopRow string) ([]*hrpc.Result, error) {
    scanRequest, err := hrpc.NewScanRangeStr(ctx, table, startRow, stopRow)
    if err != nil {
        return nil, err
    }
    scan := hb.client.Scan(scanRequest)
    var res []*hrpc.Result
    for {
        getRsp, err := scan.Next()
        if err == io.EOF {
            break
        }
        if err != nil {
            return nil, err
        }
        res = append(res, getRsp)
    }
    return res, nil
}
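
For completeness, a hypothetical call sequence showing the shape of the values map (family → qualifier → value); the table name “user”, family “cf”, and the rowkey are placeholders:

// exampleUsage is a hypothetical call sequence; the table, family, and
// rowkey below are placeholders, not values from any real schema.
func exampleUsage(ctx context.Context, hb *HbaseClient) error {
    values := map[string]map[string][]byte{
        "cf": { // column family -> column qualifier -> value
            "name": []byte("alice"),
            "age":  []byte("30"),
        },
    }
    if err := hb.PutsByRowKey(ctx, "user", "u_0001", values); err != nil {
        return err
    }
    res, err := hb.GetsByRowKey(ctx, "user", "u_0001")
    if err != nil {
        return err
    }
    for _, cell := range res.Cells {
        logrus.Infof("%s:%s = %s", cell.Family, cell.Qualifier, cell.Value)
    }
    return nil
}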
