🏗️ HBase Architecture & Features
Overview of HBase
HBase is a NoSQL database that provides ==ACID== (Atomicity, Consistency, Isolation, Durability) guarantees at the level of a single row rather than across rows or tables, which makes it suitable for high-scale, real-time applications that do not need multi-row transactions.
It has a flexible schema: column families are declared when a table is created, but new columns can be added within a family at write time without a predefined model.
HBase offers database-like access to ==Hadoop-scale storage==, enabling efficient reads and writes on subsets of data without scanning the entire dataset.
Key Objectives
- Architectural Overview of HBase
- Three Major Components of HBase
- HBase Features
  - Consistency
  - Atomic Read and Write
  - High Availability
  - Real-Time Processing
🏗️ Architectural Overview of HBase
HBase architecture consists of three major components:
- HMaster
- HBase Region Server, which hosts Regions
- ZooKeeper
Components of HBase Architecture
| Component | Description |
| --- | --- |
| HMaster | Manages Region Servers, performs DDL operations, assigns regions, monitors servers, and facilitates recovery. |
| Region Server | Handles data-related operations for multiple regions and communicates with clients. |
| ZooKeeper | Provides configuration management, naming, and synchronization services for HBase. |

HMaster Role
- Performs Data Definition Language (DDL) operations such as creating and deleting tables.
- Assigns and reassigns regions to Region Servers.
- Monitors Region Server instances and coordinates recovery when one fails.
ZooKeeper Role
- Maintains configuration information and helps HMaster manage the cluster.
- Uses ephemeral nodes to track available Region Servers: each server registers a node that disappears when its session ends, which signals a failure.
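The ephemeral-node idea can be illustrated with a toy in-memory registry. This is only a sketch of the concept, not the ZooKeeper API: `ToyCoordinator`, its methods, the `/rs/...` paths, and the session ids are all hypothetical names chosen for illustration.

```python
class ToyCoordinator:
    """Toy stand-in for a coordination service: ephemeral entries vanish with their session."""

    def __init__(self):
        self._ephemeral = {}   # path -> id of the session that owns it
        self._watchers = []    # callbacks invoked with the path of each removed node

    def create_ephemeral(self, path, session_id):
        self._ephemeral[path] = session_id

    def watch_deletions(self, callback):
        self._watchers.append(callback)

    def expire_session(self, session_id):
        """Simulate a lost session: drop every node owned by it and notify watchers."""
        dead = [p for p, s in self._ephemeral.items() if s == session_id]
        for path in dead:
            del self._ephemeral[path]
            for callback in self._watchers:
                callback(path)

    def live_nodes(self):
        return sorted(self._ephemeral)


coordinator = ToyCoordinator()
failed = []                                   # what a master-like watcher would observe
coordinator.watch_deletions(failed.append)

coordinator.create_ephemeral("/rs/server-a", session_id=1)
coordinator.create_ephemeral("/rs/server-b", session_id=2)
coordinator.expire_session(2)                 # server-b stops responding

print(coordinator.live_nodes())               # remaining registered servers
print(failed)                                 # nodes whose owners went away
```

In this sketch the watcher plays the part of HMaster: it learns about a failed server only because that server's node disappeared, which is the signal it would use to start reassigning regions.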
🌐 HBase Region Server
- HBase tables are divided into ==Regions== by row key range.
- A Region is the basic unit for distributing a table across the cluster; it holds a contiguous range of rows and stores each column family separately.
- A Region Server typically runs on the same machine as an ==HDFS DataNode== and is responsible for read/write operations on the regions it hosts.
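Because regions partition a table by row key range, finding the region for a row reduces to a search over sorted start keys. The sketch below assumes a hypothetical four-region table; the start keys, server names, and function names are made up for illustration and do not reflect how the HBase client stores region locations.

```python
import bisect

# Hypothetical layout: region i covers [region_start_keys[i], region_start_keys[i + 1]).
# The first region starts at the empty key, so every row key maps to some region.
region_start_keys = [b"", b"g", b"n", b"t"]
region_servers = ["rs1", "rs2", "rs1", "rs3"]   # server hosting each region


def locate_region(row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key."""
    return bisect.bisect_right(region_start_keys, row_key) - 1


def locate_server(row_key: bytes) -> str:
    """Return the (hypothetical) Region Server hosting the row."""
    return region_servers[locate_region(row_key)]


print(locate_region(b"apple"))    # falls in the first region
print(locate_server(b"monkey"))   # falls between b"g" and b"n"
```

A start key is inclusive in this sketch, so `b"g"` itself belongs to the second region rather than the first.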
Region Server Responsibilities
| Responsibility | Description |
| --- | --- |
| Data Communication | Communicates with clients and manages data-related operations. |
| Read/Write Handling | Handles read/write requests for all regions under its management. |
| Region Size Management | Splits regions that grow beyond the configured size threshold. |

🛠️ HBase Features
Consistency
- HBase supports a strong consistency model: all reads and writes for a given row go through the single Region Server that hosts its region, so updates to that row are serialized.
- It supports atomic increment operations on counter columns, which is useful for counting without a separate read-then-write round trip.
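Why routing every update for a row through one place makes counters safe can be shown with a small simulation. This is a sketch under assumptions, not HBase code: `ToyCounterRegion`, the per-row lock, and the row and column names are hypothetical, and a Python lock merely stands in for whatever serialization the server performs.

```python
import threading
from collections import defaultdict


class ToyCounterRegion:
    """Toy region that serializes updates to each row with a per-row lock."""

    def __init__(self):
        self._rows = defaultdict(dict)                 # row key -> {column: int}
        self._row_locks = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()           # protects _row_locks itself

    def _lock_for(self, row):
        with self._locks_guard:
            return self._row_locks[row]

    def increment(self, row, column, amount=1):
        """Read-modify-write under the row lock, so no concurrent update is lost."""
        with self._lock_for(row):
            new_value = self._rows[row].get(column, 0) + amount
            self._rows[row][column] = new_value
            return new_value

    def get(self, row, column):
        with self._lock_for(row):
            return self._rows[row].get(column, 0)


region = ToyCounterRegion()


def worker():
    for _ in range(1000):
        region.increment("page#home", "stats:views")


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(region.get("page#home", "stats:views"))  # 8000: every increment was applied
```

Eight writers each add 1000, and because each increment is a single serialized step, the final count is exactly the sum.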
Atomic Read and Write
- Atomicity ensures that an operation is all-or-nothing: if one part fails, the entire operation fails. In HBase this guarantee applies to a single row, even when the operation touches several column families of that row.
Sharding
- HBase tables consist of regions that can be split automatically or manually into smaller regions, which are then spread across Region Servers for horizontal scaling.
- Auto-sharding splits a region dynamically once it exceeds a configured size, so tables stay divided into manageable parts as they grow.
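Auto-sharding can be sketched as "split a region at its middle key once it grows past a threshold". The example below is a simplification with assumed names: `ToyTable` and `MAX_ROWS` are hypothetical, and the threshold counts rows only to keep the example small, whereas the text above describes a size-based threshold.

```python
import bisect

MAX_ROWS = 4  # hypothetical threshold, counted in rows for simplicity


class ToyTable:
    """Toy table whose regions split in two when they hold more than MAX_ROWS rows."""

    def __init__(self):
        self.start_keys = [""]   # sorted start key of each region
        self.regions = [{}]      # regions[i] holds rows in [start_keys[i], start_keys[i + 1])

    def put(self, row_key, value):
        i = bisect.bisect_right(self.start_keys, row_key) - 1
        self.regions[i][row_key] = value
        if len(self.regions[i]) > MAX_ROWS:
            self._split(i)

    def _split(self, i):
        """Replace region i with two regions divided at its middle row key."""
        keys = sorted(self.regions[i])
        mid_key = keys[len(keys) // 2]
        lower = {k: v for k, v in self.regions[i].items() if k < mid_key}
        upper = {k: v for k, v in self.regions[i].items() if k >= mid_key}
        self.regions[i:i + 1] = [lower, upper]
        self.start_keys.insert(i + 1, mid_key)


table = ToyTable()
for key in ["a", "b", "c", "d", "e", "f"]:
    table.put(key, "v")

print(table.start_keys)                    # ['', 'c']
print([sorted(r) for r in table.regions])  # [['a', 'b'], ['c', 'd', 'e', 'f']]
```

The fifth insert pushes the only region over the threshold, so it splits at `"c"`; the sixth insert is then routed to the upper region by the same start-key lookup.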
High Availability
- HBase achieves high availability through ==region replication==, where multiple replicas of a region can be hosted on different Region Servers.
- By default, region replication is set to 1 (a single copy with no extra replicas); it can be increased to improve fault tolerance.
Real-Time Processing
- HBase supports ==block cache== and ==Bloom filters== for efficient real-time query processing.
- Block cache improves read performance by keeping recently read data blocks in memory, which reduces access time for subsequent reads of the same blocks.
MemStore
- MemStore acts as a write cache: it buffers incoming writes in memory until they are flushed to disk as a new StoreFile.
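The write-cache behaviour can be sketched as an in-memory buffer that is written out as an immutable sorted file once it fills up. All names here (`ToyStore`, `flush_threshold`, `store_files`) are hypothetical, the threshold counts cells rather than bytes, and plain Python lists stand in for files on disk.

```python
class ToyStore:
    """Toy store: writes land in a memstore, which is flushed to sorted 'files'."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold   # hypothetical: number of cells, not bytes
        self.memstore = {}                       # in-memory write buffer
        self.store_files = []                    # immutable sorted snapshots, newest last

    def put(self, row_key, value):
        self.memstore[row_key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the buffer out as one sorted file and start with an empty buffer."""
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore = {}

    def get(self, row_key):
        """Check the memstore first, then files from newest to oldest."""
        if row_key in self.memstore:
            return self.memstore[row_key]
        for store_file in reversed(self.store_files):
            for key, value in store_file:
                if key == row_key:
                    return value
        return None


store = ToyStore()
for key, value in [("r3", "c"), ("r1", "a"), ("r2", "b")]:
    store.put(key, value)          # third put triggers a flush
store.put("r1", "a-new")           # newer value stays in the memstore

print(store.store_files)           # [[('r1', 'a'), ('r2', 'b'), ('r3', 'c')]]
print(store.get("r1"))             # a-new
print(store.get("r2"))             # b
```

Reads consult the buffer before the flushed files, so the most recent write for a key wins even though an older value is still present in a file.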
Summary of HBase Features
| Feature | Description |
| --- | --- |
| Consistency | Ensures strong consistency with serialized updates. |
| Atomic Read/Write | Provides atomic row-level operations for data integrity. |
| Sharding | Dynamically splits and distributes regions to manage large datasets. |
| High Availability | Uses region replication to keep data available and tolerate server failures. |
| Real-Time Processing | Supports efficient querying through the block cache and Bloom filters. |

Block Cache
“Block cache helps in reducing disk I/O for retrieving data.”
- The ==block cache== keeps recently read blocks in memory so that further requests for data in the same block can be served without disk I/O.
- It is configurable at the ==table’s column family level==, meaning different column families can have different cache priorities or even disable the block cache entirely.
- Applications tune these settings to match their data sizes and access patterns.
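A block cache can be pictured as a bounded map from block id to block contents that falls back to disk on a miss. The sketch below assumes least-recently-used eviction for illustration; `ToyBlockCache`, `fake_disk_read`, and the block ids are hypothetical names, not HBase classes or settings.

```python
from collections import OrderedDict


class ToyBlockCache:
    """Toy LRU cache of data blocks in front of a slow 'disk' read function."""

    def __init__(self, capacity, read_block_from_disk):
        self.capacity = capacity
        self._read = read_block_from_disk
        self._blocks = OrderedDict()   # block id -> block data, least recently used first
        self.hits = 0
        self.misses = 0

    def get_block(self, block_id):
        if block_id in self._blocks:
            self.hits += 1
            self._blocks.move_to_end(block_id)     # mark as most recently used
            return self._blocks[block_id]
        self.misses += 1
        block = self._read(block_id)               # the only place disk is touched
        self._blocks[block_id] = block
        if len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)       # evict the least recently used block
        return block


disk_reads = []


def fake_disk_read(block_id):
    disk_reads.append(block_id)
    return f"data-for-block-{block_id}"


cache = ToyBlockCache(capacity=2, read_block_from_disk=fake_disk_read)
for block_id in [1, 2, 1, 3, 2]:
    cache.get_block(block_id)

print(cache.hits, cache.misses)   # 1 4
print(disk_reads)                 # [1, 2, 3, 2]
```

The second request for block 1 is served from memory; block 2 has to be read again only because the cache holds two blocks and it was evicted when block 3 arrived.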
Bloom Filter
“Bloom Filters provide an in-memory structure to reduce disk reads to only the files likely to contain that Row.”
- A ==Bloom Filter== is a compact, probabilistic structure used to test whether a ==StoreFile== might contain a specific row or row-column cell. It can report false positives but never false negatives.
- Without Bloom filters, the only way to decide whether a row key is in a StoreFile is to check the StoreFile’s block index, which stores the start row key of each block, and then read the candidate block.
- Bloom Filters act as an ==in-memory index==, significantly reducing disk reads by narrowing the search to files that are likely to contain the specified row.
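How a Bloom filter narrows a lookup to candidate files can be sketched with a few hashed bit positions per key. This is a generic Bloom filter, not HBase's implementation: `ToyBloomFilter`, the bit count, the number of hashes, the use of SHA-256, and the file and row names are all assumptions made for illustration.

```python
import hashlib


class ToyBloomFilter:
    """Toy Bloom filter: may say 'maybe present' wrongly, never 'absent' wrongly."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)   # one byte per bit, for simplicity

    def _positions(self, key: bytes):
        # Derive num_hashes positions by salting the key with a seed byte.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))


# One filter per hypothetical StoreFile, built from the row keys it holds.
store_files = {
    "file-1": [b"row-001", b"row-002"],
    "file-2": [b"row-500", b"row-501"],
}
filters = {}
for name, rows in store_files.items():
    bloom = ToyBloomFilter()
    for row in rows:
        bloom.add(row)
    filters[name] = bloom


def files_to_read(row_key: bytes):
    """Return only the files whose filter says the row might be present."""
    return [name for name, bloom in filters.items() if bloom.might_contain(row_key)]


print(files_to_read(b"row-001"))   # always includes 'file-1'; other files only on a false positive
```

A file that really holds the row is always returned, so skipping the files the filter rules out never loses data; the cost of a false positive is just one unnecessary file read.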
HBase Features
HBase provides ==atomic read and write== operations at the row level, ensuring data consistency and reliability.
Features of HBase Region Server
| Feature | Description |
| --- | --- |
| Consistency | Guarantees that a read of a row returns the latest successfully written value. |
| High Availability | Ensures that the system remains operational even during failures. |
| Sharding | Distributes data across multiple servers to improve performance. |

Region Servers provide all of the above features.
Key Components of HBase Architecture
- HMaster Server: Manages overall HBase operations and performs DDL operations such as creating and deleting tables.
- HBase Region Server: Handles read and write requests for the regions it manages.
- Regions: The basic building blocks of an HBase cluster; each region holds a contiguous row-key range of a table, including its column families.
- ZooKeeper: Provides services such as maintaining configuration information, naming, and distributed synchronization.
Storage Structure
- Data in HBase is stored on HDFS as ==StoreFiles (HFiles)==, a block-structured binary file format.
- HBase can run with multiple masters, but only one master is active at any time; the others stand by to take over if it fails.
Summary of HBase Features
- Low-latency random read and write operations on top of HDFS.
- HBase tables are partitioned into multiple regions, each storing multiple rows.
- Key features include:
  - Consistency
  - Atomic Read and Write
  - Sharding
  - High Availability
  - Real-Time Processing