🗂️ HBase Advanced Usage
📊 HBase Table Characteristics
“An HBase table can scale to billions of rows and any number of columns, based on your requirements.”
Key Features
- Scalability: HBase supports the storage of vast amounts of data, capable of scaling to billions of rows.
- Throughput: High read and write throughput at low latency.
- Row Key: Each row contains a single indexed value known as the ==row key==.
🛠️ HBase Schema Design
Importance of Schema Design
Understanding how to design tables, row keys, and column names is crucial to leveraging HBase’s advanced architecture effectively.
Learning Objectives
- HBase and Schema Design
- Schema Creation
- Table Schema Rules of Thumb
- RegionServer Sizing Rules of Thumb
- Rowkey Design
- Supported Datatypes
- Joins
- Constraints
- Coprocessors
📝 Schema Creation Process
Considerations for Schema
When designing a schema, consider the following:
- Number of ==column families==.
- Data allocation within each column family.
- Quantity and naming of columns.
- Information stored in cells.
- Number of versions for each cell.
- Structure and content of the ==row key==.
Table Creation Example
Tables can be created using the HBase Shell or the Java client API.
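A minimal sketch of table creation with the Java client API; the table name, column-family name, and version count are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class CreateTableExample {
  public static void main(String[] args) throws Exception {
    Configuration config = HBaseConfiguration.create();
    try (Connection connection = ConnectionFactory.createConnection(config);
         Admin admin = connection.getAdmin()) {
      // Describe a table named "tweets" with one short-named column family "d"
      // (short family names are a rule of thumb; see the next section).
      TableDescriptorBuilder table =
          TableDescriptorBuilder.newBuilder(TableName.valueOf("tweets"))
              .setColumnFamily(ColumnFamilyDescriptorBuilder
                  .newBuilder(Bytes.toBytes("d"))
                  .setMaxVersions(3) // keep up to three versions of each cell
                  .build());
      admin.createTable(table.build());
    }
  }
}
```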
📈 Table Schema Rules of Thumb

| Rule | Description |
|---|---|
| Aim for region sizes | Between 10 and 50 GB |
| Cell size limit | No larger than 10 MB (50 MB if using MOB, medium-sized object storage) |
| Column family count | 1 to 3 column families per table |
| Region count | 50-100 regions per RegionServer for tables with 1 or 2 column families |
| Column family names | Keep short; avoid overly descriptive names |
| Write patterns | Monitor active regions and adjust accordingly |
💻 RegionServer Sizing Rules of Thumb
Estimating Required Java Heap
To determine the ratio of raw disk space to required Java heap:
$$
\frac{\text{Raw Disk Space}}{\text{Java Heap}} = \frac{\text{Region Size}}{\text{Memstore Size}} \cdot \text{Replication Factor} \cdot \text{Heap Fraction for Memstores}
$$
Example Calculation
Given parameters:
- Region Size: 10 GB
- Memstore Size: 128 MB
- HDFS Replication Factor: 3
- Heap Fraction for Memstores: 40%
Calculation:
$$\frac{\text{Raw Disk Space}}{\text{Java Heap}} = \frac{10 \text{ GB}}{128 \text{ MB}} \cdot 3 \cdot 0.4 = 80 \cdot 3 \cdot 0.4 = 96$$
That is, each gigabyte of Java heap can serve roughly 96 GB of raw disk space.
Key Considerations
- If serving 10 TB of disk space per RegionServer, a Java heap of approximately 107 GB is necessary (10,240 GB ÷ 96 ≈ 107 GB); the sketch below reproduces this arithmetic.
- Adjustments can be made by increasing region size or decreasing memstore size based on workload.
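The sizing rule can be wrapped in a small helper; a minimal sketch (class and method names are illustrative):

```java
public class HeapSizing {
  /** Ratio of raw disk space to Java heap, per the formula above. */
  static double diskToHeapRatio(double regionSizeMb, double memstoreSizeMb,
                                double replicationFactor, double memstoreHeapFraction) {
    return (regionSizeMb / memstoreSizeMb) * replicationFactor * memstoreHeapFraction;
  }

  public static void main(String[] args) {
    double ratio = diskToHeapRatio(10 * 1024, 128, 3, 0.4); // 80 * 3 * 0.4 = 96
    double heapGb = 10 * 1024 / ratio;                      // 10 TB = 10,240 GB; / 96 ≈ 107 GB
    System.out.printf("ratio = %.0f : 1, heap for 10 TB ≈ %.0f GB%n", ratio, heapGb);
  }
}
```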
Additional Parameters
- Region Size: Max of 20 GB (some claim up to 200 GB).
- Memstore Size: Can be decreased based on write load.
- Replication Factor: Increasing it does not directly help performance, but it does consume more disk space.
📚 Conclusion
HBase is a robust solution for real-time data processing, especially in scenarios like capturing tweets for analysis. Proper schema design and understanding of the architecture are critical for optimal performance and resource management.
🗂️ HBase Advanced Usage
Column Families and Efficiency
- Column Family Optimization: HBase is not optimized for more than one column family; minimize the number of column families to improve efficiency.
- Flushing and Compaction:
  - Flushing and compaction are performed per region.
  - If a large flush is triggered for one column family, adjacent column families in the same region are also flushed, even if they contain little data.
  - Compaction is triggered by the number of files in a column family rather than their size, which can lead to redundant I/O operations.
Recommendations

| Recommendation | Explanation |
|---|---|
| Limit column families | Group columns with similar usage rates into one column family for efficiency. |
| Maintain cardinality | Ensure that cardinalities among column families do not differ significantly, to avoid inefficient scans. |
Rowkey Design and Hotspotting
- Rowkey Sorting: Rows are sorted lexicographically by rowkey, which optimizes scans and allows related rows to be read together.
- Hotspotting:
  - Occurs when traffic (reads and writes) concentrates on a small number of nodes, degrading performance.
  - Designing data access patterns that distribute load evenly across the cluster is crucial.
Methods to Prevent Hotspotting
Salting:
- Involves adding a random prefix to rowkeys to distribute them across regions (see the sketch below).
- Example:
  - Without salting: foo0001, foo0002, foo0003, foo0004 → all go to a single region.
  - With salting: a-foo0001, b-foo0002, c-foo0003, d-foo0004 → distributed across four regions.
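A minimal salting sketch; the salt alphabet and helper name are illustrative. Note that because the prefix is random, reads must check every salt bucket for a given logical key:

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ThreadLocalRandom;

public class SaltedKeys {
  private static final String[] SALTS = {"a", "b", "c", "d"}; // e.g., one per target region

  /** Pick a random salt prefix so sequential keys spread across regions. */
  static byte[] saltedKey(String key) {
    String salt = SALTS[ThreadLocalRandom.current().nextInt(SALTS.length)];
    return (salt + "-" + key).getBytes(StandardCharsets.UTF_8);
  }
}
```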
Hashing:
- Uses a one-way hash to generate a consistent prefix, allowing for predictable read operations (see the sketch below).
- Provides a way to reconstruct rowkeys on the client side while ensuring balanced load distribution.
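A minimal hashed-prefix sketch; the digest choice and helper name are illustrative:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashedKeys {
  /** Prepend one byte of the key's MD5 digest. The prefix is deterministic,
      so a reader can recompute the full rowkey for any logical key. */
  static byte[] hashedKey(String key) throws NoSuchAlgorithmException {
    byte[] raw = key.getBytes(StandardCharsets.UTF_8);
    byte[] digest = MessageDigest.getInstance("MD5").digest(raw);
    byte[] out = new byte[raw.length + 1];
    out[0] = digest[0]; // spreads keys evenly across regions
    System.arraycopy(raw, 0, out, 1, raw.length);
    return out;
  }
}
```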
Reversing the Key:
- Reverses a fixed-length or numeric rowkey so the most frequently changing part comes first, randomizing the sort order and preventing hotspotting.
Including Timestamps:
- Appending timestamps to rowkeys aids in retrieving data based on storage time. Both techniques are sketched below.
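Minimal sketches of both techniques (names are illustrative; the reverse-timestamp idiom is a common way to make the newest version of a key sort first):

```java
public class RowkeyTricks {
  /** Reverse a fixed-length key so the most frequently varying characters lead. */
  static String reversedKey(String key) {
    return new StringBuilder(key).reverse().toString();
  }

  /** Append a reverse timestamp (Long.MAX_VALUE - ts) so the newest entry for
      a logical key sorts first in HBase's lexicographic order. */
  static String keyWithReverseTimestamp(String key, long tsMillis) {
    return key + "-" + (Long.MAX_VALUE - tsMillis);
  }
}
```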
Rowkey Constraints
Rowkey Uniqueness:
- Rowkeys are unique within a column family; the same rowkey can exist in different column families without conflict.
Modifications:
- Rowkeys cannot be modified directly; to change a rowkey, the original row must be deleted and a new one inserted.
Data Types in HBase
- Byte Arrays: HBase cells store data as byte arrays, which are encoded/decoded during put and get operations.
- Counters: HBase supports atomic increments of numbers, allowing for efficient counting operations.
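A short sketch of both points using the standard client API; the table, row, and column names are illustrative:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DatatypesExample {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("tweets"))) {
      byte[] row = Bytes.toBytes("user123");
      byte[] cf = Bytes.toBytes("d");

      // Byte arrays: the Bytes utility encodes values on put and decodes them on get.
      table.put(new Put(row).addColumn(cf, Bytes.toBytes("name"), Bytes.toBytes("Alice")));
      Result r = table.get(new Get(row));
      String name = Bytes.toString(r.getValue(cf, Bytes.toBytes("name")));

      // Counters: the increment is applied atomically on the RegionServer.
      long count = table.incrementColumnValue(row, cf, Bytes.toBytes("tweetCount"), 1L);
      System.out.println(name + " has " + count + " tweets");
    }
  }
}
```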
Joins and Secondary Indexes
- Joins:
  - HBase does not support traditional SQL joins.
  - Equivalent functionality must be implemented at the application level, either by denormalizing data or using lookup tables.
Secondary Indexes
- Provide an alternative access path to data beyond the primary row key.
- Allow point lookups and range scans based on indexed columns.
- Considerations for use:
  - Number of users
  - Data size and arrival rate
  - Flexibility of reporting requirements
  - Desired query execution speed
Performance Considerations
- Secondary indexes require additional cluster space and processing power, a trade-off between performance and resource usage.
- Synchronization on counters occurs on the RegionServer rather than the client, optimizing performance.
Filtering Queries
- Client Request Filters can be used to avoid creating secondary indexes, but full scans on large tables should be avoided; a filtered-scan sketch follows.
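A minimal sketch of a filtered scan; the family, qualifier, and value are illustrative. The filter is evaluated server-side but still reads every row in range, so avoid it on very large tables:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.CompareOperator;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class FilteredScanExample {
  /** Print the rowkeys of all rows whose d:lang cell equals "en". */
  static void scanEnglishRows(Table table) throws IOException {
    Scan scan = new Scan();
    scan.setFilter(new SingleColumnValueFilter(
        Bytes.toBytes("d"), Bytes.toBytes("lang"),
        CompareOperator.EQUAL, Bytes.toBytes("en")));
    try (ResultScanner results = table.getScanner(scan)) {
      for (Result r : results) {
        System.out.println(Bytes.toString(r.getRow()));
      }
    }
  }
}
```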
🗃️ Secondary Index Strategies
Periodic-Update Secondary Index
- A secondary index can be created in another table and updated through a MapReduce job.
- The job may be executed intra-day, but the index might still be out of sync with the main data table depending on the load strategy.
Dual-Write Secondary Index
- In this strategy, the secondary index is built while data is being published to the cluster.
- The process involves writing to both the data table and the index table simultaneously (see the sketch after this list).
- If a data table already exists, the secondary index must be bootstrapped, typically via a MapReduce job.
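A minimal dual-write sketch; the tables, family, and qualifiers are illustrative, and note the two puts are not atomic across tables, so a failure between them leaves the index out of sync:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DualWriteIndex {
  /** Write a row to the data table and, in the same call path, an index row. */
  static void putWithIndex(Table dataTable, Table indexTable,
                           String rowkey, String email) throws IOException {
    Put data = new Put(Bytes.toBytes(rowkey));
    data.addColumn(Bytes.toBytes("d"), Bytes.toBytes("email"), Bytes.toBytes(email));
    dataTable.put(data);

    // Index row: key = the indexed value, cell stores the data-table rowkey.
    Put index = new Put(Bytes.toBytes(email));
    index.addColumn(Bytes.toBytes("d"), Bytes.toBytes("rowkey"), Bytes.toBytes(rowkey));
    indexTable.put(index);
  }
}
```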
Summary Tables
- Used for extensive time ranges (e.g., year-long reports) and voluminous data.
- Summary tables are generated by MapReduce jobs into a separate table.
⚖️ Constraints in HBase
“Constraints in HBase are used to enforce business rules for table attributes, ensuring data integrity.”
General Information
- HBase supports constraints similar to traditional SQL databases.
- Recommended for enforcing business rules (e.g., ensuring values range from 1-10).
- Using constraints for referential integrity is not recommended, as it can significantly lower write throughput.
Constraints Class
- Constraints are managed through the utility class org.apache.hadoop.hbase.constraint.Constraints (a direct subclass of java.lang.Object).
Properties of the Constraints Class

| Modifier and Type | Field |
|---|---|
| `private static Pattern` | `CONSTRAINT_HTD_ATTR_KEY_PATTERN` |
| `private static String` | `CONSTRAINT_HTD_KEY_PREFIX` |
| `private static Comparator<Constraint>` | `constraintComparator` |
| `private static String` | `COUNTER_KEY` |
| `private static int` | `DEFAULT_PRIORITY` |
| `private static String` | `ENABLED_KEY` |
| `private static long` | `MIN_PRIORITY` |
| `private static String` | `PRIORITY_KEY` |
| `private static long` | `UNSET_PRIORITY` |
Commonly Used Methods of the Constraints Class

| Method | Description |
|---|---|
| `add(HTableDescriptor desc, Class<? extends Constraint>... constraints)` | Adds configuration-less constraints to the table. |
| `add(HTableDescriptor desc, Class<? extends Constraint> constraint, org.apache.hadoop.conf.Configuration conf)` | Adds a Constraint with configuration. |
| `disable(HTableDescriptor desc)` | Turns off processing constraints for a table. |
| `disableConstraint(HTableDescriptor desc, Class<? extends Constraint> clazz)` | Disables the given Constraint. |
| `enable(HTableDescriptor desc)` | Enables constraints on a table. |
| `enableConstraint(HTableDescriptor desc, Class<? extends Constraint> clazz)` | Enables the given Constraint. |
| `enabled(HTableDescriptor desc, Class<? extends Constraint> clazz)` | Checks if the given constraint is enabled. |
| `getConstraints(TableDescriptor desc, ClassLoader classloader)` | Gets constraints stored in the table descriptor. |
| `remove(HTableDescriptor desc)` | Removes all Constraints from the table. |
| `remove(HTableDescriptor desc, Class<? extends Constraint> clazz)` | Removes a specific constraint from the table. |
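Attaching a constraint to a table uses `Constraints.add()`. The sketch below assumes a hypothetical IntegerRangeConstraint enforcing the 1-10 rule mentioned earlier; the range check itself is left schematic:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.constraint.BaseConstraint;
import org.apache.hadoop.hbase.constraint.ConstraintException;
import org.apache.hadoop.hbase.constraint.Constraints;

/** Hypothetical business-rule constraint (not part of HBase). */
public class IntegerRangeConstraint extends BaseConstraint {
  @Override
  public void check(Put put) throws ConstraintException {
    // Decode the relevant cell from the Put and throw
    // ConstraintException if the value falls outside 1-10.
  }

  /** Register the constraint while defining the table. */
  public static HTableDescriptor describeTable() throws IOException {
    HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("scores"));
    Constraints.add(desc, IntegerRangeConstraint.class);
    return desc;
  }
}
```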
🔍 Coprocessors in HBase
Introduction to Coprocessors
- Coprocessor is a framework in HBase that allows running custom code on a RegionServer.
- It provides a way to perform computations directly on data stored in HBase, which can improve performance by reducing data transfer overhead.
Types of Coprocessors
Coprocessors can be categorized into two types:
- Observer Coprocessor: Similar to triggers in traditional databases.
- Endpoint Coprocessor: Used for executing user-defined functions.
📊 Observer Coprocessor
Functionality
- Allows insertion of user code by overriding methods provided by the coprocessor framework.
- Callbacks are executed from core HBase code when certain events occur.
Example: Pre-Put Operation
To execute code before a put operation, override the following method:
```java
public void prePut(final ObserverContext<RegionCoprocessorEnvironment> e, final Put put,
    final WALEdit edit, final Durability durability) throws IOException {
  // custom logic runs here, before the Put is applied to the region
}
```
RegionObserver Interface Methods

| Method | Description |
|---|---|
| `default void preGetOp(ObserverContext<RegionCoprocessorEnvironment> c, Get get, List<Cell> result)` | Called before a Get operation. |
| `default void postGetOp(ObserverContext<RegionCoprocessorEnvironment> c, Get get, List<Cell> result)` | Called after a Get operation. |
| `default boolean preExists(ObserverContext<RegionCoprocessorEnvironment> c, Get get, boolean exists)` | Called before checking existence with a Get. |
| `default boolean postExists(ObserverContext<RegionCoprocessorEnvironment> c, Get get, boolean exists)` | Called after checking existence with a Get. |
| `default void prePut(ObserverContext<RegionCoprocessorEnvironment> c, Put put, WALEdit edit, Durability durability)` | Called before storing a value. |
| `default void postPut(ObserverContext<RegionCoprocessorEnvironment> c, Put put, WALEdit edit, Durability durability)` | Called after storing a value. |
| `default void preDelete(ObserverContext<RegionCoprocessorEnvironment> c, Delete delete, WALEdit edit, Durability durability)` | Called before deleting a value. |
| `default void postDelete(ObserverContext<RegionCoprocessorEnvironment> c, Delete delete, WALEdit edit, Durability durability)` | Called after deleting a value. |
| `default void preScannerOpen(ObserverContext<RegionCoprocessorEnvironment> c, Scan scan)` | Called before opening a new scanner. |
| `default void postScannerOpen(ObserverContext<RegionCoprocessorEnvironment> c, Scan scan, RegionScanner s)` | Called after opening a new scanner. |
| `default boolean preScannerNext(ObserverContext<RegionCoprocessorEnvironment> c, InternalScanner s, List<Result> result, int limit, boolean hasNext)` | Called before fetching the next row on a scanner. |
| `default boolean postScannerNext(ObserverContext<RegionCoprocessorEnvironment> c, InternalScanner s, List<Result> result, int limit, boolean hasNext)` | Called after fetching the next row on a scanner. |
🏗️ HBase Coprocessors and Observers
Coprocessors Overview
“A coprocessor is executed when an event occurs. This type of coprocessor is known as an Observer.”
Types of Coprocessors
Coprocessors in HBase are broadly divided into two categories:

| Type | Description |
|---|---|
| Observer | Similar to triggers in conventional databases; allows user code insertion by overriding methods. |
| Endpoint | Can be invoked at any time from the client and executed remotely at target regions. |
RegionObserver Interface Methods

The RegionObserver interface provides methods to intercept operations performed on regions. Key methods include:

| Method Name | Description |
|---|---|
| `preScannerClose` | Called before the client closes a scanner. |
| `postScannerClose` | Called after the client closes a scanner. |
| `preCheckAndPut` | Called before a checkAndPut operation. |
| `postCheckAndPut` | Called after a checkAndPut operation. |
| `preCheckAndDelete` | Called before a checkAndDelete operation. |
| `postCheckAndDelete` | Called after a checkAndDelete operation. |
Example Implementation: Access Control Coprocessor
A simple RegionObserver can enforce coarse access control. The sketch below is illustrative: the class name, the guarded row, and the error message are placeholders, not the canonical HBase example.
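```java
package org.apache.hadoop.hbase.coprocessor;

import java.io.IOException;
import java.util.List;
import java.util.Optional;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.util.Bytes;

/** Illustrative observer: block reads of the "admin" row for all users. */
public class AccessControlObserverExample implements RegionCoprocessor, RegionObserver {
  private static final byte[] ADMIN_ROW = Bytes.toBytes("admin");

  @Override
  public Optional<RegionObserver> getRegionObserver() {
    return Optional.of(this); // expose this class as a RegionObserver
  }

  @Override
  public void preGetOp(ObserverContext<RegionCoprocessorEnvironment> c,
                       Get get, List<Cell> result) throws IOException {
    // Runs before every Get on the region; reject reads of the guarded row.
    if (Bytes.equals(get.getRow(), ADMIN_ROW)) {
      throw new IOException("Access denied: the admin row is not readable");
    }
  }
}
```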
MasterObserver Interface
A MasterObserver provides hooks for DDL-type operations such as create, delete, and modify table actions. It runs within the context of the HBase Master.
- Multiple observers of a given type can be loaded and chained to execute sequentially based on assigned priority.
- For example, the `postDeleteTable()` hook can be used to delete secondary indexes when a primary table is deleted.
Example of Deleting a Namespace
```java
public void postDeleteNamespace(final ObserverContext<MasterCoprocessorEnvironment> ctx,
    final String namespaceName) throws IOException {
  // custom cleanup logic runs here, after the namespace has been deleted
}
```
Implementing an Endpoint
Endpoints can be invoked from the client and executed remotely. Steps to implement an endpoint:
1. Define the coprocessor service and related messages in a `.proto` file.
2. Run the `protoc` command to generate the code.
3. Implement the generated protobuf Service interface and the `CoprocessorService` interface.
Client Invocation
The client calls the HTable.coprocessorService() methods to perform the endpoint RPCs, as sketched below.
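A sketch of the client side, following the row-count pattern from the HBase reference guide; RowCountService, CountRequest, and CountResponse stand in for classes that `protoc` would generate, so this will not compile without them:

```java
import java.io.IOException;
import java.util.Map;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.ipc.CoprocessorRpcUtils;

// RowCountService, CountRequest, and CountResponse are hypothetical
// protobuf-generated classes from the endpoint's .proto definition.
static long totalRowCount(Table table) throws Throwable {
  final CountRequest request = CountRequest.getDefaultInstance();
  Map<byte[], Long> perRegion = table.coprocessorService(
      RowCountService.class, null, null, // null start/end row = all regions
      new Batch.Call<RowCountService, Long>() {
        @Override
        public Long call(RowCountService counter) throws IOException {
          CoprocessorRpcUtils.BlockingRpcCallback<CountResponse> callback =
              new CoprocessorRpcUtils.BlockingRpcCallback<>();
          counter.getRowCount(null, request, callback); // RPC to this region
          return callback.get().getCount();
        }
      });
  long total = 0;
  for (long regionCount : perRegion.values()) {
    total += regionCount; // sum the per-region counts
  }
  return total;
}
```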
Important Concepts and Definitions
- Access Control: A mechanism that checks user permissions for operations like Get, Put, Delete, and Scan.
- Hotspotting: Occurs when traffic is concentrated on one or a few nodes in a cluster. Prevention methods include salting, hashing, reversing the key, and including timestamps.
- Secondary Indexes: Provide an alternative access path to data beyond primary row key access.
Key Takeaways
- Coprocessor Framework: Allows running custom code on RegionServers, enhancing the flexibility and functionality of HBase.
- Design Considerations: Effective schema design and an understanding of how HBase handles data access patterns are crucial for optimal performance.