HBase Study Notes - Basics
Published: 2019-05-25


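Preface: This post summarizes the basics of HBase as a record of this stage of my learning. It does not cover environment setup or go deep into internals. Most of the points come from the official HBase documentation, and to avoid distorting their original meaning, they are quoted in the original English. For more details, please refer to the official HBase documentation: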


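I. What is HBase?

HBase (Hadoop Database, hereafter HBase) is an open-source, distributed, column-oriented, non-relational database. It started as a subproject of Apache Hadoop and became a top-level Apache project in 2010. Modeled after Google's BigTable, it provides BigTable-like capabilities on top of HDFS.

Note: strictly speaking, HBase is column-family-oriented.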


II. When to use HBase

1. Overview

  • Make sure you have enough data. If you have hundreds of millions or even billions of rows, HBase is worth considering.
  • Make sure you can live without all the extra features that an RDBMS provides (e.g., typed columns, secondary indexes, transactions, advanced query languages, etc.).
  • Make sure you have enough hardware. HDFS doesn't do well with anything less than 5 DataNodes, plus a NameNode.

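2. Example use cases

  • Storing cloud-service data, such as users' contacts, SMS messages, and notes.
  • Storing user clickstreams; analyzing them makes more precisely targeted user operations possible.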

III. HBase features

  • Strongly consistent reads/writes
  • Automatic sharding
  • Automatic RegionServer failover
  • Hadoop/HDFS Integration
  • MapReduce
  • Java Client API/Thrift/REST API
  • Block Cache and Bloom Filters
  • Operational Management
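As a rough illustration of automatic sharding: HBase splits each table into regions, each serving a contiguous range of row keys, and splits and redistributes regions as the data grows. The sketch below shows only how a row key maps to a region once the sorted region start keys are known. The split points and region names are hypothetical, and a real client looks region locations up in the hbase:meta system table rather than in a local list.

```python
import bisect

# Hypothetical split points: each region serves row keys from its start key
# (inclusive) up to the next region's start key (exclusive).
# The first region starts at the empty key, so it catches the smallest keys.
REGION_START_KEYS = [b"", b"g", b"p"]
REGION_NAMES = ["region-1", "region-2", "region-3"]


def locate_region(row_key: bytes) -> str:
    """Return the name of the region whose key range contains row_key."""
    # bisect_right gives the index of the first start key greater than row_key;
    # the region just before that index is the one covering row_key.
    index = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return REGION_NAMES[index]


print(locate_region(b"apple"))  # region-1
print(locate_region(b"grape"))  # region-2
print(locate_region(b"zebra"))  # region-3
```

Splitting a region amounts to adding one more start key to this sorted list, which is why the choice of row keys largely decides how evenly load spreads across regions.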

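IV. How does HBase differ from Hadoop/HDFS?

  • Hadoop/HDFS: HDFS is a distributed file system well suited to storing large files; it does not support fast lookups of individual records within files.
  • HBase: HBase is built on top of HDFS and takes advantage of its strength in storing large files, while also supporting fast lookups and updates of individual records in large tables.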

V. HBase data model

1. Namespace

  • A namespace is a logical grouping of tables.
  • There are two predefined special namespaces.

hbase - system namespace, used to contain HBase internal tables.

default - tables with no explicit specified namespace will automatically fall into this namespace.
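In the HBase shell and client code a table is referred to as `<namespace>:<table>`; for example, the catalog table `hbase:meta` lives in the `hbase` namespace. A minimal sketch of how such a name resolves, assuming that syntax; `split_table_name` is a hypothetical helper, not an HBase API.

```python
from typing import Tuple


def split_table_name(name: str) -> Tuple[str, str]:
    """Split a table name into (namespace, table).

    Names without an explicit namespace fall into the "default" namespace.
    """
    namespace, sep, table = name.partition(":")
    if not sep:
        return "default", name
    return namespace, table


print(split_table_name("hbase:meta"))  # ('hbase', 'meta')
print(split_table_name("webtable"))    # ('default', 'webtable')
```

So `webtable` and `default:webtable` name the same table.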

2. Table

  • An HBase table consists of multiple rows.

3. Row

  • A row in HBase consists of a row key and one or more columns with values associated with them.
  • Rows are sorted alphabetically by the row key as they are stored.
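Row keys are stored as raw bytes, so "alphabetically" here means lexicographic byte order. The sketch below uses Python's `bytes` comparison, which is also lexicographic over unsigned bytes, to show a common surprise and the usual zero-padding workaround; `padded_key` is a hypothetical helper.

```python
# Row keys compared as raw bytes, the way HBase orders rows.
row_keys = [b"row2", b"row10", b"row1", b"com.example.www", b"com.cnn.www"]
print(sorted(row_keys))
# [b'com.cnn.www', b'com.example.www', b'row1', b'row10', b'row2']


def padded_key(n: int, width: int = 3) -> bytes:
    """Zero-pad the numeric part so byte order matches numeric order."""
    return f"row{n:0{width}d}".encode()


print(sorted(padded_key(n) for n in (2, 10, 1)))
# [b'row001', b'row002', b'row010']
```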

4. Column

  • A column in HBase consists of a column family and a column qualifier, which are delimited by a ':' character.
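A column name such as `anchor:my.look.ca` therefore splits at the first ':' into family `anchor` and qualifier `my.look.ca`. Splitting only at the first ':' matters because a family name may not contain ':', while a qualifier is arbitrary bytes and may. `parse_column` is a hypothetical helper for illustration.

```python
from typing import Tuple


def parse_column(column: str) -> Tuple[str, str]:
    """Split "family:qualifier" into (family, qualifier) at the first ':'."""
    family, sep, qualifier = column.partition(":")
    if not sep:
        raise ValueError(f"no ':' delimiter in column name {column!r}")
    return family, qualifier


print(parse_column("anchor:my.look.ca"))  # ('anchor', 'my.look.ca')
print(parse_column("anchor:http://a.b"))  # ('anchor', 'http://a.b')
```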

5. Column Family

  • Column families physically colocate a set of columns and their values, often for performance reasons.
  • Column families must be declared up front at schema definition time.
  • The column family prefix must be composed of printable characters.
  • HBase currently does not do well with anything above two or three column families.
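Because families are part of the schema while qualifiers are not, a write to an undeclared family is rejected. The class below is a hypothetical in-memory model of that rule, not the HBase client API; its name check covers only the constraints mentioned here (printable characters, plus no ':' since that is the delimiter), not every rule HBase enforces.

```python
from typing import Iterable


class TableSchema:
    """Hypothetical model: column families are fixed when the table is defined."""

    def __init__(self, name: str, families: Iterable[str]) -> None:
        families = list(families)
        for family in families:
            # Reject empty names, non-printable characters and the ':' delimiter.
            if not family or not family.isprintable() or ":" in family:
                raise ValueError(f"illegal column family name: {family!r}")
        self.name = name
        self.families = frozenset(families)

    def check_column(self, column: str) -> None:
        """Raise LookupError if the column's family was not declared up front."""
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise LookupError(f"table {self.name} has no column family {family!r}")


schema = TableSchema("webtable", ["contents", "anchor", "people"])
schema.check_column("anchor:cnnsi.com")  # accepted: any qualifier is fine
try:
    schema.check_column("links:example.org")
except LookupError as err:
    print("rejected:", err)
```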

6. Column Qualifier

  • A column qualifier is added to a column family to provide the index for a given piece of data.
  • Column qualifiers are mutable and may differ greatly between rows.
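Since qualifiers are chosen at write time, two rows of the same table can hold entirely different columns within one family, and a row simply stores nothing for columns it does not have. A small sketch with hypothetical row keys and values:

```python
from typing import Dict, List

# Two rows of one table, flattened to {"family:qualifier": value}.
# Both use family "anchor", but each row picked its own qualifiers.
rows: Dict[str, Dict[str, str]] = {
    "row-a": {"anchor:cnnsi.com": "CNN", "anchor:my.look.ca": "CNN.com"},
    "row-b": {"anchor:example.org": "Example"},
}


def qualifiers(row: Dict[str, str], family: str) -> List[str]:
    """Return the qualifiers a row actually stores under one family."""
    prefix = family + ":"
    return sorted(col[len(prefix):] for col in row if col.startswith(prefix))


print(qualifiers(rows["row-a"], "anchor"))  # ['cnnsi.com', 'my.look.ca']
print(qualifiers(rows["row-b"], "anchor"))  # ['example.org']
```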

7. Cell

  • A cell is a combination of row, column family, and column qualifier, and contains a value and a timestamp, which represents the value's version.
  • A {row, column, version} tuple exactly specifies a cell in HBase. 
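Put differently, a table behaves like a map from {row, column, version} to a value. A minimal sketch with illustrative timestamps and values, using a plain Python dict (this is not how HBase stores data on disk):

```python
from typing import Dict, Optional, Tuple

# Each key is a full cell coordinate: (row key, "family:qualifier", timestamp).
cells: Dict[Tuple[str, str, int], str] = {
    ("com.cnn.www", "anchor:cnnsi.com", 9): "CNN",
    ("com.cnn.www", "contents:html", 6): "<html>v6</html>",
    ("com.cnn.www", "contents:html", 5): "<html>v5</html>",
}


def get_cell(row: str, column: str, version: int) -> Optional[str]:
    """Return the value at exactly these coordinates, or None if absent."""
    return cells.get((row, column, version))


print(get_cell("com.cnn.www", "contents:html", 5))  # <html>v5</html>
print(get_cell("com.cnn.www", "contents:html", 4))  # None
```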

8. Timestamp

  • A timestamp is written alongside each value, and is the identifier for a given version of a value.
  • By default, the timestamp represents the time on the RegionServer when the data was written, but you can specify a different timestamp value when you put data into the cell.
  • In the physical view, timestamps are stored in descending order, so the newest version is found first.
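The version behaviour can be modelled in a few lines. This is a hypothetical in-memory sketch: a put without an explicit timestamp is stamped with the current time in milliseconds (HBase timestamps are epoch milliseconds by default), versions are kept newest first, and a plain read returns the newest one. Real HBase also caps how many versions a column family retains (its VERSIONS setting); this sketch keeps them all.

```python
import time
from typing import Dict, List, Optional, Tuple


class VersionedCell:
    """Hypothetical model of one column in one row, holding several versions."""

    def __init__(self) -> None:
        self._versions: Dict[int, str] = {}

    def put(self, value: str, timestamp: Optional[int] = None) -> int:
        # Without an explicit timestamp, use the current time in milliseconds,
        # mimicking the timestamp the RegionServer assigns.
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        self._versions[timestamp] = value
        return timestamp

    def versions(self) -> List[Tuple[int, str]]:
        # Newest first: descending timestamp order.
        return sorted(self._versions.items(), reverse=True)

    def get(self) -> str:
        # A plain read returns the newest version.
        return self.versions()[0][1]


cell = VersionedCell()
cell.put("<html>v3</html>", timestamp=3)
cell.put("<html>v6</html>", timestamp=6)
cell.put("<html>v5</html>", timestamp=5)
print(cell.get())                         # <html>v6</html>
print([ts for ts, _ in cell.versions()])  # [6, 5, 3]
```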

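VI. HBase logical view (table view & multidimensional map view)

Table description:

  • Table name: webtable
  • Rows: 2, with RowKeys com.cnn.www and com.example.www
  • Column families: 3, namely contents, anchor, and people
  • For RowKey=com.cnn.www, column family anchor contains two columns (anchor:cnnsi.com, anchor:my.look.ca) and column family contents contains one column (contents:html)
  • RowKey=com.cnn.www has 5 data versions; RowKey=com.example.www has one data version.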
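The multidimensional-map view of the same table can be written as nested maps: row key → column family → column qualifier → timestamp → value. The timestamps and values below are illustrative, modelled on the webtable example in the HBase Reference Guide; counting the distinct timestamps per row reproduces the version counts listed above.

```python
from typing import Dict, List

# row key -> column family -> column qualifier -> timestamp -> value
Webtable = Dict[str, Dict[str, Dict[str, Dict[int, str]]]]

webtable: Webtable = {
    "com.cnn.www": {
        "anchor": {"cnnsi.com": {9: "CNN"}, "my.look.ca": {8: "CNN.com"}},
        "contents": {"html": {6: "<html>...", 5: "<html>...", 3: "<html>..."}},
    },
    "com.example.www": {
        "contents": {"html": {5: "<html>..."}},
        "people": {"author": {5: "John Doe"}},
    },
}


def row_versions(table: Webtable, row_key: str) -> List[int]:
    """Distinct timestamps present anywhere in a row, newest first."""
    row = table[row_key]
    stamps = {ts for family in row.values() for column in family.values() for ts in column}
    return sorted(stamps, reverse=True)


print(row_versions(webtable, "com.cnn.www"))      # [9, 8, 6, 5, 3]
print(row_versions(webtable, "com.example.www"))  # [5]
```

Empty cells, such as people for com.cnn.www, are simply absent from the map.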

VII. HBase physical view (storage by column family)
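Although the logical view presents rows, HBase physically stores data per column family, and empty cells take no space. A minimal sketch of that regrouping, using illustrative webtable cells: each family gets its own list of (row key, column, timestamp, value) entries, ordered by row key, then column, then timestamp descending. The flat cell list and `physical_view` are assumptions for illustration, not HBase's on-disk format.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, str, int, str]  # (row key, "family:qualifier", timestamp, value)

logical_cells: List[Cell] = [
    ("com.example.www", "people:author", 5, "John Doe"),
    ("com.cnn.www", "contents:html", 3, "<html>..."),
    ("com.cnn.www", "anchor:cnnsi.com", 9, "CNN"),
    ("com.example.www", "contents:html", 5, "<html>..."),
    ("com.cnn.www", "contents:html", 6, "<html>..."),
    ("com.cnn.www", "anchor:my.look.ca", 8, "CNN.com"),
    ("com.cnn.www", "contents:html", 5, "<html>..."),
]


def physical_view(cells: List[Cell]) -> Dict[str, List[Cell]]:
    """Group cells into one store per column family, each sorted by
    row key, then column, then timestamp (newest first)."""
    stores: Dict[str, List[Cell]] = {}
    for cell in cells:
        family = cell[1].split(":", 1)[0]
        stores.setdefault(family, []).append(cell)
    for family_cells in stores.values():
        family_cells.sort(key=lambda c: (c[0], c[1], -c[2]))
    return stores


for family, family_cells in sorted(physical_view(logical_cells).items()):
    print(family)
    for cell in family_cells:
        print("   ", cell)
```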


Reposted from: http://piaii.baihongyu.com/
