maexadata - Welcome to the world of mega-exabytes!!!


Sep 23, 2014

As HDFS is an append-only filesystem, the need for modifications at arbitrary offsets arose quickly. HBase is the Hadoop application to use when you require real-time read/write random-access to very large datasets. It is a distributed column-oriented database built on top of HDFS.

HBase is not relational and does not support SQL, but given the proper problem space, it is able to do what an RDBMS cannot: host very large, sparsely populated tables on clusters made from commodity hardware.

HBase is modeled with an HBase master node orchestrating a cluster of one or more regionserver slaves. The HBase master is responsible for bootstrapping a virgin install, for assigning regions to registered regionservers, and for recovering regionserver failures. The regionservers carry zero or more regions and field client read/write requests.

HBase depends on ZooKeeper and by default it manages a ZooKeeper instance as the authority on cluster state. HBase hosts vital information such as the location of the root catalog table and the address of the current cluster Master. Assignment of regions is mediated via ZooKeeper in case participating servers crash mid-assignment.


Go Back