maexadata - Welcome to the world of mega-exabytes!!!


Sep 23, 2014

Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query and analysis.

Using Hadoop was not easy for end users, especially for the ones who were not familiar with MapReduce framework. End users had to write map/reduce programs for simple tasks like getting raw counts or averages. Hive was created to make it possible for analysts with strong SQL skills (but meager Java programming skills) to run queries on the huge volumes of data to extract patterns and meaningful information. It provides an SQL-like language called HiveQL while maintaining full support for map/reduce. In short, a Hive query is converted to MapReduce tasks.

The main building blocks of Hive are –
1. Metastore stores the system catalog and metadata about tables, columns, partitions, etc.
2. Driver manages the lifecycle of a HiveQL statement as it moves through Hive
3. Query Compiler compiles HiveQL into a directed acyclic graph for MapReduce tasks
4. Execution Engine executes the tasks produced by the compiler in proper dependency order
5. HiveServer provides a Thrift interface and a JDBC / ODBC server


Go Back