HAWQ/HDB and Hadoop with Hive and HBase

Hive: Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis.

HBase: Apache HBase™ is the Hadoop database, a distributed, scalable, big data store.

HAWQ: http://hawq.incubator.apache.org/

PXF: PXF is an extensible framework that allows HAWQ to query external system data.

Let’s learn query federation

This topic describes how to access Hive data using PXF.

Previously, in order to query Hive tables using HAWQ and PXF, you needed to create an external table in PXF that described the target table’s Hive metadata. Since HAWQ is now integrated with HCatalog, HAWQ can use metadata stored in HCatalog instead of external tables created for PXF. HCatalog is built on top of the Hive metastore and incorporates Hive’s DDL. This provides several advantages:

  • You do not need to know the table schema of your Hive tables
  • You do not need to manually enter information about Hive table location or format
  • If Hive table metadata changes, HCatalog provides updated metadata. This is in contrast to the use of static external PXF tables to define Hive table metadata for HAWQ.
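As a rough illustration of what this buys you: with the HCatalog integration, a Hive table can be referenced from HAWQ as `hcatalog.<hive-db>.<hive-table>`, with no external table definition. The small Python helper below only builds such a query string. The helper name `hcatalog_query` and the table `default.sales_info` are made up for this sketch, and you would still run the resulting SQL through psql, Zeppelin, or whatever client you use.

```python
import re

# Conservative identifier check so the helper never builds a query
# from an unexpected string (hypothetical safeguard for this sketch).
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def hcatalog_query(hive_db: str, hive_table: str, limit: int = 10) -> str:
    """Build a HAWQ query that reads a Hive table via the hcatalog schema."""
    for name in (hive_db, hive_table):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return f"SELECT * FROM hcatalog.{hive_db}.{hive_table} LIMIT {int(limit)};"


# 'default' and 'sales_info' are placeholder names, not from the demo.
print(hcatalog_query("default", "sales_info"))
# prints: SELECT * FROM hcatalog.default.sales_info LIMIT 10;
```

Notice that no column list or storage format appears anywhere; HAWQ looks those up in HCatalog at query time.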

  1. HAWQ retrieves table metadata from HCatalog using PXF.
  2. HAWQ creates in-memory catalog tables from the retrieved metadata. If a table is referenced multiple times in a transaction, HAWQ uses its in-memory metadata to reduce external calls to HCatalog.
  3. PXF queries Hive using table metadata that is stored in the HAWQ in-memory catalog tables. Table metadata is dropped at the end of the transaction.
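The three steps above boil down to a transaction-scoped cache. Here is a toy Python model of that behaviour, just to make the idea concrete. It is not HAWQ code: the class `TransactionMetadataCache` and the stand-in lookup function `fake_hcatalog_lookup` are invented for this sketch.

```python
from typing import Callable, Dict


class TransactionMetadataCache:
    """Toy model: fetch table metadata once per transaction, then reuse it."""

    def __init__(self, fetch: Callable[[str], dict]) -> None:
        self._fetch = fetch              # stands in for the PXF -> HCatalog call
        self._tables: Dict[str, dict] = {}
        self.external_calls = 0          # counts simulated HCatalog round trips

    def get(self, table: str) -> dict:
        # Only the first reference in a transaction goes to HCatalog.
        if table not in self._tables:
            self.external_calls += 1
            self._tables[table] = self._fetch(table)
        return self._tables[table]

    def end_transaction(self) -> None:
        # Metadata is dropped when the transaction ends.
        self._tables.clear()


def fake_hcatalog_lookup(table: str) -> dict:
    # Hypothetical metadata; real HCatalog returns far more detail.
    return {"table": table, "columns": ["id", "amount"]}


cache = TransactionMetadataCache(fake_hcatalog_lookup)
cache.get("default.sales_info")
cache.get("default.sales_info")      # served from memory, no external call
print(cache.external_calls)          # prints: 1
cache.end_transaction()
cache.get("default.sales_info")      # new transaction, fetched again
print(cache.external_calls)          # prints: 2
```

Because the cache is emptied at the end of each transaction, a schema change in Hive is picked up by the next transaction, which is the advantage over static external table definitions.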

Demo

Tools used

Hive, HAWQ, Zeppelin

 

HBase tables 

Follow this to create HBase tables.

Create a table in HAWQ to access the HBase table

Note: Use port 51200 (the PXF service port), not 50070 (the NameNode web UI port).
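For reference, the external table DDL generally has the shape sketched below. This Python helper only assembles the statement text, following my understanding of the PXF HBase profile: a `recordkey` column for the row key, HBase columns quoted as `"family:qualifier"`, and a `pxf://host:51200/...` location. The function name, host name, table names, and columns are placeholders I chose for illustration, not the ones from the demo.

```python
from typing import Sequence, Tuple


def hbase_external_table_ddl(
    hawq_table: str,
    pxf_host: str,
    hbase_table: str,
    columns: Sequence[Tuple[str, str]],
    pxf_port: int = 51200,  # PXF service port, not the NameNode UI port 50070
) -> str:
    """Assemble a CREATE EXTERNAL TABLE statement for an HBase table via PXF."""
    # 'recordkey' exposes the HBase row key; other columns are "family:qualifier".
    col_defs = ["recordkey bytea"] + [
        f'"{name}" {sqltype}' for name, sqltype in columns
    ]
    cols = ",\n    ".join(col_defs)
    return (
        f"CREATE EXTERNAL TABLE {hawq_table} (\n    {cols}\n)\n"
        f"LOCATION ('pxf://{pxf_host}:{pxf_port}/{hbase_table}?PROFILE=HBase')\n"
        "FORMAT 'CUSTOM' (formatter='pxfwritable_import');"
    )


# Placeholder host, table, and column names for illustration only.
print(
    hbase_external_table_ddl(
        "hbase_sales",
        "namenode",
        "sales",
        [("cf1:saleid", "int"), ("cf1:comments", "varchar")],
    )
)
```

Paste the printed statement into a HAWQ session, after swapping in your own host, table, and column families, to create the external table.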

Links

Gist

PXF docs

Must see this

Zeppelin interpreter settings
