Hadoop Cluster Verification (HCV)

HCV is a set of shell scripts that run a smoke test against any Hadoop component to verify that it was installed correctly. It gives feedback on whether a component installation or upgrade succeeded, and it can be utilized during a new cluster deployment, a cluster upgrade, or when applying a major or minor patch to an existing Hadoop cluster. Running it reduces risk before the cluster is handed over to clients or users and surfaces problems early. The goal is to verify that the system meets all requirements before it enters the production, operation, or development stage.
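
To illustrate what a shell-based smoke test for a single component might look like, here is a minimal sketch for HDFS. It is not taken from the HCV bundle: the names run_check, hdfs_smoke_test and SUMMARY_FILE, the PASS/FAIL line format and the /tmp test path are all assumptions made for this example. Only the standard "hdfs dfs" subcommands are real.

```shell
#!/usr/bin/env bash
# Sketch only: run_check, hdfs_smoke_test and SUMMARY_FILE are hypothetical
# names chosen for this example, not part of the HCV scripts.

SUMMARY_FILE="${SUMMARY_FILE:-output.txt}"

# Run a command and append "PASS: <name>" or "FAIL: <name>" to the summary file.
# The command's stdout is discarded; its stderr is left untouched so the
# caller can redirect it (for example with 2> out.txt).
run_check() {
    local name="$1"
    shift
    if "$@" >/dev/null; then
        echo "PASS: $name" >> "$SUMMARY_FILE"
    else
        echo "FAIL: $name" >> "$SUMMARY_FILE"
        return 1
    fi
}

# HDFS round trip: create a directory, upload a file, read it back, clean up.
hdfs_smoke_test() {
    local dir="/tmp/hcv_smoke_$$"   # assumed scratch location in HDFS
    local src
    src="$(mktemp)"
    echo "hcv smoke test" > "$src"
    run_check "hdfs mkdir" hdfs dfs -mkdir -p "$dir"
    run_check "hdfs put"   hdfs dfs -put -f "$src" "$dir/sample.txt"
    run_check "hdfs cat"   hdfs dfs -cat "$dir/sample.txt"
    run_check "hdfs rm"    hdfs dfs -rm -r -skipTrash "$dir"
    rm -f "$src"
}
```

On a node with the HDFS client installed, calling hdfs_smoke_test after sourcing this file would append one line per check to the summary file. Other components could be covered the same way by wrapping their own client commands in run_check.
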
How does it work?

1. Copy the bundled scripts to one of the Hadoop cluster nodes and unzip them.

2. Edit config.txt to set the parameters for the components you want to test. By default every value is null, which means no component is tested.

3. Make all the shell scripts executable.

4. Run the verification script.

5. Check the summary in output.txt and the detailed output in out.txt.
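
The post does not show the layout of config.txt, so the following is only a sketch of how a script could decide which components to test. It assumes a hypothetical NAME=value format in which an empty value or the word "null" disables a component. The function name enabled_components is also made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: the NAME=value layout and the function name are assumptions,
# not the actual config.txt format used by HCV.

# Print the names of the enabled components, one per line.
enabled_components() {
    local file="$1" name value
    # "|| [ -n "$name" ]" also handles a last line without a trailing newline.
    while IFS='=' read -r name value || [ -n "$name" ]; do
        # Skip blank lines and comment lines.
        case "$name" in ''|'#'*) continue ;; esac
        # An empty value or "null" (any letter case) means "do not test".
        case "$value" in ''|[Nn][Uu][Ll][Ll]) continue ;; esac
        echo "$name"
    done < "$file"
}
```

With every value left empty or set to null, the function prints nothing, which matches the default of testing no components.
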

How do you use it?

1. The Hadoop verification scripts can run directly on any node where Hadoop components are installed.

2. Check that config.txt lists the components to be tested.

3. Once the permissions are set (e.g. “chmod 777 *.sh”; “chmod +x *.sh” is enough to make the scripts executable and is less permissive), run “./ 2> out.txt”.

4. Check the summary of the verification results in output.txt.

5. If any errors occur, check the detailed output in out.txt.
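
The last two steps can be partly automated. The sketch below assumes that each line of the summary file contains either the word PASS or the word FAIL. That format is a guess, since the post does not show a sample of output.txt, and the function name report_summary is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the PASS/FAIL markers and the function name are assumptions
# about the summary format, not taken from HCV.

# Print pass/fail counts for a summary file; return non-zero if any check failed.
report_summary() {
    local summary="$1" passed failed
    # grep -c prints 0 but exits with status 1 when nothing matches,
    # so "|| true" keeps the function usable under "set -e".
    passed=$(grep -c 'PASS' "$summary" || true)
    failed=$(grep -c 'FAIL' "$summary" || true)
    echo "passed=$passed failed=$failed"
    [ "$failed" -eq 0 ]
}
```

A wrapper could call report_summary output.txt and, when it returns non-zero, point the operator to out.txt for the detailed error output.
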

