
DiP (Storm, Spark, Flink and Apex) Co-Dev Opportunity


The Data Ingestion Platform (DiP) harnesses Apache Apex, Apache Flink, Apache Spark and Apache Storm to provide real-time data ingestion and visualization.

DiP comes with a UI that lets you switch between multiple data streaming engines, combining them under a single platform.

DiP Features

  • Multiple Sources
  • Multiple File Formats
  • Easy to use UI
  • Data Visualization
  • High-level APIs
  • Java and Scala client bindings

DiP Technology Stack

  • Source System – Web Client
  • Messaging System – Apache Kafka
  • Target System – HDFS, Apache HBase, Apache Hive
  • Reporting System – Apache Phoenix, Apache Zeppelin
  • Streaming APIs – Apache Apex, Apache Flink, Apache Spark and Apache Storm
  • Programming Language – Java
  • IDE – Eclipse
  • Build tool – Apache Maven
  • Operating System – CentOS 7

DiP Architecture

The middle layer of the DiP architecture has four blocks, one for each streaming engine: Apex, Flink, Spark Streaming and Storm.

Picture1
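To make the "one block per engine" idea concrete, here is a minimal Java sketch of how a selection could be mapped to one of the four engines. It is an illustration only: the `StreamingEngine` enum, its `fromUiChoice` method and the assumption that the UI submits the engine name as a plain string are all hypothetical and are not taken from the DiP code base.

```java
import java.util.Locale;

// Hypothetical sketch, not DiP's actual code: one constant per engine block
// in the middle layer of the architecture.
enum StreamingEngine {
    APEX("Apache Apex"),
    FLINK("Apache Flink"),
    SPARK("Apache Spark Streaming"),
    STORM("Apache Storm");

    private final String displayName;

    StreamingEngine(String displayName) {
        this.displayName = displayName;
    }

    String displayName() {
        return displayName;
    }

    // Parses the value a UI control might submit (assumed: the engine name,
    // case-insensitive, possibly padded with whitespace).
    static StreamingEngine fromUiChoice(String choice) {
        if (choice == null) {
            throw new IllegalArgumentException("No engine selected");
        }
        try {
            return valueOf(choice.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Unsupported engine: " + choice, e);
        }
    }
}
```

A caller could then branch on the returned constant to hand the incoming data to the matching engine block, for example `StreamingEngine.fromUiChoice("flink")` to pick the Flink block.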

DiP UI

dipui

DiP comes with an easy-to-use UI that offers the following features:

  • Switch easily between the supported streaming engines just by clicking on a radio button.
  • Process data in XML, JSON and TSV formats
  • Enter data manually in a text area for processing
  • Upload files for batch processing
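As a rough illustration of what handling these three formats can involve, the Java sketch below guesses whether a pasted or uploaded payload is XML, JSON or TSV from its first characters. The class `PayloadFormatSniffer`, its `detect` method and the heuristics themselves are assumptions made for this example; how DiP actually identifies the format is not described here.

```java
// Hypothetical helper, not part of DiP: guesses the format of a payload
// using simple first-character and delimiter heuristics.
final class PayloadFormatSniffer {

    enum Format { XML, JSON, TSV, UNKNOWN }

    static Format detect(String payload) {
        if (payload == null) {
            return Format.UNKNOWN;
        }
        String trimmed = payload.trim();
        if (trimmed.isEmpty()) {
            return Format.UNKNOWN;
        }
        char first = trimmed.charAt(0);
        if (first == '<') {
            return Format.XML;   // XML documents start with a tag or declaration
        }
        if (first == '{' || first == '[') {
            return Format.JSON;  // JSON object or array
        }
        if (trimmed.indexOf('\t') >= 0) {
            return Format.TSV;   // at least one tab-separated field
        }
        return Format.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(detect("<row><id>1</id></row>"));
        System.out.println(detect("{\"id\": 1}"));
        System.out.println(detect("1\talice\t42"));
    }
}
```

A heuristic like this only chooses a parser; it does not validate the payload, so a real pipeline would still need to handle parse failures downstream.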

DiP on Apex

Apache Apex is an enterprise-grade, YARN-native big data-in-motion platform that unifies stream and batch processing. It processes big data in motion in a way that is highly scalable, highly performant, fault tolerant, stateful, secure, distributed and easily operable.

Blog link

GitHub link

High-Level Process Workflow

processflow

DiP on Flink

Apache Flink is an open source platform for distributed stream and batch data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams.

Blog link

GitHub link 

High-Level Process Workflow

processs

DiP on Spark Streaming

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams.

Blog link

GitHub link

High-Level Process Workflow

sparsarch

DiP on Storm

Apache Storm is a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!

Blog link

GitHub link

High-Level Process Workflow

stormss

Credits

Xavient Information Systems

 

Technical team

Neeraj Sabharwal

Mohiuddin Khan Inamdar

Gautam Marya

Puneet Singh

Sumit Chauhan

Demo request:

email: nsabharwal@xavient.com

DiP and Storm Demo
