The previous blog, DiP (Storm Streaming), showed how we can leverage the power of Apache Storm and Kafka to do real-time data ingestion and visualization. This blog extends that work and focuses on using Flink Streaming for real-time data ingestion.
DiP also supports three other data streaming engines: Apache Storm, Apache Spark, and Apache Apex.
This work is based on the Xavient co-dev initiative, where your engineers can work with our team to contribute and build your own platform for ingesting any kind of data in real time.
All you need is a running Hadoop cluster with Kafka, Flink, Hive, HBase, and Zeppelin. You can deploy the application on top of your existing cluster and ingest any kind of data.
You can download the code base from GitHub.
Flink Streaming Features
- One runtime for streaming and batch processing
- Java and Scala client bindings
- Declarative API
- Very High Throughput
- Own memory management inside the JVM
- Growing community support
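To give a feel for the declarative API and the single runtime, here is a minimal Flink DataStream sketch in Java (the element values are purely illustrative, not DiP code):

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FlinkApiDemo {
    public static void main(String[] args) throws Exception {
        // One entry point serves both streaming and batch-style programs
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Declarative API: describe the transformations, Flink builds the dataflow
        DataStream<String> lines = env.fromElements("csv record", "json record", "xml record");
        lines.filter(record -> record.contains("json"))
             .map(record -> record.toUpperCase())
             .print();

        env.execute("DiP API demo");
    }
}
```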
Technology Stack
- Source System – Web Client
- Messaging System – Apache Kafka
- Target System – HDFS, Apache HBase, Apache Hive
- Reporting System – Apache Phoenix, Apache Zeppelin
- Streaming API – Apache Flink
- Programming Language – Java
- IDE – Eclipse
- Build tool – Apache Maven
- Operating System – CentOS 7
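Since the build tool is Maven, the job's pom.xml would typically pull in the Flink streaming core and the Kafka connector. The versions and Scala suffix below are assumptions; match them to your cluster:

```xml
<dependencies>
  <!-- Flink streaming core; version must match the cluster -->
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>1.3.2</version>
  </dependency>
  <!-- Kafka source connector; pick the artifact matching your Kafka version -->
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.9_2.11</artifactId>
    <version>1.3.2</version>
  </dependency>
</dependencies>
```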
High-Level Process Workflow with Flink Streaming
- Input to the application can be fed from a user interface that lets you either enter data manually or upload it in XML, JSON, or CSV format for bulk processing
- Ingested data is published to the Kafka broker, which streams it to the Kafka consumer process
- Once the message type is identified, the content of the message is extracted from the Kafka source and sent to different sinks for persistence (a minimal sketch of this pipeline follows this list)
- A Hive external table provides data storage on HDFS, and Phoenix provides a SQL interface over the HBase tables
- Reporting and visualization of the data is done through Zeppelin
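The sketch below shows the shape of such a Flink job in Java: a Kafka source feeding a simple message-type check. The broker address, topic name, group id, and the detectType() helper are placeholders for illustration, and print() stands in for the real HBase/HDFS/Hive sinks used by DiP:

```java
import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class DipFlinkJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Kafka consumer properties -- broker and group id are placeholders
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka-broker:9092");
        props.setProperty("group.id", "dip-flink");

        // Stream raw XML/JSON/CSV payloads out of the Kafka topic
        DataStream<String> messages = env.addSource(
                new FlinkKafkaConsumer09<>("dip-topic", new SimpleStringSchema(), props));

        // Tag each message with its detected type; the real DiP job would
        // route the parsed content to HBase, HDFS, and Hive sinks instead of print()
        messages.map(msg -> detectType(msg) + "|" + msg)
                .print();

        env.execute("DiP Flink streaming job");
    }

    // Crude type detection, for illustration only
    private static String detectType(String msg) {
        if (msg.startsWith("<")) return "xml";
        if (msg.startsWith("{")) return "json";
        return "csv";
    }
}
```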
DiP Front End
Flink Execution Flow
The job submitted to Flink will look like this:
Open the UI for the application by visiting the URL “http://<host>:<port>/DataIngestGUI/UI.jsp”; it will look like this:
DiP Data Visualization
Using Apache Zeppelin, the data ingested into HBase can be viewed as reports/graphs by simply using the Phoenix interpreter, which provides a SQL-like interface to the HBase tables. These graphs can be embedded in other applications using iframes.
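For example, a Zeppelin paragraph could aggregate the ingested records with a Phoenix query like the one below. The table and column names are hypothetical, and the interpreter binding may be %phoenix or %jdbc(phoenix) depending on your Zeppelin version:

```sql
%jdbc(phoenix)
SELECT message_type, COUNT(*) AS messages
FROM DIP_MESSAGES
GROUP BY message_type
```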