Apache Flink – Batch vs Real-time Processing
In terms of Big Data, there are two types of processing −
- Batch Processing
- Real-time Processing
Processing data that has been collected over a period of time is called Batch Processing. For example, a bank manager processes the past month's data (collected over time) to find out how many cheques were cancelled during that month.
Processing data as soon as it arrives, to produce an instant result, is called Real-time Processing. For example, a bank manager receives a fraud alert immediately after a fraudulent transaction occurs (instant result).
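The two bank examples above can be sketched in plain Python to make the contrast concrete. This is only an illustration and does not use the Flink API; the `Cheque` and `Transaction` records, the function names, and the fixed amount limit used as a fraud rule are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Cheque:  # hypothetical record shape
    cheque_id: str
    issued_at: datetime
    cancelled: bool


@dataclass(frozen=True)
class Transaction:  # hypothetical record shape
    account: str
    amount: float
    at: datetime


def count_cancelled_cheques(cheques: Iterable[Cheque], now: datetime, days: int = 30) -> int:
    """Batch style: scan the stored data set once and return a single result."""
    cutoff = now - timedelta(days=days)
    return sum(1 for c in cheques if c.cancelled and c.issued_at >= cutoff)


def fraud_alerts(transactions: Iterable[Transaction], limit: float = 10_000.0) -> Iterator[str]:
    """Real-time style: inspect each event as it arrives and emit an alert right away."""
    for tx in transactions:
        if tx.amount > limit:  # assumed rule, standing in for a real fraud check
            yield f"ALERT {tx.account}: {tx.amount:.2f} at {tx.at.isoformat()}"


# Sample data for the batch case: one month of stored cheques.
now = datetime(2024, 3, 31)
stored = [
    Cheque("c1", datetime(2024, 3, 10), cancelled=True),
    Cheque("c2", datetime(2024, 3, 20), cancelled=False),
    Cheque("c3", datetime(2024, 1, 5), cancelled=True),  # older than 30 days
]
print(count_cancelled_cheques(stored, now))

# Sample data for the real-time case: alerts are produced one event at a time.
incoming = [Transaction("acc-1", 50.0, now), Transaction("acc-2", 25_000.0, now)]
for alert in fraud_alerts(incoming):
    print(alert)
```

The batch function needs the whole data set before it can answer, whereas the generator produces each alert as soon as the matching transaction is seen, without waiting for the rest of the input.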
The table given below lists the differences between Batch and Real-Time Processing −
Batch Processing | Real-Time Processing |
---|---|
Static files | Event streams |
Processed periodically (e.g. every minute) | Processed immediately (in nanoseconds) |
Past data on disk storage | In-memory storage |
Example − Bill Generation | Example − ATM Transaction Alert |
Real-time processing is now widely used across organizations. Use cases such as fraud detection, real-time alerts in healthcare, and network attack alerts require incoming data to be processed as it arrives; a delay of even a few milliseconds can have a huge impact.
An ideal tool for such real-time use cases is one that can take its input as a stream rather than as a batch. Apache Flink is such a real-time processing tool.
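To illustrate what taking input as a stream means in practice, here is a minimal plain-Python sketch of the network attack use case. It is not Flink code. The event shape (timestamp, source IP), the function name `attack_alerts`, the 60-second window, and the threshold are all assumptions made for the example, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, source IP); hypothetical shape


def attack_alerts(
    events: Iterable[Event], window_s: float = 60.0, threshold: int = 3
) -> Iterator[Event]:
    """Consume failed-login events one at a time and alert when a single
    source has more than `threshold` events in the last `window_s` seconds."""
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    for ts, source in events:
        times = recent[source]
        times.append(ts)
        # Drop timestamps that have fallen out of the window.
        while times and times[0] <= ts - window_s:
            times.popleft()
        if len(times) > threshold:
            yield ts, source


def event_source() -> Iterator[Event]:
    """Stand-in for an unbounded stream; a real source would never end."""
    yield 0.0, "10.0.0.1"
    yield 5.0, "10.0.0.2"
    yield 10.0, "10.0.0.1"
    yield 20.0, "10.0.0.1"
    yield 30.0, "10.0.0.1"   # fourth event from 10.0.0.1 within 60 s
    yield 200.0, "10.0.0.1"  # earlier events have expired by now


alerts = list(attack_alerts(event_source()))
print(alerts)
```

Only the timestamps inside the current window are kept in memory, so the function can keep running over a stream that never ends, which a batch job reading a complete file cannot do.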