Interactive Queries: A New Feature For Stream Processing In Apache Kafka

December 10, 2021

Apache Kafka is one of the most popular event streaming platforms of recent times, and it continues to receive upgrades and enhancements that improve its performance. This article discusses one such feature: Interactive Queries, announced by Confluent for stream processing with Apache Kafka. It lets you treat the stream processing layer as a lightweight embedded database and directly query the state held by the stream processing engine. Apache Kafka manages that state and provides fault tolerance for it. The feature brings processing and storage together in a single, easy-to-use application.

The vision is to move stream processing out of the big data niche and make it available as a mainstream application development model. This blog focuses on the motivation behind Interactive Queries, using a worked example.

Example: Real-time risk management

Consider a financial institution, for example a wealth management firm, that maintains positions in assets held by the firm and its investors. The firm continuously collects business events and data that could influence the risk associated with these positions. Whenever the data changes, the risk positions are recalculated to keep a real-time view.

Real-time risk management is an example of a stateful application. State is required to keep track of the latest positions, and it is also required inside the stream processing layer to keep track of statistics. All of this state needs to be updated and queried continuously.
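
To make "stateful" concrete, here is a minimal sketch in plain Java, not the Kafka Streams API. The class `PositionTracker` and its methods are hypothetical names chosen for illustration; the sketch assumes each business event is a trade that changes the quantity held in one asset.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a stateful processor; not part of Kafka Streams.
class PositionTracker {
    // State: the latest net quantity held per asset.
    private final Map<String, Long> positions = new HashMap<>();

    // Apply one business event (a trade) and return the updated position.
    long onTrade(String asset, long quantityDelta) {
        return positions.merge(asset, quantityDelta, Long::sum);
    }

    // Query the current state; assets that were never traded have position 0.
    long positionOf(String asset) {
        return positions.getOrDefault(asset, 0L);
    }
}
```

Every incoming event updates the map, and every dashboard request reads from it, which is the "updated and queried continuously" pattern described above.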

How is this done?

For the risk management dashboard, business events are captured as real-time data in Apache Kafka. A conventional setup built around it has a lot of moving parts and inefficiencies:

  • An extra Hadoop cluster is needed to reprocess data.
  • Storage is maintained at the stream processing layer.
  • Separate storage and databases are maintained for the output of the streaming and Hadoop jobs.
  • The stream processing layer writes a record internally to maintain its computational state, and this state is later duplicated in the external storage.
  • Locality is destroyed, because data that needs to be local is unnecessarily shipped to a storage cluster.

The case for Interactive Queries

We can simplify the setup above by removing the Hadoop layer and doing all of the processing in the streaming layer. In other words, we move from the lambda architecture to the Kappa architecture. Interactive Queries take this one step further by directly exposing the embedded state to applications. The embedded databases act as materialized views of logs that are stored in Apache Kafka.

Materialized views provide better application isolation and better performance.
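
The idea of a state store as a materialized view of a log can be sketched in a few lines of plain Java. This is an illustrative model, not Kafka code: `ChangelogView` is a hypothetical class, the log is an in-memory list rather than a Kafka topic, and a `null` value is assumed to mark a deletion.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of a log and the view derived from it.
class ChangelogView {
    // Append-only log of {key, value} updates; a null value marks a deletion.
    private final List<String[]> log = new ArrayList<>();

    void append(String key, String value) {
        log.add(new String[] {key, value});
    }

    // Replay the log from the start to rebuild the latest value per key.
    Map<String, String> materialize() {
        Map<String, String> view = new HashMap<>();
        for (String[] record : log) {
            if (record[1] == null) {
                view.remove(record[0]);
            } else {
                view.put(record[0], record[1]);
            }
        }
        return view;
    }
}
```

Because the view can always be rebuilt by replaying the log, losing the local copy does not lose data, which is the intuition behind the fault tolerance mentioned earlier.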

Selecting the right database

Points to consider when selecting the right database and storage:

Pros of Interactive Queries with embedded databases:

  • There are very few moving parts, and you don’t have to deploy, maintain and operate an external database.
  • It allows faster and more efficient use of application state.
  • It provides better isolation.
  • It allows more flexibility.

Cons:

  • You may have to move away from a database that you know and trust.
  • You may want to scale storage independently of processing.
  • You may need customized queries that are specific to a particular database.

Whatever you choose, just remember that you get more flexibility with Apache Kafka. 

Information queried interactively

Interactive Queries lets developers query the embedded state stores of a streaming application. The stores are exposed as read-only: no modifications are allowed, which avoids state inconsistencies. Read-only access is sufficient for most applications that consume data from a queryable streaming application.

  • Interactive Queries enable faster and more efficient use of application state.
  • There is no duplication of data into an external store.
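
The read-only contract can be illustrated with a small plain-Java sketch. `ReadOnlyView` and `StateStore` are hypothetical names for this example; Kafka Streams has its own read-only store interfaces (for example `ReadOnlyKeyValueStore`), which this sketch does not reproduce.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical read-only interface: callers can look up values but not change them.
interface ReadOnlyView<K, V> {
    V get(K key);
}

// Hypothetical store that is written by the processor and read by query clients.
class StateStore<K, V> {
    private final Map<K, V> data = new HashMap<>();

    // Only the stream processor is expected to call put().
    void put(K key, V value) {
        data.put(key, value);
    }

    // Query clients receive a live view that offers lookups and nothing else.
    ReadOnlyView<K, V> readOnlyView() {
        return data::get;
    }
}
```

The view is live: it reflects later writes by the processor, but a client holding only a `ReadOnlyView` has no method through which it could modify the state.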

How to make Apache Kafka Streams applications queryable?

Kafka Streams handles the low-level querying of state stores and offers fault tolerance, so you can make an application queryable with very little extra work.

Querying local stores

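  • Start with a single instance of the app, which holds the whole state locally.
  • When more instances are added, Kafka Streams partitions the data among them, so each local store holds only part of the state.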
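
The effect of partitioning on local queries can be modelled in plain Java. This is a sketch under stated assumptions: `PartitionedState` is a hypothetical class, each instance is represented by one in-memory map, and `String.hashCode()` stands in for whatever partitioning Kafka actually applies to the serialized key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: one local store (a map) per application instance.
class PartitionedState {
    private final List<Map<String, Long>> localStores = new ArrayList<>();

    PartitionedState(int instanceCount) {
        for (int i = 0; i < instanceCount; i++) {
            localStores.add(new HashMap<>());
        }
    }

    // Illustrative assignment: hash the key to pick the owning instance.
    int ownerOf(String key) {
        return Math.floorMod(key.hashCode(), localStores.size());
    }

    // Writes land only in the store of the owning instance.
    void put(String key, long value) {
        localStores.get(ownerOf(key)).put(key, value);
    }

    // A local query sees only what this one instance holds; null if absent.
    Long queryLocal(int instance, String key) {
        return localStores.get(instance).get(key);
    }
}
```

With one instance every key is local. With several, a query sent to the wrong instance returns nothing, which is why instances need a way to find out who holds a given key.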

Discovering any instance’s stores

  • Each instance needs to be made aware of the others through periodic metadata exchange.
  • With Kafka Streams, each instance may expose its endpoint information as metadata to the other instances.
  • The Interactive Queries API allows a developer to obtain this metadata.
  • With the metadata in hand, you can discover which instance holds the store for a given key.
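
The discovery step amounts to a lookup from key to endpoint. The sketch below models it in plain Java; `StoreDirectory`, the partition numbering and the `host:port` strings are all hypothetical. In Kafka Streams itself the endpoint is advertised through the `application.server` setting and looked up through metadata methods on `KafkaStreams`, and the actual remote call between instances is left to the application.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical directory built from the metadata that instances exchange.
class StoreDirectory {
    // Partition number -> "host:port" of the instance serving that partition.
    private final Map<Integer, String> endpointByPartition = new HashMap<>();
    private final int partitionCount;

    StoreDirectory(int partitionCount) {
        this.partitionCount = partitionCount;
    }

    // Record that the given partition is served at the given endpoint.
    void register(int partition, String endpoint) {
        endpointByPartition.put(partition, endpoint);
    }

    // Return the endpoint whose local store should hold the key, or null if unknown.
    String endpointFor(String key) {
        int partition = Math.floorMod(key.hashCode(), partitionCount);
        return endpointByPartition.get(partition);
    }
}
```

A query handler would compare the returned endpoint with its own: if they match it reads the local store, otherwise it forwards the request to the other instance.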

Conclusion

Whether you are building a core banking application or an advertising data pipeline, there will be a requirement to scale processing and to make it real-time, and you need a data store that can do both. Apache Kafka gives you the power of a declarative API, and Interactive Queries allow you to query data as it is being processed.

If you are looking for Apache Kafka services, Ksolves is the best choice for building your own real-time applications. We are one of the best Apache Kafka service providers across the globe, offering customized Apache Kafka services with minimum latency. Write in the comment section for more details.

 


Shilpa Shrivastava
AUTHOR
