Exactly-Once Processing in Apache Flink – Understand How It Works

Apache Flink

5 MIN READ

July 2, 2026

Loading

exactly-once processing in apache flink
Exactly-once processing is the gold standard for real-time data pipelines but most engineers misunderstand what it actually guarantees. This post unpacks Apache Flink's internal architecture for achieving exactly-once semantic state processing, covering the distributed Chandy-Lamport checkpointing protocol, checkpoint barrier alignment, and the Two-Phase Commit (2PC) sink interface. Whether you run financial ledgers, inventory systems, or automated billing engines, this guide gives you the engineering blueprint, production code, and configuration checklist to build pipelines that are mathematically correct across failures.

When building real-time data pipelines, data correctness is non-negotiable. A node crash, a network partition, or a transient upstream cloud blip should never result in a corrupted ledger, a duplicated billing charge, or an inventory count that is off by one. Yet in distributed streaming systems, these failure scenarios are not edge cases. They are routine operational realities that every production pipeline must be designed to survive. This is the problem that Exactly-Once Processing is built to solve.

The phrase gets used loosely across the data engineering industry. Vendors market it. Engineers assume it is a single toggle that, once enabled, makes a pipeline correct. In practice, the gap between claiming exactly-once support and actually delivering it is architectural, not cosmetic. Apache Flink closes that gap through two interlocking mechanisms: a distributed snapshot protocol based on the Chandy-Lamport algorithm, and a Two-Phase Commit sink interface that extends correctness guarantees all the way to your external systems.

What Is Exactly-Once Processing?

Dispelling the Myth: “Exactly-Once Delivery”

Before examining Flink’s internals, there is a misconception worth addressing head-on. Exactly-once processing does not mean an event is transmitted across the network exactly once.

This distinction matters more than it might appear. Over a distributed network, guaranteeing physical single-delivery during a failure is mathematically impossible. When a worker node crashes after receiving a message but before sending an acknowledgement, the upstream broker has no way to confirm delivery. Its only safe course of action is to retransmit. Duplicates at the transport layer are not a bug. They are an unavoidable property of how reliable distributed messaging works.

What Flink guarantees is something fundamentally different: exactly-once semantic processing of state. When a duplicate event arrives due to a retry or a replay, Flink’s internal tracking ensures the operator’s state store is updated exactly once. The duplicate is absorbed cleanly, without corrupting your counters, window buckets, or transactional database tables. Your business logic sees one logical event, regardless of how many times the network delivered it.

This distinction is the foundation of everything else in Flink’s exactly-once architecture.

Get Exactly-Once Right In Production

The Engineering Trade-offs: Latency, Cost, and Consistency

Enforcing strict correctness across distributed state introduces real infrastructure costs. Understanding these trade-offs upfront prevents misconfiguration in production. If you have not yet committed to Flink, our comparison of Kafka Streams vs Spark Streaming is a useful starting point.

At-Least-Once: Optimise for Speed

If your pipeline tolerates minor inaccuracies, the at-least-once profile eliminates the overhead of global state synchronisation. Operators process events immediately without waiting for barrier alignment. Latency drops to sub-millisecond ranges.

The catch: during a worker failure, some events will be reprocessed and the state will reflect duplicates. For non-critical logging, search indexing, or real-time dashboards where approximation is acceptable, this is a reasonable trade.

Exactly-Once: Optimise for Correctness

For mission-critical pipelines, financial reconciliation, inventory control, and billing engines, exactly once is non-negotiable. Flink guarantees 100% correct state across infrastructure restarts.

The cost is an “infrastructure tax”: a minor latency increase tied to checkpoint barrier coordination, plus a requirement that your external sinks support either idempotent writes or transactional two-phase commit protocols.

Consistency Metrics Matrix

Engineering Metric At-Least-Once Configuration Exactly-Once Configuration (End-to-End)
Data Correctness Dupes are possible during failures. 100% mathematically correct state.
End-to-End Latency Ultra-low (sub-millisecond). Higher. Linked to your checkpoint interval and transactional commit speeds.
Sink Storage Dependency Any database/file system. Requires idempotent targets or transactional systems (e.g., Kafka / PostgreSQL).
Operational Overheads Low. Minimal tuning required. Medium to high. Requires active state size management and transaction timeout configurations.

Choosing the Right Profile for Your SLA

The right choice depends on your SLA and the downstream consequences of a duplicate event. Many production architectures use at-least-once for edge ingestion and cleansing zones, then enforce exactly-once at the core aggregation and reporting layer. This tiered approach captures most of the throughput benefit of at-least-once while ensuring correctness where it actually matters.

The Core Engine: Distributed Chandy-Lamport Checkpointing

Flink’s exactly-once guarantee at the state level is built on a variant of the Chandy-Lamport distributed snapshot algorithm. The key design principle: Flink never stops the pipeline to take a backup. Instead, it injects lightweight metadata markers called Checkpoint Barriers directly into the event stream.

root--flinkbarrierdfd

Figure : Flink Checkpoint Barrier Alignment. Operators receive data from multiple upstream channels. When a checkpoint barrier arrives on one channel, the operator pauses consumption on that channel and waits for the matching barrier from other channels (Alignment). This ensures the snapshot captures state exactly at the barrier boundary before flushing data off-heap.

Talk To Our Flink Engineers

Inside Checkpoint Barrier Alignment: A Step-by-Step Breakdown

  1. Barrier Injection. The JobManager’s Checkpoint Coordinator injects a lightweight barrier marker into every parallel input stream at scheduled intervals. The barrier travels downstream alongside live data records, invisible to business logic but visible to every operator in the execution graph.
  2. Asynchronous Alignment. When a downstream operator receives a barrier on one of its input channels, it pauses consumption on that channel and waits for the matching barrier to arrive from all other parallel input channels. This alignment step is the correctness guarantee: it creates a clean boundary that prevents pre-barrier events from mixing with post-barrier state updates, which would otherwise corrupt the snapshot.
  3. Off-Heap Snapshotting. Once all barriers align, the operator serialises its current state into a local copy and ships it asynchronously to durable remote storage such as Amazon S3 or HDFS. The snapshot captures state at the exact barrier boundary, nothing before it and nothing after.
  4. Resuming Processing. The operator resumes live event processing immediately after triggering the snapshot upload. It does not block on the remote write. Persistence happens in the background, keeping pipeline latency low while durability is guaranteed in parallel.

Achieving End-to-End Consistency: Two-Phase Commit (2PC)

Guaranteeing exactly-once state updates inside Flink solves only half the problem. If a failure occurs after Flink snapshots its internal state but before it finishes writing to an external sink, the destination database still ends up with duplicated rows. Internal correctness alone cannot prevent this.

To close that gap, external output sinks must participate directly in Flink’s checkpoint lifecycle through the TwoPhaseCommitSinkFunction. Data is held in an uncommitted transaction buffer and only atomically committed once Flink confirms a successful global checkpoint. If anything fails before that confirmation, the transaction rolls back cleanly. Without this integration, data leakage at the sink boundary is unavoidable.

root--flink2phasedfd

Production Blueprint: End-to-End Exactly-Once with Kafka

The following code configures a complete end-to-end exactly-once pipeline using Kafka as both source and sink.

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.connector.kafka.sink.DeliverGuarantee;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.CheckpointingMode;

public class EndToEndExactlyOncePipeline {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // 1. Enforce strict internal Exactly-Once Checkpointing
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
        env.getCheckpointConfig().setCheckpointInterval(60000); // 60 seconds

        // 2. Build Exactly-Once Consumer Source
        KafkaSource<String> source = KafkaSource.<String>builder()
            .setBootstrapServers("localhost:9092")
            .setTopics("input-events")
            .setGroupId("flink-consistency-group")
            .setStartingOffsets(OffsetsInitializer.committedOffsets())
            .setValueOnlyDeserializer(new SimpleStringSchema())
            .build();

        // 3. Build Transactional Two-Phase Commit Sink
        KafkaSink<String> transactionalSink = KafkaSink.<String>builder()
            .setBootstrapServers("localhost:9092")
            .setRecordSerializer(
                KafkaRecordSerializationSchema.builder()
                    .setTopic("output-analytics")
                    .setValueSerializationSchema(new SimpleStringSchema())
                    .build()
            )
            .setDeliverGuarantee(DeliverGuarantee.EXACTLY_ONCE) // Activates Flink's 2PC Sink Engine
            .setTransactionalIdPrefix("flink-tx-sensor-metrics-")
            .build();

        // 4. Bind Pipeline Execution Graph
        env.fromSource(source, org.apache.flink.api.common.eventtime.WatermarkStrategy.noWatermarks(), "KafkaSource")
            .map(value -> value.toUpperCase()) // Stateless operational transformation
            .sinkTo(transactionalSink);

        env.execute("EndToEndExactlyOncePipeline");
    }
}

Three configuration decisions in this blueprint are doing the heavy lifting:

  • CheckpointingMode.EXACTLY_ONCE activates barrier-based distributed snapshotting inside the Flink runtime.
  • DeliverGuarantee.EXACTLY_ONCE on the Kafka sink activates the TwoPhaseCommitSinkFunction, wrapping all writes in Kafka transactions.
  • setTransactionalIdPrefix gives each Flink task a stable transaction identity, allowing Kafka to identify and fence zombie transactions after a failure.
Ship Correct Real-Time Pipelines

Best Practices and Pitfalls for Exactly-Once Configuration

Getting exactly-once processing right requires more than enabling the right flags. Missing even one of these configuration dependencies can cause silent data duplication or unexpected pipeline downtime.

What Should You Consider?

  • Align transaction timeouts with checkpoint intervals. Your external broker’s transaction timeout must always exceed your Flink checkpoint interval. For Kafka, if transaction.max.timeout.ms defaults to 15 minutes; your checkpoint interval must stay comfortably within that window. Breaching it produces orphaned transactions that Kafka expires before Flink gets the chance to commit them. Teams running Kafka in production often lean on dedicated Kafka support to keep these broker and transaction settings correctly tuned.
  • Set consumer isolation to read committed. Any downstream application consuming from a Flink Two-Phase Commit sink must configure isolation.level=read_committed in Kafka. Without this setting, consumers read uncommitted records that may still be rolled back on failure, quietly invalidating the guarantees your transactional pipeline was built to provide.
  • Control checkpoint concurrency with a minimum pause. Guard against checkpoint overlap by setting env.getCheckpointConfig().setMinPauseBetweenCheckpoints(30000). This ensures one checkpoint fully completes before the next is triggered, preventing the system from stalling under heavy I/O workloads.

What You Should Avoid?

  • Avoid non-transactional sinks on critical paths. Standard file systems and REST endpoints that lack rollback support cannot participate in the Two-Phase Commit protocol. Any partial write during a failover cannot be undone, silently reverting your pipeline to at-least-once behaviour regardless of how Flink is configured internally.
  • Use unaligned checkpoints with care. Unaligned checkpoints (state.backend.changelog.enabled: true) reduce latency under backpressure but increase the network recovery footprint during a failover. Enable them only when low latency is the priority and the added recovery cost is fully understood.

Conclusion

Architecting reliable real-time pipelines requires choosing the right balance between processing speed and state correctness. For edge cleansing, filtering, and raw ingestion zones where throughput is the priority, at-least-once transformations are a practical and efficient choice. For core aggregation, stateful accounting, and transactional metrics reporting, Flink’s Chandy-Lamport checkpointing alongside transactional 2PC sinks is non-negotiable.

True end-to-end exactly-once processing is not a passive setting. It is an explicit architectural contract between your data source, Flink’s internal state engine, and your downstream output sinks. Getting it right eliminates the need for manual deduplication logic and builds a resilient foundation that holds up under real-world failure conditions.

This is exactly where Apache Flink enterprise support proves its value — the difference between a pipeline that works in a demo and one that survives production failure at scale. At Ksolves, we help engineering teams design and implement production-grade pipelines with exactly-once guarantees built in from the ground up. If you’re building a real-time data platform that cannot afford correctness gaps, our team is ready to help you get it right.

AUTHOR

author image
Anil Kushwaha

Apache Flink

Anil Kushwaha, Technology Head at Ksolves, is an expert in Big Data. With over 11 years at Ksolves, he has been pivotal in driving innovative, high-volume data solutions with technologies like Nifi, Cassandra, Spark, Hadoop, etc. Passionate about advancing tech, he ensures smooth data warehousing for client success through tailored, cutting-edge strategies.

Leave a Comment

Your email address will not be published. Required fields are marked *

(Text Character Limit 350)

Frequently Asked Questions

What is the difference between exactly-once and at-least-once processing in Apache Flink?

Exactly-once processing in Apache Flink guarantees that each event updates the operator state a single time, even when failures cause the event to be redelivered. At-least-once processing is faster but allows duplicate state updates during a worker failure. The practical difference is correctness versus latency: exactly-once suits financial, billing, and inventory workloads, while at-least-once suits logging, search indexing, and dashboards where minor duplication is acceptable.

How does Apache Flink guarantee exactly-once processing?

Apache Flink guarantees exactly-once processing through two mechanisms working together. Internally, a Chandy-Lamport distributed snapshot protocol injects checkpoint barriers into the stream and aligns them so state is captured at a clean boundary. Externally, a TwoPhaseCommitSinkFunction holds writes in an uncommitted transaction until a global checkpoint succeeds. Ksolves designs Flink pipelines that combine both layers so correctness extends end-to-end.

Does enabling exactly-once processing increase latency in Flink?

Yes, exactly-once processing adds a modest latency overhead compared with at-least-once. The cost comes from checkpoint barrier coordination and transactional commit timing on the sink. Many production architectures reduce the impact by using at-least-once for edge ingestion and cleansing, then enforcing exactly-once only at the core aggregation and reporting layer where correctness matters most.

What is the TwoPhaseCommitSinkFunction in Apache Flink?

The TwoPhaseCommitSinkFunction is Flink’s interface that lets an external sink participate in the checkpoint lifecycle. Data is written into an uncommitted transaction buffer and only committed atomically once Flink confirms a successful global checkpoint. If a failure occurs before that confirmation, the transaction rolls back cleanly, which prevents duplicated rows in the destination system.

Which external systems support end-to-end exactly-once with Flink?

End-to-end exactly-once requires sinks that support either idempotent writes or transactional two-phase commit. Transactional systems such as Apache Kafka and PostgreSQL work well because they can buffer and atomically commit writes. Plain file systems and standard REST endpoints cannot roll back partial writes, so they silently degrade a pipeline back to at-least-once behaviour regardless of Flink’s internal configuration.

Why must the Kafka transaction timeout exceed the Flink checkpoint interval?

The Kafka transaction timeout must exceed the Flink checkpoint interval so that open transactions survive long enough to be committed on the next successful checkpoint. If the checkpoint interval is longer than the broker’s transaction.max.timeout.ms, Kafka expires the transaction before Flink commits it, producing orphaned transactions and broken guarantees. Consumers should also set isolation.level=read_committed to avoid reading uncommitted records.

How can teams implement production-grade exactly-once Flink pipelines?

Teams implement production-grade exactly-once pipelines by enabling checkpointing in EXACTLY_ONCE mode, using transactional sinks, aligning transaction timeouts with checkpoint intervals, and setting consumer isolation to read committed. Ksolves helps engineering teams design and deploy these Apache Flink pipelines with exactly-once guarantees built in, tuning checkpointing, state management, and Kafka transaction settings for real-world failure conditions.

Still have questions about exactly-once Flink pipelines? Contact our team.

Copyright 2026© Ksolves.com | All Rights Reserved
Ksolves USP