June 30, 2025

Checkpoint Chronicle - June 2025

Welcome to the Checkpoint Chronicle, a monthly roundup of interesting stuff in the data and streaming space. Your editor-in-chief for this edition is Hans-Peter Grahsl. Feel free to send me any choice nuggets that you think we should feature in future editions.

Stream Processing, Streaming SQL, and Streaming Databases

If you’ve always been curious what watermarks are in Apache Flink, here is your chance to learn about them in almost no time by reading Robin Moffatt’s article.
Flink SQL Runner is a curated set of tools and extensions to help run Apache Flink SQL applications on top of the Flink Kubernetes Operator.
Tributary is a DuckDB extension for data engineers and analysts alike. It integrates Apache Kafka with DuckDB so that streaming data can be queried and analyzed in real time using SQL.
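For readers who want a feel for the watermark idea before diving into the article, here is a minimal sketch in plain Python. It is not the Flink API: the `WatermarkTracker` class and `classify` helper are hypothetical names made up for illustration. It assumes the common bounded-out-of-orderness approach, where the watermark trails the highest event timestamp seen so far by a fixed tolerance, and an event that arrives behind the watermark counts as late.

```python
from typing import List, Optional, Tuple


class WatermarkTracker:
    """Toy illustration of a bounded-out-of-orderness watermark (not Flink code)."""

    def __init__(self, max_out_of_orderness_ms: int) -> None:
        self.max_out_of_orderness_ms = max_out_of_orderness_ms
        self.max_event_ts_ms: Optional[int] = None  # highest event time seen so far

    @property
    def watermark(self) -> Optional[int]:
        # The watermark trails the highest seen event time by the tolerance.
        if self.max_event_ts_ms is None:
            return None
        return self.max_event_ts_ms - self.max_out_of_orderness_ms

    def observe(self, event_ts_ms: int) -> bool:
        """Record an event; return True if it arrived behind the watermark."""
        wm = self.watermark
        is_late = wm is not None and event_ts_ms < wm
        if self.max_event_ts_ms is None or event_ts_ms > self.max_event_ts_ms:
            self.max_event_ts_ms = event_ts_ms
        return is_late


def classify(events_ms: List[int], max_out_of_orderness_ms: int) -> List[Tuple[int, bool]]:
    """Pair each event timestamp with a flag saying whether it was late."""
    tracker = WatermarkTracker(max_out_of_orderness_ms)
    return [(ts, tracker.observe(ts)) for ts in events_ms]


if __name__ == "__main__":
    for ts, late in classify([1000, 3000, 2500, 900, 5000, 2900], 1000):
        print(ts, "late" if late else "on time")
```

With a tolerance of 1000 ms, the event at 2500 is still accepted although it arrives after 3000, while 900 and 2900 fall behind the watermark. A larger tolerance admits more out-of-order events at the cost of waiting longer before results are considered complete.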

Event Streaming

Strimzi is not only a well-known and widely deployed project in the Kafka space, it even has its own virtual conference. In case you missed StrimziCon 2025, here is a quick session recap with links to all recordings and slides.
Kroxylicious, the snappy open source proxy for Apache Kafka, recently shipped version 0.13.0, which includes an operator to run the proxy on Kubernetes.
Apache Avro is a popular serialization format for Kafka records. To remove some friction when working with Avro data in CLI tooling, Dale Lane open-sourced kafka-avro-formatters.
Even though Apache Kafka has been around for many years, there is still demand for beginners’ content. Here is one such article recently written by Vu Trinh.
In “Kafka: The End of the Beginning”, Chris Riccomini reflects on the last decade in the event streaming and stream processing spaces. Unsurprisingly, there are a few spicy takes in there which might justify bringing popcorn ;-)

Data Ecosystem

MLflow 3 - released in early June - extends MLflow’s foundation to address the challenging requirements of generative AI workloads, in particular how to measure and ensure quality and stability.
Apache Iceberg enthusiasts have been eagerly awaiting this moment… the ratification of the v3 table spec. Read more in this blog article by Danica Fine and Kevin Liu, who briefly walk you through the main features and share what this community-driven effort means going forward.
Fluss - the streaming storage for real-time analytics - recently released version 0.7. Besides the announcement post, there is a webinar recording that dives deeper into all its new features and improvements.

Data Platforms and Architecture

Kumudini Kakwani et al. published an article explaining Uber’s migration journey from Hive to Apache Spark SQL. Besides some impressive numbers, it contains several helpful insights into how they successfully tackled various difficult challenges along the way.
“Model Once, Represent Everywhere: UDA (Unified Data Architecture)”, written by Alex Hutter et al., details how Netflix automatically transpiles its domain models into consistent schemas to preserve integrity and interoperability across federated data systems.

Databases and Change Data Capture

In “The Art of SQL Query Optimization”, Jan Nidzwetzki introduces Plan Explorer, a tool which provides valuable insights into how Postgres optimizes queries.
Bohan Zhang’s PGConf.dev 2025 talk “OpenAI: Scaling PostgreSQL to the Next Level” discusses vital aspects and interesting techniques used to meet OpenAI’s reliability and scalability needs for critical workloads.
Niko Matsakis and Marc Bowes offer insights into the development of Amazon’s Aurora DSQL and why Rust turned out to be a great fit for the team. Read the details in “Just make it scale: An Aurora DSQL story”.
CockroachDB includes changefeeds as a native database feature. Rohan Joshi and Miles Frankel wrote “Enriched Changefeeds: Debezium Simplicity, CockroachDB Resilience” to explain why they decided to adopt Debezium’s change event stream format.
Snyk created Skemium, an open source tool which helps to detect breaking schema changes of CDC events as early as possible. It compares successive versions of the originating database schema and identifies compatibility issues by applying the schema comparison logic implemented by the schema registry.
Fiore Mario Vitale discusses how Debezium natively integrates with OpenLineage to help answer critical questions about data lineage. The article also touches upon Marquez as an example of how to process and work with lineage data in the context of CDC.

Paper of the Month

The research paper by Alexander Behm et al. describes Photon, a vectorized query engine built for Lakehouse systems. Photon is implemented in C++ and tightly integrates with Apache Spark APIs to support both SQL and DataFrame-based workloads. It tackles two core challenges: performance over raw, uncurated datasets and semantic compatibility. Photon delivers average query speedups of ~3x (up to 10x) compared to legacy Spark runtimes, and enabled a 100 TB TPC‑DS world record on a Delta‑Lake/S3 Lakehouse.
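To make the term "vectorized" a little more tangible, here is a small plain-Python sketch contrasting row-at-a-time evaluation with batch-at-a-time evaluation over columns. It is an illustration of the general idea only and makes no claim about Photon's actual implementation; the function names, the `price`/`qty` columns, and the use of a selection vector (a list of surviving row positions) are assumptions chosen for the example. Pure Python will not show the speedup. The point is the shape of the computation: each operator runs a tight loop over a whole column batch instead of interpreting one row at a time.

```python
from typing import Dict, List


def revenue_row_at_a_time(rows: List[Dict[str, int]], min_qty: int) -> int:
    """Evaluate filter, projection, and aggregation one row at a time."""
    total = 0
    for row in rows:
        if row["qty"] > min_qty:
            total += row["price"] * row["qty"]
    return total


def revenue_vectorized(batch: Dict[str, List[int]], min_qty: int) -> int:
    """Evaluate the same query operator by operator over a column batch."""
    price, qty = batch["price"], batch["qty"]

    # Filter operator: produce a selection vector of surviving row positions.
    selection = [i for i, q in enumerate(qty) if q > min_qty]

    # Projection operator: compute price * qty only for the selected positions.
    revenue = [price[i] * qty[i] for i in selection]

    # Aggregation operator: reduce the projected column to a single value.
    return sum(revenue)


if __name__ == "__main__":
    batch = {"price": [10, 20, 30, 40], "qty": [1, 5, 2, 7]}
    rows = [{"price": p, "qty": q} for p, q in zip(batch["price"], batch["qty"])]
    print(revenue_row_at_a_time(rows, 2), revenue_vectorized(batch, 2))
```

Both functions return the same result for the same data. In a native engine, the per-column loops are where compilers and CPUs can apply SIMD instructions and keep data in cache, which is the kind of benefit vectorized designs aim for.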

Events & Call for Papers (CfP)

BEAM Summit 2025 (New York City, USA) July 8-9
JavaZone 2025 (Lillestrøm, Norway) September 4-5
BigDataLdn 2025 (London, United Kingdom) September 24-25
Devoxx (Antwerp, Belgium) October 6-10, CfP open
Flink Forward 2025 (Barcelona, Spain) October 13-16
Current 2025 (New Orleans, LA, USA) October 29-30

New Releases

Apache Flink Kubernetes Operator 1.12.0
Debezium 3.1.3.Final and 3.2.0.CR1
Apache Iceberg 1.9.1
Apache Pulsar 4.0.5, 3.3.7, and 3.0.12
Fluss 0.7
Strimzi 0.46.1
Kroxylicious 0.13.0

—

That’s all for this month! We hope you’ve enjoyed the newsletter and would love to hear about any feedback or suggestions you’ve got.

Hans-Peter (LinkedIn / Bluesky / X / Mastodon / Email)


Hans-Peter Grahsl

Hans-Peter Grahsl is a Staff Developer Advocate at Decodable. He is an open-source community enthusiast who is particularly passionate about event-driven architectures, distributed stream processing systems, and data engineering. For his code contributions, conference talks, and blog posts at the intersection of the Apache Kafka and MongoDB communities, Hans-Peter has received multiple community awards. He likes to code and is a regular speaker at developer conferences around the world.

