June 30, 2025

Checkpoint Chronicle - June 2025


Welcome to the Checkpoint Chronicle, a monthly roundup of interesting stuff in the data and streaming space. Your editor-in-chief for this edition is Hans-Peter Grahsl. Feel free to send me any choice nuggets that you think we should feature in future editions.

Stream Processing, Streaming SQL, and Streaming Databases

  • If you’ve always been curious what watermarks are in Apache Flink, here is your chance to learn about them in almost no time by reading Robin Moffatt’s article.
  • Flink SQL Runner is a curated set of tools and extensions to help run Apache Flink SQL applications on top of the Flink Kubernetes Operator.
  • Tributary is a DuckDB extension for data engineers and analysts alike. It integrates Apache Kafka with DuckDB so that streaming data can be queried and analyzed in real time using SQL.

Event Streaming

  • Strimzi is not only a well-known and widely deployed project in the Kafka space, but it even has its own virtual conference. In case you missed StrimziCon 2025, here is a quick session recap with links to all recordings and slides.
  • Kroxylicious, the snappy open source proxy for Apache Kafka, recently shipped version 0.13.0, which includes an operator to run the proxy on Kubernetes.
  • Apache Avro is a popular serialization format for Kafka records. To remove some friction when working with Avro data in CLI tooling, Dale Lane open sourced kafka-avro-formatters.
  • Even though Apache Kafka has been around for several years, there is still demand for beginners’ content. Here is one such article recently written by Vu Trinh.
  • In “Kafka: The End of the Beginning”, Chris Riccomini reflects on the last decade in the event streaming and stream processing spaces. Unsurprisingly, there are a few spicy takes in there which might justify bringing popcorn ;-)

Data Ecosystem

  • MLflow 3 - released in early June - extends MLflow’s foundation to address the challenging requirements of generative AI workloads, in particular how to measure and ensure quality and stability.
  • Apache Iceberg enthusiasts have been eagerly awaiting this moment… the ratification of the v3 table spec. Read more in this blog article by Danica Fine and Kevin Liu, who briefly walk you through the main features and share what this community-driven effort means going forward.
  • Fluss - the streaming storage for real-time analytics - recently released version 0.7. Besides the announcement post, there is a webinar recording that dives deeper into all its new features and improvements.

Data Platforms and Architecture

Databases and Change Data Capture

Paper of the Month

The research paper by Alexander Behm et al. describes Photon, a vectorized query engine built for Lakehouse systems. Photon is implemented in C++ and integrates tightly with Apache Spark APIs to support both SQL and DataFrame-based workloads. It tackles two core challenges: performance over raw, uncurated datasets, and semantic compatibility with the existing Spark APIs. Photon delivers average query speedups of ~3x (up to 10x) compared to legacy Spark runtimes, and enabled a 100 TB TPC‑DS world record on a Delta Lake/S3 Lakehouse.

Events & Call for Papers (CfP)

New Releases

—

That’s all for this month! We hope you’ve enjoyed the newsletter and would love to hear any feedback or suggestions you have.

Hans-Peter (LinkedIn / Bluesky / X / Mastodon / Email)


Hans-Peter Grahsl

Hans-Peter Grahsl is a Staff Developer Advocate at Decodable. He is an open-source community enthusiast and is particularly passionate about event-driven architectures, distributed stream processing systems, and data engineering. For his code contributions, conference talks, and blog posts at the intersection of the Apache Kafka and MongoDB communities, Hans-Peter has received multiple community awards. He likes to code and is a regular speaker at developer conferences around the world.
