
Record Replay, for When You Want to Begin Again

David Fabritius
Decodable

As a foundational aspect of tolerating failures and recovering from errors, data stream processing systems must be able to “replay” a stream in order to reproduce or correct the results of a prior run. This becomes especially important when scaling systems to support large production workloads.

Decodable has built upon the underlying capabilities of Flink to provide a platform that enables you to start a pipeline, pipeline preview, or connection from the earliest record available, so that all records up to the current moment can be reprocessed. It is also possible to start from the latest position instead of resuming where the last run left off, skipping forward past unprocessed records. Essentially, this gives you a “time machine” of sorts, allowing you to go backwards or forwards in time to the moment from which you want to start processing records.

Common use cases include the need to re-process data when an error is discovered in a pipeline’s SQL code, or the need to “warm up” a new downstream datastore or application with data that had previously been sent to a different destination.

Pipeline Processing

When starting a pipeline, you have the option of setting the start position. By enabling Force start, the pipeline is configured to begin processing available data from either the earliest or latest position, instead of resuming where the last run left off.

If the requirements of a pipeline’s SQL change or a bug is missed during development, incorrect output will be sent to downstream pipelines, datastores, or applications. When this occurs, developers can modify the SQL to address the issue and then restart the pipeline from the desired starting position to produce the correct output stream.
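
For example, suppose a pipeline was deployed with an overly strict filter that silently dropped valid records. A corrected version of the pipeline SQL might look something like the following sketch; the stream and field names here are hypothetical and used only for illustration:

-- Corrected pipeline SQL: the original version also required discount > 0,
-- which dropped valid full-price orders.
INSERT INTO enriched_orders
SELECT
  order_id,
  customer_id,
  price * (1 - discount) AS net_price
FROM orders
WHERE price > 0;

Once the corrected pipeline is restarted with Force start from the earliest position, the retained history of the input stream is re-processed and correct output is produced for downstream consumers.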

As another example, a table that was accidentally deleted after being populated by a real-time data stream could be reconstituted by replaying the records from the input stream through the pipeline. Under the Team plan, all streams are allowed to retain data for 7 days; however, the total size of all streams may not exceed 100GB.

Pipeline Preview

As a pipeline is being developed, running a preview can help ensure your SQL code is generating the expected output. Just as with a pipeline, you can choose to start processing from the earliest data available in the input stream (as specified by the FROM clause of the pipeline’s SQL). Doing so allows you to re-process the same records again and again as you make changes to your code, which can make developing and debugging easier in some scenarios. By default, the preview starts processing from the latest available data in the stream.
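
For instance, a preview of a query like the following (again, the stream and field names are hypothetical) could be run from the earliest position so that each iteration of the SQL is exercised against the same historical records:

-- Hypothetical preview query: read from the orders stream and
-- normalize the status field, excluding cancelled orders.
SELECT
  order_id,
  customer_id,
  UPPER(status) AS status
FROM orders
WHERE status <> 'cancelled'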

Connection Processing

Just as with a pipeline, you can set the start position when starting a connection. By enabling Force start, the connection is configured to begin processing available data from either the earliest or latest position, instead of resuming where the last run left off.

Connections come in two flavors: source and sink. Source connections read from an external system and write to a Decodable stream, while sink connections read from a Decodable stream and write to an external system. Keep in mind that only select external systems, primarily those that are Kafka-based, support replaying a record stream. For those that do, Decodable can take advantage of that functionality.
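
As an illustration of what this looks like at the underlying Flink level (Decodable manages this configuration for you, so this is only a sketch with a hypothetical topic, schema, and broker address), Flink’s Kafka SQL connector exposes the start position through its scan.startup.mode option:

-- Sketch of a Flink SQL Kafka source table definition.
CREATE TABLE orders_source (
  order_id STRING,
  price DOUBLE
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = 'broker:9092',
  'format' = 'json',
  -- 'earliest-offset' replays from the beginning of the retained topic;
  -- 'latest-offset' skips ahead and reads only newly arriving records.
  'scan.startup.mode' = 'earliest-offset'
);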

For sink connections, support for setting a start position is exactly the same as for pipelines, since the input for both of these is a Decodable stream.

Let Us Know!

We hope you are as excited about these processing capabilities as we are. How would you take advantage of these features? What additional functionality would you like to see? Join our community Slack or contact us at support@decodable.co and let us know; we would love to hear from you!


