July 11, 2023
9 min read

Introducing the Decodable SDK for Custom Pipelines

By Gunnar Morling

Today it’s my great pleasure to announce the first release of a software development kit (SDK) for Decodable Custom Pipelines! This SDK helps you to build custom Apache Flink® jobs, tightly integrated with managed streams and connectors of the Decodable platform. In this blog post I’d like to discuss why we’ve built Custom Pipelines and this SDK, and how to get started with using the SDK.

Decodable Custom Pipelines

Custom Pipelines, written in any JVM language such as Java or Scala, complement Decodable’s support for SQL-based data streaming pipelines. While SQL is a great choice for a large share of data streaming use cases, allowing you to map and project, filter and join, group and aggregate your data, in some cases it might not be flexible enough.

Based on the powerful Java-based APIs of Apache Flink, Custom Pipelines let you add arbitrary custom logic to your data flows: invoke external web services, integrate third-party connectors that aren’t natively supported by Decodable, and much more. Just like SQL pipelines, you do not need to provision, scale, or tune any infrastructure to leverage Custom Pipelines.

As such, Custom Pipelines are the perfect escape hatch for cases where SQL doesn’t quite cut it. At the same time, we wanted Custom Pipelines to be closely integrated with the rest of the platform. When running your own Flink jobs, it should be possible to benefit from Decodable managed connectors and persistent, replayable data streams. This is where the new Custom Pipelines SDK comes in, providing you with all the right integration points needed.

Getting Started With the Custom Pipelines SDK

The simplest way to get started building Flink jobs for the Decodable platform is to use the quickstart archetype provided by Apache Flink. Run the following to start a new Maven project:
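For example, using the flink-quickstart-java archetype (the Flink version, group id, and artifact id shown here are only examples; pick values matching your target environment):

```bash
mvn archetype:generate \
  -DarchetypeGroupId=org.apache.flink \
  -DarchetypeArtifactId=flink-quickstart-java \
  -DarchetypeVersion=1.16.2 \
  -DgroupId=com.example \
  -DartifactId=purchase-order-pipeline \
  -Dversion=1.0-SNAPSHOT \
  -Dpackage=com.example.pipeline \
  -DinteractiveMode=false
```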

The SDK is available from Maven Central, so you can simply add it as a dependency to the pom.xml file of the newly generated project:
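A sketch of that dependency, assuming the coordinates co.decodable:decodable-pipeline-sdk and today’s Beta1 version (check Maven Central for the exact, current coordinates):

```xml
<dependency>
  <groupId>co.decodable</groupId>
  <artifactId>decodable-pipeline-sdk</artifactId>
  <version>1.0.0.Beta1</version>
</dependency>
```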

Alternatively, if you are using Gradle, add the dependency to your build.gradle file like so:
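Again assuming the same coordinates:

```groovy
dependencies {
    implementation 'co.decodable:decodable-pipeline-sdk:1.0.0.Beta1'
}
```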

With the SDK dependency in place, you can start developing your stream processing job using the full power of Flink. At the time of publication, Flink’s DataStream API is supported by the SDK, and support for the Table API is on the roadmap. For retrieving data from Decodable streams as well as sending data to them, the SDK provides the DecodableStreamSource and DecodableStreamSink interfaces. You can use these two interfaces to integrate your custom stream processing logic with managed connectors, or other SQL-based pipelines.

Here’s an example of a typical job implementation:
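The following is a sketch of such a job. PurchaseOrder and PurchaseOrderProcessor are user-defined classes, and the SDK package and builder method names are assumptions made for illustration; refer to the API documentation for the exact signatures.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.formats.json.JsonDeserializationSchema;
import org.apache.flink.formats.json.JsonSerializationSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// SDK package and builder names are assumptions; see the API docs
import co.decodable.sdk.pipeline.DecodableStreamSink;
import co.decodable.sdk.pipeline.DecodableStreamSource;

public class PurchaseOrderProcessingJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Source reading from the managed "purchase-orders" stream,
        // deserializing each record into a PurchaseOrder POJO
        DecodableStreamSource<PurchaseOrder> source =
                DecodableStreamSource.<PurchaseOrder>builder()
                        .withStreamName("purchase-orders")
                        .withDeserializationSchema(new JsonDeserializationSchema<>(PurchaseOrder.class))
                        .build();

        // Sink writing the processed records to the "purchase-orders-processed" stream
        DecodableStreamSink<PurchaseOrder> sink =
                DecodableStreamSink.<PurchaseOrder>builder()
                        .withStreamName("purchase-orders-processed")
                        .withSerializationSchema(new JsonSerializationSchema<>())
                        .build();

        DataStream<PurchaseOrder> purchaseOrders = env.fromSource(
                source, WatermarkStrategy.noWatermarks(), "Purchase Orders Source");

        purchaseOrders
                .map(new PurchaseOrderProcessor()) // custom logic, e.g. calling an order validation service
                .sinkTo(sink);

        env.execute("Purchase Order Processing Job");
    }
}
```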

In this example, purchase order data is ingested from the “purchase-orders” stream. This stream receives data from a CDC connector and contains data from the database of an ERP system. For ease of use, the data is deserialized into a POJO, PurchaseOrder, using Flink’s JsonDeserializationSchema. Each incoming purchase order is processed by a custom mapping function, PurchaseOrderProcessor, which is where you could, for example, invoke an external service for order validation. Finally, the records are written to the “purchase-orders-processed” output stream via a DecodableStreamSink. Another managed Decodable connector could take the data from that stream and send it to an external system.

To learn more about the Decodable SDK for Custom Pipelines, see the API documentation for the project.

Testing

Good testability is a key requirement in modern application development. To that end, the SDK comes with a complete testing framework that lets you validate the logic of your stream processing job in end-to-end tests. The testing framework uses the Testcontainers project to spin up the underlying infrastructure. Here is a test for the stream processing job above:
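Here’s a sketch of what such a test might look like. The testing API names (TestEnvironment, PipelineTestContext, StreamRecord) and their methods are assumptions for illustration; refer to the SDK documentation for the exact API.

```java
import java.time.Duration;

import org.junit.jupiter.api.Test;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import org.testcontainers.redpanda.RedpandaContainer;

// Testing API names are assumptions; see the SDK docs for the exact types
import co.decodable.sdk.pipeline.testing.PipelineTestContext;
import co.decodable.sdk.pipeline.testing.StreamRecord;
import co.decodable.sdk.pipeline.testing.TestEnvironment;

import static org.assertj.core.api.Assertions.assertThat;

@Testcontainers
public class PurchaseOrderProcessingJobTest {

    // Ephemeral Redpanda broker backing the Decodable streams during the test
    @Container
    static final RedpandaContainer BROKER =
            new RedpandaContainer("docker.redpanda.com/redpandadata/redpanda:v23.1.2");

    @Test
    void shouldProcessPurchaseOrders() throws Exception {
        TestEnvironment testEnvironment = TestEnvironment.builder()
                .withBootstrapServers(BROKER.getBootstrapServers())
                .withStreams("purchase-orders", "purchase-orders-processed")
                .build();

        try (PipelineTestContext ctx = new PipelineTestContext(testEnvironment)) {
            // 1. Insert a purchase order as a JSON document into the input stream
            String order = "{ \"orderId\" : 19001, \"customerId\" : 1004, \"orderValue\" : 42.0 }";
            ctx.stream("purchase-orders").add(new StreamRecord<>(order));

            // 2. Run the job under test
            ctx.runJobAsync(PurchaseOrderProcessingJob::main);

            // 3. Retrieve one record from the output stream and assert on its content
            StreamRecord<String> processed =
                    ctx.stream("purchase-orders-processed").takeOne(Duration.ofSeconds(30));

            assertThat(processed.value()).contains("\"orderId\" : 19001");
        }
    }
}
```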

Using Testcontainers, an ephemeral Redpanda broker is started as a data streaming platform. The test inserts a purchase order as a JSON document into the “purchase-orders” stream. Next, the job under test is executed. Finally, one record is retrieved from the “purchase-orders-processed” stream and its content is asserted to match the expected structure.

This testing framework allows you to implement comprehensive tests, in your own IDE or CI environment, so that you can verify that your jobs work as expected before deploying them to production.

Deploying your Custom Pipelines

After implementing and testing your stream processing job, you can now deploy it as a pipeline to Decodable. Make sure you have the Custom Pipelines feature enabled in the Decodable UI:

Then, package your job as a JAR file by running mvn clean verify from the root directory of your project. Next, deploy your job to Decodable using the Decodable CLI:
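For instance (the pipeline name and JAR path follow the example project above; the exact CLI flags for custom pipelines are shown as assumptions here, so check the Decodable CLI reference for the precise syntax):

```bash
# Build the job JAR
mvn clean verify

# Upload the JAR as a custom pipeline
# (flag names are illustrative; see the Decodable CLI reference for the exact syntax)
decodable pipeline create \
  --name purchase-order-processor \
  --type JAVA \
  --job-file target/purchase-order-pipeline-1.0-SNAPSHOT.jar
```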

Alternatively, you can upload your job via the web UI:

At this point, your job is deployed but it isn’t running yet. You can activate it either in the CLI like so:
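For example, assuming the standard pipeline activation command and using the id returned when the pipeline was created:

```bash
# Activate the deployed custom pipeline by its id (placeholder shown)
decodable pipeline activate <pipeline-id>
```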

Or, using the UI:

After a few moments, your job is running, processing purchase orders from the input stream and sending them to the output stream.

Next Steps

Today’s Beta1 release marks the first step towards a fully supported and feature-rich SDK for Decodable Custom Pipelines. This is a perfect time for you to join the Decodable user community, sign up for Custom Pipelines, and give it a try.

Over the next few weeks and months, we plan to build out the SDK, for example adding integration with Flink’s Table API, creating a bespoke Custom Pipelines Maven archetype, providing support for collecting metrics and accessing Decodable secrets in your jobs, and much more.

The SDK is fully open source (under the Apache License, Version 2.0), and its source code is managed on GitHub. Your contributions in the form of bug reports, feature requests, and pull requests are highly welcome. We’d also love to hear from you about your use cases for Custom Pipelines and the SDK. Until then, Happy Streaming!


Gunnar Morling

Gunnar is an open-source enthusiast at heart, currently working on Apache Flink-based stream processing. In his prior role as a software engineer at Red Hat, he led the Debezium project, a distributed platform for change data capture. He is a Java Champion and has founded multiple open source projects such as JfrUnit, kcctl, and MapStruct.
