April 25, 2023 · 8 min read

Support for Running Your Own Apache Flink Jobs Is Coming To Decodable

By Gunnar Morling

It’s my great pleasure to announce a major new feature of the Decodable platform: support for running your own custom Apache Flink® jobs, written in any JVM-based programming language such as Java or Scala. This takes the Decodable managed service experience to a completely new level, and I couldn’t be more excited about finally seeing this feature out in the wild! Firm believers in the “release early, release often” mantra, we’re making support for JVM-based custom Flink jobs available as a technology preview today, and we’re very much looking forward to the feedback from the Decodable user community.

This new capability complements and tightly integrates with Decodable’s existing support for SQL-based data streaming pipelines, allowing you to choose freely between declarative and imperative job implementations, depending on your specific needs and requirements. In the remainder of this blog post, I would like to discuss the philosophy behind this new feature, how it integrates with the Decodable managed service experience, and how you can give it a try yourself.

Why Custom Apache Flink Jobs?

SQL is the industry standard for declarative data processing. It is widely used not only to query data at rest in OLTP and OLAP systems, but also to query data in motion in real-time stream processing platforms like Apache Flink. There are even proposals for integrating streaming semantics into the official SQL standard.

From its beginning, Decodable has therefore put a strong emphasis on SQL as the means of defining stream processing pipelines. Since far more people can read and write SQL than can, say, develop in Java, SQL is a catalyst for the democratization of data. Its declarative nature makes it easy to define and run data streaming jobs, transparently benefiting from many optimizations in the underlying platform, such as improving query performance via code generation, cost-based query planning and execution, and cost-efficient, dense packing in cloud environments.

As we’ve learned from customers over time, though, in some cases SQL can be too inflexible for the task at hand. What if, for instance, you need to invoke an external web service from within your stream processing job? Or integrate a custom library with a highly optimized algorithm implementation? Or maybe you have already implemented some Flink jobs and are now looking for an alternative to operating your own Flink cluster?

Custom Decodable pipelines address all these requirements. By deploying your JVM-based custom Flink jobs into the Decodable platform, you gain the required flexibility for cases where SQL isn’t versatile enough. Existing Flink workloads, e.g., running on a self-managed cluster, can easily be migrated to Decodable with no code changes. With support for custom Flink-based pipelines, the Decodable platform is becoming the one-stop shop for all your stream processing needs.

Integration Into the Decodable Platform

Now, we didn’t want to build just some platform for deploying Flink jobs, but we wanted to do it the Decodable Way. No matter whether you are running SQL-based pipelines or custom Flink jobs, you should be able to benefit from all the platform features you’ve come to love, including fully-managed connectors for a wide range of external data sources and sinks, observability and monitoring, role-based access control (RBAC), and much more.

To that end, custom Flink jobs are introduced to the Decodable platform as another kind of pipeline (with SQL-based pipelines being the other, and so far only, alternative). This means you can easily integrate with fully-managed connectors, moving into a custom job only those parts of your logic that require a bespoke imperative implementation. You can also arbitrarily combine SQL-based and custom pipelines within one and the same end-to-end data flow.

Fig. 1: SQL-based and Flink-based custom pipelines in Decodable.

That way, Decodable allows you to benefit from SQL whenever it’s possible (ease of use, automatic query optimizations, etc.), while giving you the “escape hatch” of implementing and running bespoke Flink jobs when and where it’s needed. When migrating existing Flink jobs to Decodable, you can either run them as-is, or gradually extract specific parts into “native” Decodable components, for instance the source and sink connectors of a job, or those processing parts of a job which can be expressed using SQL.

To further simplify the integration of custom Flink jobs with the Decodable platform, we are planning to publish an SDK which will allow you to access the Decodable streams of your account in the form of a simple Flink source and sink. The following example code shows what this could look like (subject to change; this SDK has not been released yet):
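As the SDK is not available yet, the sketch below is purely indicative: class names such as DecodableSource and DecodableSink, their builder methods, and the helper types for addresses and geo-coding are assumptions, not final API.

```java
// Indicative sketch only: the Decodable SDK has not been released, so
// DecodableSource, DecodableSink, and their builders are assumed names.
// Address, EnrichedAddress, Coordinates, and GeoCodingClient are
// hypothetical helper types standing in for your own domain code.
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AddressEnrichmentJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Source backed by the managed Decodable stream "addresses"
        DecodableSource<Address> source = DecodableSource.<Address>builder()
                .withStreamName("addresses")
                .build();

        // Sink backed by the managed Decodable stream "enrichedAddresses"
        DecodableSink<EnrichedAddress> sink = DecodableSink.<EnrichedAddress>builder()
                .withStreamName("enrichedAddresses")
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "addresses")
                // Invoke an external geo-coding service for each address and
                // attach the returned coordinates to the record
                .map(address -> new EnrichedAddress(address, GeoCodingClient.lookUp(address)))
                .sinkTo(sink);

        env.execute("Address enrichment");
    }
}
```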

Listing 1: A Flink job using the upcoming Decodable SDK.

In this example, an instance of DecodableSource (representing the managed Decodable stream “addresses”) and an instance of DecodableSink (representing the “enrichedAddresses” stream) are created. For each incoming address object, an external service for geo-coding addresses is invoked, the address is enriched with the coordinates returned by the service, and the enriched record is finally emitted to the sink stream.

Lastly, a word on security, which is a key concern for the team here at Decodable. Just as is the case for SQL-based pipelines, we’ve taken measures to make sure that custom Flink workloads of different customers cannot interfere with each other. Workloads run in isolated security contexts, and no code or data access is possible across account boundaries.

Trying It Out Yourself

We are releasing custom pipelines in Decodable, based on bespoke Apache Flink jobs, as a Tech Preview today. This means it is the perfect time for you to get your hands on this new feature and provide your feedback to us.

In order to do so, please add yourself to our waiting list by sending us a message here. Interested existing Decodable customers should get in touch via the usual communication channels. An expert from Decodable will then reach out to you to discuss the next steps.

Over the next few weeks, we’re planning to further build out and improve the support for deploying your own Flink jobs to Decodable. Stay tuned for the aforementioned SDK, access to the web console built into Apache Flink, powerful logging and metrics capabilities, and much more.

Gunnar Morling

Gunnar is an open-source enthusiast at heart, currently working on Apache Flink-based stream processing. In his prior role as a software engineer at Red Hat, he led the Debezium project, a distributed platform for change data capture. He is a Java Champion and has founded multiple open source projects such as JfrUnit, kcctl, and MapStruct.
