January 1, 2024 · 5 min read

Building a Fully Managed Apache Flink Service, Behind the Scenes

By Jared Breeden

What does it take to build a production-ready stream processing platform? What considerations should companies be thinking of beyond the core technologies, the things that really turn the promised capabilities into actual line-of-business functionality? At Decodable, we have spent years exploring and addressing the complexities of this challenge, resulting in a comprehensive solution.

The foundation lies in utilizing open-source systems like Apache Flink, Apache Kafka, and Debezium, purpose-built for stream processing challenges. Together, they provide the systems to transform and analyze data streams, to ingest, store, and transport those streams, and to support change data capture (CDC).

Here at Decodable, we’ve built a solution that goes beyond the foundational technologies, addressing the broader requirements of real-time stream processing for ELT, ETL, and data replication. This includes ensuring a solid developer experience, providing extensive and flexible connectivity, managing schemas, scaling across different workloads and use cases, providing observability, maintaining security, meeting data governance and compliance requirements, and offering ongoing support. Let’s take a closer look at these different areas.

Stream Processing Developer Experience

As a fully-managed Platform-as-a-Service, our platform takes care of the stream processing infrastructure and the deployment of Flink jobs, so developers can focus on the business logic for their data pipelines. That means there are no servers for you to manage, no clusters to create, size, or monitor, and no software dependencies to update or maintain within our platform.

As a developer, you can choose how you want to work: through the web UI, the terminal-based CLI, the REST API, or even the Decodable dbt adapter. Any one of these methods, or a combination of several, can be used to configure connections to data sources and sinks, define real-time streams, and create data pipelines.

The web-based SQL editor allows users to author data pipelines, iterate quickly thanks to the built-in output preview feature, and deploy them easily. The version history of your SQL jobs is tracked automatically and can be used to roll back at any time. SQL job definitions can be stored in source control and deployed with the CLI or dbt adapter. For those who prefer to work with existing Flink jobs or want to interact with the Flink APIs directly, deploying custom jobs written in Java (or other JVM-based languages, such as Scala) is as simple as uploading the JAR file and clicking “Start.” An SDK is also provided for connecting custom jobs to other pipelines and connectors, along with support for the Flink web dashboard.
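
To make this concrete, here is a minimal sketch of the kind of SQL a pipeline author writes: read from an input stream, filter and reshape the records, and insert the results into an output stream. The stream and column names here are purely illustrative, not taken from a real account.

```sql
-- Illustrative only: stream and column names are hypothetical.
-- A SQL pipeline reads from one stream and inserts its results into another.
INSERT INTO shipped_orders
SELECT
  order_id,
  customer_id,
  order_total,
  shipped_at
FROM orders
WHERE status = 'SHIPPED';
```

Because the pipeline is just SQL, the same definition can live in source control and be deployed through the CLI or the dbt adapter, exactly as described above.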

Our platform also provides transparent and robust error handling and recovery. Automatic handling of hardware and software failures in the infrastructure allows your data pipelines to auto-recover from failures and resume where they left off. This greatly reduces the complexity of the stream processing jobs, enabling developers to focus on critical business logic.

Connector Catalog: Apache Kafka, Amazon Kinesis, & More

Before processing your data streams or sending the results to an external system, you must first establish connections to your data sources and sinks. This means handling version upgrades, quality management, performance tuning, malformed data, and the differences between append-only and CDC data. Fortunately, we have you covered.

Our platform includes a large and growing catalog of managed connectors for getting streams from a wide variety of sources (e.g., OLTP databases, event streaming platforms, application caches) and sending them to an array of different sinks (e.g., OLAP databases, real-time analytics systems, data lakes and warehouses, object stores). Our connectors handle the complexity of connecting to external systems and reliably reading & writing data. They require a small amount of configuration to get started, such as the host name & credentials for the target system. These connectors enable use cases such as:

  • Stream processing by connecting to systems like Kafka, Redpanda, and Amazon Kinesis.
  • Change data capture (via Debezium), supporting RDBMS like MySQL & PostgreSQL, and data warehouse solutions like Snowflake.
  • Simple HTTP-based event collection with the REST connector.
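
For comparison, this is roughly what a single Kafka source looks like when wired up by hand in open-source Flink SQL. The options shown are standard Flink Kafka connector settings, while the topic, broker address, and schema are made up for illustration; managed connectors take on this configuration, plus credential handling, upgrades, and error handling, on your behalf.

```sql
-- Hand-rolled Flink SQL source table for a Kafka topic (illustrative values).
CREATE TABLE orders (
  order_id   STRING,
  amount     DECIMAL(10, 2),
  order_time TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = 'broker-1:9092',
  'properties.group.id' = 'orders-consumer',
  'format' = 'json',
  'scan.startup.mode' = 'earliest-offset'
);
```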

Source connections produce to a stream, while sink connections consume from a stream. You control the resources a connection gets by specifying the task size and count, which greatly simplifies scaling the integration between data systems.

If you need to connect to a system that's not yet supported by one of our built-in connectors, you can still do so with custom pipelines. Writing the connection logic yourself requires a bit of Flink programming knowledge, but once written it can be easily deployed, scaled, and monitored like any other Decodable job.

Schema Management and Inference

Another key consideration for stream processing systems is how to unify and translate data types across connected systems. Schema definition and management can be tedious and error-prone, but we solve this by:

  • Inferring schemas from external systems where possible, for example by reading existing schemas from a Confluent schema registry.
  • Supporting automatic conversion of existing schemas for popular serialization systems like Avro or JSON Schema.

Once a schema is defined, source connections can be configured to ignore non-conforming data, or halt progress so the issue can be addressed without data loss. Once data has made it through a connector into your Decodable account, you can be sure it conforms to the defined schemas.

Output schemas for SQL data pipelines are automatically inferred based on query semantics within our own platform. Future updates will extend this schema inference to external systems like MySQL and Snowflake. This will enable the construction of end-to-end data flows without the need to manually define schemas. It also allows you to benefit from the safety guarantees that static schemas provide at authoring time, while also supporting large-scale deployments with thousands of source and sink tables or more.
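
As a simple illustration of that inference (with hypothetical stream and column names): given an input stream of orders, the output schema of an aggregation pipeline can be derived entirely from the query, so the destination stream never needs a hand-written schema.

```sql
-- Input stream (hypothetical): orders(order_id STRING, amount DECIMAL(10, 2), ts TIMESTAMP(3))
INSERT INTO daily_revenue
SELECT
  CAST(ts AS DATE) AS order_date,  -- inferred as DATE
  SUM(amount)      AS revenue      -- inferred as a DECIMAL type
FROM orders
GROUP BY CAST(ts AS DATE);
-- Inferred output schema: daily_revenue(order_date DATE, revenue DECIMAL)
```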

Scalability for Real-time Data Pipelines

Closely related to the requirement of managing the infrastructure of a stream processing platform is the need to scale resource availability up and down for various tasks and workloads. We have abstracted this complex problem and implemented the far simpler concepts of task sizing and task count, which are defined for each connection and data pipeline. Do you need more capacity? Pick a task size to define the maximum level of resource utilization and a task count to define the degree of parallelism—we’ll handle the rest. Our platform can scale data pipelines up or down as needed. The ease and flexibility of configuring the scalability of your jobs allows you to advance from small-scale proof-of-concept to full-scale production in just a few minutes, consuming only the resources you need, when you need them.

In addition, the Decodable platform monitors and optimizes the infrastructure, eliminating the need for constant tuning of memory, I/O, and timeouts.

Observability

For production environments, it is important that the control plane is instrumented for auditing, the data plane is instrumented for health, quality, & performance, and observability data is centrally available for compliance, debugging, and performance tuning.

Our platform provides performance metrics and error monitoring out of the box through our web UI, CLI, and APIs. For each account, metrics for all jobs and audit logs for all events are published to system-provided “_metrics” and “_events” streams. This data can be processed by pipelines like any other data in our platform, or written to external reporting and alerting systems, such as Datadog or Snowflake, using our pre-built connectors.
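
Because these are ordinary streams, they can be filtered and routed with a pipeline like any other data. The sketch below forwards error-level events to an alerting stream that a sink connector could then deliver to an external system; the field names (and the quoting of the stream name) are assumptions for illustration, and the actual schema of the “_events” stream may differ.

```sql
-- Hypothetical field names: the real schema of the _events stream may differ.
INSERT INTO ops_alerts
SELECT
  event_time,
  resource_id,
  message
FROM `_events`
WHERE severity = 'ERROR';
```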

Security and Compliance

In the initial stages of implementing a DIY stream processing platform, things like security and compliance tend to take a back seat. And while that can work during a proof-of-concept phase, these crucial aspects have to be rock solid before organizations can move to production and run their key workloads that process business-critical data.

Designed with security and compliance in mind, our platform features role-based access control (RBAC) to restrict access to connections, streams, and pipelines based on user roles. This ensures that sensitive data can be governed according to the specific needs of the organization.

Decodable has also achieved both SOC 2 Type II and GDPR compliance certifications. In addition, customers can choose between two deployment modes: fully hosted or bring-your-own-cloud (BYOC). Depending on the requirements of the business, enhanced levels of internal compliance can be satisfied by running in BYOC mode, where your data never leaves your security perimeter.

Support

One other thing that all large, complex IT systems have in common is the need for a robust approach to support, and stream processing platforms are no exception. We have a team of Flink and Debezium experts who proactively monitor all active pipelines and connections for our customers. As part of our service, we automatically address any service-related issues that may arise. You will be notified by either the system or our support team if service issues impact your resources on our platform.

If you suspect there’s an issue with the service, we are always available to initiate an investigation and provide support. Examples of service issues include the analysis and remediation of:

  • Bugs
  • Node failures and infrastructure degradation
  • Partial or full network interruptions
  • Identity and access control related issues
  • Internal connection, stream, and pipeline failures

While we’ve designed the service to minimize the chances of accidental errors in configuration or use, some issues are difficult to avoid and cannot be remediated by us without your assistance, such as in the case of incorrect credentials for a source database you’d like to connect to. For customers with Enterprise agreements, we are here to answer questions, refine your stream processing jobs to maximize performance and cost efficiency, and provide proactive support. Our support team will contact you if they suspect a connection or pipeline is misconfigured.

Summary

At Decodable, our goal is to help businesses more quickly and easily gain the benefits of real-time ETL and stream processing to accelerate from proof-of-concept to production-readiness—without having to (re-)build their entire infrastructure from the ground up. Rooted in open source software, our platform stands on the shoulders of giants. We have built supporting technology and services around this foundation, providing a comprehensive stream processing platform tailored to run your business-critical applications and workloads.
