January 2, 2024
5 min read

Building a Managed Flink Service

By Jared Breeden

What does it take to build a production-ready managed Flink service? What should companies consider beyond the core technologies, the things that really turn the promised capabilities into actual line-of-business functionality? At Decodable, we have spent years exploring and addressing the complexities of this challenge, resulting in a comprehensive solution.

The foundation lies in utilizing open-source systems like Apache Flink, Apache Kafka, and Debezium, purpose-built for stream processing challenges. Together, they provide the systems to transform and analyze data streams, to ingest, store, and transport those streams, and to support change data capture (CDC).

Here at Decodable, we’ve built a solution that goes beyond the foundational technologies, addressing the broader requirements of real-time stream processing for ELT, ETL, and data replication. This includes ensuring a solid developer experience, providing extensive and flexible connectivity, managing schemas, ensuring scalability across different workloads and use cases, providing observability, maintaining security, data governance, and compliance, and offering ongoing support.

Stream Processing Developer Experience

As a fully managed Platform-as-a-Service, our platform takes care of the stream processing infrastructure and the deployment of Flink jobs so developers can focus on the business logic for their data pipelines. That means there are no servers for you to manage, no clusters to create, size, or monitor, and no software dependencies to update or maintain within our platform.

As a developer, you can choose how you want to work: through the web UI, the terminal-based CLI, the REST API, or the Decodable dbt adapter. Any of these methods can be used to configure connections to data sources and sinks, define real-time streams, and create data pipelines.

The web-based SQL editor allows users to author data pipelines, iterate quickly thanks to the built-in output preview feature, and deploy them with a click. The version history of your SQL jobs is tracked automatically and can be used to roll back at any time. SQL job definitions can be stored in source control and deployed with the CLI or dbt adapter. For those who prefer to work with existing Flink jobs or who want to interact directly with the Flink APIs, deploying custom jobs written in Java (or other JVM-based languages, such as Scala) is as simple as uploading the JAR file and clicking “Start.” An SDK is also provided for connecting custom jobs to other pipelines and connectors, along with support for the Flink web dashboard.
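
To give a rough sense of what this looks like in practice, a SQL pipeline is simply a streaming SQL statement that reads from one stream and writes to another. The sketch below uses hypothetical stream and column names:

```sql
-- A minimal SQL pipeline sketch: filter and reshape one stream into another.
-- The streams (orders, high_value_orders) and their columns are hypothetical.
INSERT INTO high_value_orders
SELECT order_id, customer_id, amount, order_time
FROM orders
WHERE amount > 1000
```

The same statement can be previewed in the editor before deployment, kept in source control, and rolled out with the CLI or dbt adapter.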

Our platform also provides transparent and robust error handling and recovery. Automatic handling of hardware and software failures in the infrastructure allows your data pipelines to auto-recover from failures and resume where they left off. This greatly reduces the complexity of the stream processing jobs, enabling developers to focus on critical business logic.

Connector Catalog: Apache Kafka, Amazon Kinesis, & More

Before processing your data streams or sending the results to an external system, you must establish connections to your data sources and sinks. That means handling version upgrades, quality management, performance tuning, malformed data, and the differences between append-only and CDC data. Fortunately, we have you covered.

Our platform includes a large and growing catalog of managed connectors for getting streams from a wide variety of sources (e.g., OLTP databases, event streaming platforms, application caches) and sending them to an array of different sinks (e.g., OLAP databases, real-time analytics systems, data lakes and warehouses, object stores). Our connectors handle the complexity of connecting to external systems and reliably reading & writing data. They require a small amount of configuration to get started, such as the hostname & credentials for the target system. These connectors enable use cases including:

  • Stream processing by connecting to systems like Kafka, Redpanda, and Amazon Kinesis.
  • Change data capture (via Debezium), supporting RDBMS like MySQL & PostgreSQL, and data warehouse solutions like Snowflake.
  • Simple HTTP-based event collection with the REST connector.

Source connections produce to a stream, while sink connections consume from a stream. You control the resources a connection gets by specifying the task size and count, which greatly simplifies scaling the integration between data systems.
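
To appreciate what a managed connector absorbs, it helps to see roughly what the self-managed equivalent involves. In raw Flink SQL, a single Kafka source requires a table definition along the lines of the sketch below, in which the topic, broker address, and fields are made up for illustration; a Decodable connection captures the same information as a handful of configuration values:

```sql
-- Roughly what wiring up a Kafka source yourself looks like in Flink SQL.
-- Topic, broker address, and fields are illustrative only.
CREATE TABLE orders_source (
  order_id    BIGINT,
  customer_id VARCHAR,
  amount      DOUBLE,
  order_time  TIMESTAMP(3),
  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = 'broker-1.example.com:9092',
  'properties.group.id' = 'orders-consumer',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);
```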

If you need to connect to a system not yet supported by one of our built-in connectors, you can still do so with custom pipelines. Writing the connection logic yourself requires some Flink programming knowledge, but once written it can be easily deployed, scaled, and monitored like any other Decodable job.

Schema Management and Inference

Schema definition and management can be tedious and error-prone, but Decodable can help by:

  • Inferring schemas from external systems where possible, for example by reading existing schemas from the Confluent Schema Registry.
  • Supporting automatic conversion of existing schemas for popular serialization systems like Avro and JSON Schema, as illustrated just after this list.
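
As a concrete illustration of that conversion, consider a hypothetical Avro record with fields id (long), amount (double), and created_at (timestamp-millis). Applying the standard Avro-to-Flink type mapping yields a stream schema along these lines (the DDL form and the datagen connector are used only to keep the snippet self-contained):

```sql
-- Hypothetical stream schema derived from an Avro record via the standard
-- type mapping: long -> BIGINT, double -> DOUBLE, timestamp-millis -> TIMESTAMP(3).
CREATE TABLE orders_from_avro (
  id         BIGINT,        -- Avro: long
  amount     DOUBLE,        -- Avro: double
  created_at TIMESTAMP(3)   -- Avro: logical type timestamp-millis
) WITH (
  'connector' = 'datagen'   -- placeholder connector, for illustration only
);
```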

Once a schema is defined, source connections can be configured to ignore non-conforming data or halt progress so the issue can be addressed without data loss. Once data has made it through a connector into your Decodable account, you can be sure it conforms to the defined schemas.

Output schemas for SQL data pipelines are automatically inferred based on query semantics within our platform. Future updates will extend this schema inference to external systems like MySQL and Snowflake. This will enable the construction of end-to-end data flows without the need to manually define schemas. It also allows you to benefit from the safety guarantees that static schemas provide at authoring time while supporting large-scale deployments with thousands of source and sink tables or more.
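
As an example of this inference at work, consider the hypothetical pipeline below: the output columns and their types (a VARCHAR key, a BIGINT count, a DOUBLE sum) follow directly from the query, so the sink schema never needs to be written out by hand.

```sql
-- The output schema is inferred from the query itself:
-- customer_id VARCHAR, order_count BIGINT, total_amount DOUBLE.
-- Stream and column names are hypothetical.
INSERT INTO customer_order_totals
SELECT
  customer_id,
  COUNT(*)    AS order_count,
  SUM(amount) AS total_amount
FROM orders
GROUP BY customer_id
```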

Scalability for Real-time Data Pipelines

You will also likely need to scale resource availability up and down for various tasks and workloads. We have abstracted this complex problem and implemented the more straightforward concepts of task size and task count, defined for each connection and data pipeline.

Do you need more capacity? Pick a task size to define the maximum level of resource utilization and a task count to define the degree of parallelism, and we’ll handle the rest. Our platform can scale data pipelines up or down as needed. The ease of configuring your jobs gives you the flexibility to advance from a small-scale proof-of-concept to full-scale production in just a few minutes, consuming only the resources you need, when you need them.

In addition, the Decodable platform monitors and optimizes the infrastructure, eliminating the need for constant tuning of memory, I/O, and timeouts.

Observability

For production environments, it is important that the control plane is instrumented for auditing, the data plane is instrumented for health, quality, & performance, and observability data is centrally available for compliance, debugging, and performance tuning.

Our platform provides performance metrics and error monitoring out of the box through our web UI, CLI, and APIs. For each account, metrics for all jobs and audit logs for all events are published to the system-provided “_metrics” and “_events” streams. This data can be processed by pipelines like any other data in our platform, or written to external reporting and alerting systems such as Datadog or Snowflake with our pre-built connectors.
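
Because metrics and events arrive as ordinary streams, a pipeline can filter or aggregate them just like business data. The sketch below assumes field names such as pipeline_id and consumer_lag; the actual schema of the _metrics stream may differ.

```sql
-- Sketch of a pipeline over the system-provided _metrics stream.
-- The field names (pipeline_id, consumer_lag) are assumptions for illustration,
-- not the documented schema.
INSERT INTO lag_alerts
SELECT pipeline_id, consumer_lag
FROM `_metrics`
WHERE consumer_lag > 100000
```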

Security and Compliance

In the initial stages of implementing a DIY stream processing platform, security and compliance tend to take a back seat. While that can work during a proof-of-concept phase, these crucial aspects must be rock solid before organizations can move to production and run their key workloads that process business-critical data.

Designed with security and compliance in mind, our platform features role-based access control (RBAC) to restrict access to connections, streams, and pipelines based on user roles. This ensures that sensitive data can be governed according to the specific needs of the organization.

Decodable has also achieved SOC 2 Type II certification and GDPR compliance. In addition, customers can choose between two deployment modes: fully hosted or bring-your-own-cloud (BYOC). Depending on the requirements of the business, enhanced levels of internal compliance can be satisfied by running in BYOC mode, where your data never leaves your security perimeter.

Support

One other thing that all large, complex IT systems have in common is the need for a robust approach to providing support, and stream processing platforms are certainly no exception. We have a team of Flink and Debezium experts who proactively monitor all active pipelines and connections for our customers. As part of our service, we automatically address any service-related issues that may arise. The system or our support team will notify you if service issues impact your resources on our platform.

If you suspect an issue with the service, we are always available to initiate an investigation and provide support. Examples of service issues include the analysis and remediation of:

  • Bugs
  • Node failures and infrastructure degradation
  • Partial or full network interruptions
  • Identity and access control related issues
  • Internal connection, stream, and pipeline failures

While we’ve designed the service to minimize the chances of accidental errors in configuration or use, some issues are difficult to avoid and cannot be remediated without your assistance, such as incorrect credentials for a source database you’d like to connect to. For customers with Enterprise agreements, we are here to answer questions, refine your stream processing jobs, and provide proactive support. Our support team will contact you if they suspect a connection or pipeline is misconfigured.

Summary

At Decodable, our goal is to help businesses more quickly and easily gain the benefits of real-time ETL and stream processing to accelerate from proof-of-concept to production-readiness—without having to (re-)build their entire infrastructure from the ground up. Rooted in open-source software, our platform stands on the shoulders of giants. We have built supporting technology and services around this foundation, providing a comprehensive stream processing platform tailored to run your business-critical applications and workloads.
