January 2, 2024
5 min read

Building a Managed Flink Service

By Jared Breeden

What does it take to build a production-ready managed Flink service? What should companies consider beyond the core technologies, the things that really turn the promised capabilities into actual line-of-business functionality? At Decodable, we have spent years exploring and addressing the complexities of this challenge, resulting in a comprehensive solution.

The foundation lies in utilizing open-source systems like Apache Flink, Apache Kafka, and Debezium, purpose-built for stream processing challenges. Together, they provide the systems to transform and analyze data streams, to ingest, store, and transport those streams, and to support change data capture (CDC).

Here at Decodable, we’ve built a solution that goes beyond the foundational technologies, addressing the broader requirements of real-time stream processing for ELT, ETL, and data replication. This includes ensuring a solid developer experience, providing extensive and flexible connectivity, managing schemas, ensuring scalability across different workloads and use cases, providing observability, maintaining security, data governance, and compliance, and offering ongoing support.

Stream Processing Developer Experience

As a fully managed Platform-as-a-Service, our platform takes care of the stream processing infrastructure and the deployment of Flink jobs so developers can focus on the business logic for their data pipelines. That means there are no servers for you to manage, no clusters to create, size, or monitor, and no software dependencies to update or maintain within our platform.

As a developer, you can choose how you want to work: through the web UI, the terminal-based CLI, the REST API, or the Decodable dbt adapter. Any of these methods can be used to configure connections to data sources and sinks, define real-time streams, and create data pipelines.

The web-based SQL editor allows users to author data pipelines, iterate quickly thanks to the built-in output preview feature, and deploy them with a click. The version history of your SQL jobs is tracked automatically and can be used to roll back at any time. SQL job definitions can be stored in source control and deployed with the CLI or dbt adapter. For those who prefer to work with existing Flink jobs or who want to interact directly with the Flink APIs, deploying custom jobs written in Java (or other JVM-based languages, such as Scala) is as simple as uploading the JAR file and clicking “Start.” An SDK is also provided for connecting custom jobs to other pipelines and connectors, along with support for the Flink web dashboard.
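
To give a rough sense of what this looks like in practice, a SQL pipeline is simply a streaming SQL statement that reads from one stream and writes to another. The sketch below uses hypothetical stream and column names:

```sql
-- A minimal SQL pipeline sketch: filter and reshape one stream into another.
-- The streams (orders, high_value_orders) and their columns are hypothetical.
INSERT INTO high_value_orders
SELECT order_id, customer_id, amount, order_time
FROM orders
WHERE amount > 1000
```

The same statement can be previewed in the editor before deployment, kept in source control, and rolled out with the CLI or dbt adapter.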

Our platform also provides transparent and robust error handling and recovery. Automatic handling of hardware and software failures in the infrastructure allows your data pipelines to auto-recover from failures and resume where they left off. This greatly reduces the complexity of the stream processing jobs, enabling developers to focus on critical business logic.

Connector Catalog: Apache Kafka, Amazon Kinesis, & More

Before processing your data streams or sending the results to an external system, you must establish connections to your data sources and sinks. That means handling version upgrades, quality management, performance tuning, malformed data, and the differences between append-only and CDC data. Fortunately, we have you covered.

Our platform includes a large and growing catalog of managed connectors for getting streams from a wide variety of sources (e.g., OLTP databases, event streaming platforms, application caches) and sending them to an array of different sinks (e.g., OLAP databases, real-time analytics systems, data lakes and warehouses, object stores). Our connectors handle the complexity of connecting to external systems and reliably reading & writing data. They require a small amount of configuration to get started, such as the hostname & credentials for the target system. These connectors enable use cases including:

  • Stream processing by connecting to systems like Kafka, Redpanda, and Amazon Kinesis.
  • Change data capture (via Debezium), supporting RDBMS like MySQL & PostgreSQL, and data warehouse solutions like Snowflake.
  • Simple HTTP-based event collection with the REST connector.

Source connections produce to a stream, while sink connections consume from a stream. You control the resources a connection gets by specifying the task size and count, which greatly simplifies scaling the integration between data systems.
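
To appreciate what a managed connector absorbs, it helps to see roughly what the self-managed equivalent involves. In raw Flink SQL, a single Kafka source requires a table definition along the lines of the sketch below, in which the topic, broker address, and fields are made up for illustration; a Decodable connection captures the same information as a handful of configuration values:

```sql
-- Roughly what wiring up a Kafka source yourself looks like in Flink SQL.
-- Topic, broker address, and fields are illustrative only.
CREATE TABLE orders_source (
  order_id    BIGINT,
  customer_id VARCHAR,
  amount      DOUBLE,
  order_time  TIMESTAMP(3),
  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = 'broker-1.example.com:9092',
  'properties.group.id' = 'orders-consumer',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);
```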

If you need to connect to a system not yet supported by one of our built-in connectors, you can still do so with custom pipelines. Writing the connection logic yourself requires some Flink programming knowledge, but once written it can be easily deployed, scaled, and monitored like any other Decodable job.

Schema Management and Inference

Schema definition and management can be tedious and error-prone, but Decodable can help by:

  • Inferring schemas from external systems where possible, for example by reading existing schemas from the Confluent Schema Registry.
  • Supporting automatic conversion of existing schemas for popular serialization systems like Avro and JSON Schema, as illustrated just after this list.
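
As a concrete illustration of that conversion, consider a hypothetical Avro record with fields id (long), amount (double), and created_at (timestamp-millis). Applying the standard Avro-to-Flink type mapping yields a stream schema along these lines (the DDL form and the datagen connector are used only to keep the snippet self-contained):

```sql
-- Hypothetical stream schema derived from an Avro record via the standard
-- type mapping: long -> BIGINT, double -> DOUBLE, timestamp-millis -> TIMESTAMP(3).
CREATE TABLE orders_from_avro (
  id         BIGINT,        -- Avro: long
  amount     DOUBLE,        -- Avro: double
  created_at TIMESTAMP(3)   -- Avro: logical type timestamp-millis
) WITH (
  'connector' = 'datagen'   -- placeholder connector, for illustration only
);
```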

Once a schema is defined, source connections can be configured to ignore non-conforming data or halt progress so the issue can be addressed without data loss. Once data has made it through a connector into your Decodable account, you can be sure it conforms to the defined schemas.

Output schemas for SQL data pipelines are automatically inferred based on query semantics within our platform. Future updates will extend this schema inference to external systems like MySQL and Snowflake. This will enable the construction of end-to-end data flows without the need to manually define schemas. It also allows you to benefit from the safety guarantees that static schemas provide at authoring time while supporting large-scale deployments with thousands of source and sink tables or more.
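
As an example of this inference at work, consider the hypothetical pipeline below: the output columns and their types (a VARCHAR key, a BIGINT count, a DOUBLE sum) follow directly from the query, so the sink schema never needs to be written out by hand.

```sql
-- The output schema is inferred from the query itself:
-- customer_id VARCHAR, order_count BIGINT, total_amount DOUBLE.
-- Stream and column names are hypothetical.
INSERT INTO customer_order_totals
SELECT
  customer_id,
  COUNT(*)    AS order_count,
  SUM(amount) AS total_amount
FROM orders
GROUP BY customer_id
```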

Scalability for Real-time Data Pipelines

You will also likely need to scale resource availability up and down for various tasks and workloads. We have abstracted this complex problem and implemented the more straightforward concepts of task size and task count, defined for each connection and data pipeline.

Do you need more capacity? Pick a task size to define the maximum level of resource utilization and a task count to define the degree of parallelism, and we’ll handle the rest. Our platform can scale data pipelines up or down as needed. The ease of configuring your jobs gives you the flexibility to advance from a small-scale proof-of-concept to full-scale production in just a few minutes, consuming only the resources you need, when you need them.

In addition, the Decodable platform monitors and optimizes the infrastructure, eliminating the need for constant tuning of memory, I/O, and timeouts.

Observability

For production environments, it is important that the control plane is instrumented for auditing, the data plane is instrumented for health, quality, & performance, and observability data is centrally available for compliance, debugging, and performance tuning.

Our platform provides performance metrics and error monitoring out of the box through our web UI, CLI, and APIs. For each account, metrics for all jobs and audit logs for all events are published to the system-provided “_metrics” and “_events” streams. This data can be processed by pipelines like any other data in our platform, or written to external reporting and alerting systems such as Datadog or Snowflake with our pre-built connectors.
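
Because metrics and events arrive as ordinary streams, a pipeline can filter or aggregate them just like business data. The sketch below assumes field names such as pipeline_id and consumer_lag; the actual schema of the _metrics stream may differ.

```sql
-- Sketch of a pipeline over the system-provided _metrics stream.
-- The field names (pipeline_id, consumer_lag) are assumptions for illustration,
-- not the documented schema.
INSERT INTO lag_alerts
SELECT pipeline_id, consumer_lag
FROM `_metrics`
WHERE consumer_lag > 100000
```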

Security and Compliance

In the initial stages of implementing a DIY stream processing platform, security and compliance tend to take a back seat. While that can work during a proof-of-concept phase, these crucial aspects must be rock solid before organizations can move to production and run their key workloads that process business-critical data.

Designed with security and compliance in mind, our platform features role-based access control (RBAC) to restrict access to connections, streams, and pipelines based on user roles. This ensures that sensitive data can be governed according to the specific needs of the organization.

Decodable has also achieved SOC 2 Type II certification and GDPR compliance. In addition, customers can choose between two deployment modes: fully hosted or bring-your-own-cloud (BYOC). Depending on the requirements of the business, enhanced levels of internal compliance can be satisfied by running in BYOC mode, where your data never leaves your security perimeter.

Support

One other thing that all large, complex IT systems have in common is the need for a robust approach to providing support, and stream processing platforms are certainly no exception. We have a team of Flink and Debezium experts who proactively monitor all active pipelines and connections for our customers. As part of our service, we automatically address any service-related issues that may arise. The system or our support team will notify you if service issues impact your resources on our platform.

If you suspect an issue with the service, we are always available to initiate an investigation and provide support. Examples of service issues include the analysis and remediation of:

  • Bugs
  • Node failures and infrastructure degradation
  • Partial or full network interruptions
  • Identity and access control related issues
  • Internal connection, stream, and pipeline failures

While we’ve designed the service to minimize the chances of accidental errors in configuration or use, some issues are difficult to avoid and cannot be remediated without your assistance, such as incorrect credentials for a source database you’d like to connect to. For customers with Enterprise agreements, we are here to answer questions, refine your stream processing jobs, and provide proactive support. Our support team will contact you if they suspect a connection or pipeline is misconfigured.

Summary

At Decodable, our goal is to help businesses more quickly and easily gain the benefits of real-time ETL and stream processing to accelerate from proof-of-concept to production-readiness—without having to (re-)build their entire infrastructure from the ground up. Rooted in open-source software, our platform stands on the shoulders of giants. We have built supporting technology and services around this foundation, providing a comprehensive stream processing platform tailored to run your business-critical applications and workloads.
