September 20, 2023
8 min read

Stream Processing Everywhere, for Everyone

By Eric Sammer

Businesses are using stream processing to make smarter and faster decisions by acting on time-sensitive and mission-critical events, performing real-time analytics, and building applications with features delivered to end-users in real time. While the spectrum of use cases continues to expand across an ever-widening range of domains, common applications of stream processing include fraud detection, real-time transformation and ingestion of application events, processing IoT sensor data, network monitoring, generating context-aware online advertising, cybersecurity analysis, geofencing and vehicle tracking, and many others.

With all of the excitement around real-time data and what it helps us do, teams still get tripped up on practical issues. We’re fixing that.

Our Job at Decodable: Real-Time ETL, ELT, and Data Replication

We’ve taken best-of-breed open source projects, including Apache Flink and Debezium, and built a powerful, fully managed stream processing platform you can run in production for real-time ETL, ELT, and data replication. This goes beyond just spinning up Flink clusters (although that’s part of it). We provide a simple, easy-to-use developer experience using SQL or Java. Operated and maintained by a team that lives and breathes stream processing, Decodable works with your existing stack, so you can build real-time data pipelines without worrying about stitching together the lower-level building blocks.

Throughout the last year, we’ve worked with teams big and small, cloud-native and cloud-curious, from startups to enterprises. You’ve told us what’s important to you and we listened. We’re announcing two major improvements to Decodable that we think you’ll love.

Bring Your Own Cloud (Or Don’t)

We’ve talked to a bunch of customers who want the simplicity, efficiency, and support of managed cloud services, but get stuck on a few key issues.

  1. Data privacy and sovereignty. Many companies struggle with the risk of a vendor having access to critical sensitive data. Stream processing systems often process exactly this data, and might even be the systems responsible for anonymizing it as part of transformation and ingestion.
  2. Cost. Working with a cloud service provider can mean transmitting high volumes of data outside of your cloud account, incurring transfer and egress costs. Depending on your network architecture, this can also require additional networking infrastructure and services to handle the increased load. Many customers also have committed spend with their cloud provider of choice and prefer to burn down that commitment.
  3. Performance. Latency increases when moving data through multiple network services between data sources, a stream processing service, and sinks. Ideally, processing should occur close to sources and sinks.

BYOC

We’re incredibly proud to announce support for running a private instance of the Decodable data plane—the part of Decodable that handles connectivity and stream processing—in your AWS account. All of your data stays within your network and runs on resources owned by you, while you still receive all the benefits of a managed cloud stream processing service. We call this Bring Your Own Cloud, or BYOC for short, and it’s generally available today. BYOC is perfect for enterprise customers with strict data privacy and security requirements, cloud service provider spend commitments, or use cases where every millisecond counts.

Figure 1. Bring Your Own Cloud Architecture

Managed Cloud

All of that said, Decodable is SOC 2 Type II and GDPR compliant, runs in the same AWS regions as you, and sometimes you really just want the shortest path to production. If you prefer to get up and running with no additional infrastructure at all, our fully managed offering is the perfect solution. Fully managed accounts benefit from zero infrastructure deployment and a ton of flexibility.

Figure 2. Managed Cloud Architecture

Stream Processing and ETL with Java or SQL? Yes!

Here at Decodable, we have long believed that Apache Flink is the most robust stream processing system, with a proven track record of meeting the demands of some of the largest and most sophisticated businesses in the world, such as Netflix, Uber, Stripe, and many more. Those demands range from simple routing, transformation, and filtering use cases, all the way to complex feature engineering and AI workloads. The challenge for many is that writing a full Java application for filtering records is overkill, while some workloads are difficult to express in SQL.

While we’ve supported Flink SQL from day one here at Decodable, we’ve now opened a technical preview of custom jobs written in Java with the standard Apache Flink DataStream and Table APIs.

Flink SQL

In many cases, it is possible to get to production faster with Flink SQL, a language familiar to a wide range of data engineers and data scientists. One of the primary benefits of writing jobs in SQL is that Flink automatically translates and optimizes the SQL into an efficient runtime plan, which is then executed by the Flink engine. The result is a high level of performance and efficiency without requiring expertise in imperative programming languages, serialization formats, low-level operator implementation, manual state management, or memory management. Flink SQL supports the simple stuff you expect, but also sophisticated joins, complex event processing (CEP) with MATCH_RECOGNIZE, analytic window functions, tumbling and hopping window aggregation, and more.
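As a sketch of what that looks like in practice, here is a hypothetical Flink SQL pipeline that counts and sums payment events per user over one-minute tumbling windows. The payments stream, its columns, and the ts event-time attribute are all invented for illustration:

```sql
-- Hypothetical input stream: payments(user_id STRING, amount DECIMAL(10, 2), ts TIMESTAMP(3)),
-- with ts declared as the event-time attribute.
SELECT
  user_id,
  TUMBLE_START(ts, INTERVAL '1' MINUTE) AS window_start,
  COUNT(*)                              AS txn_count,
  SUM(amount)                           AS total_amount
FROM payments
GROUP BY
  user_id,
  TUMBLE(ts, INTERVAL '1' MINUTE)
```

A few lines of declarative SQL replace what would otherwise be hand-written windowing, state, and aggregation code, and the planner handles the execution details.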

Custom Apache Flink Jobs

If you’re already a Flink expert or just desire more flexibility, Decodable makes it easy to build and run Flink jobs written in Java as well. Custom pipelines complement Decodable’s support for SQL-based data streaming pipelines. While SQL is a great choice for a large share of data streaming use cases—allowing you to map and project, filter and join, group and aggregate your data—in some cases it might not be flexible enough.

Built on the powerful Java-based APIs of Apache Flink, custom pipelines allow you to build arbitrary custom logic into your data flows: invoke external web services, integrate third-party connectors that aren’t natively supported by Decodable, and much more. Just like SQL pipelines, you don’t need to provision, scale, or tune any infrastructure to run custom jobs. We’ve gone one step further and introduced an SDK that provides access and hooks into Decodable’s runtime, so you can still use the built-in connectors and streams, and even expose lineage information from your jobs.
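To give a flavor of the kind of job this enables, here is a minimal sketch using the standard Flink DataStream API. The event values, threshold, and enrichment logic are invented for illustration, and the sketch uses an inline element source rather than a real connector or the Decodable SDK, whose APIs aren’t covered in this post:

```java
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PaymentFilterJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // In a real job this would come from a source connector (e.g. Kafka or CDC);
        // a fixed set of payment amounts keeps the sketch self-contained.
        DataStream<Double> payments = env.fromElements(12.50, 950.00, 3.99, 1200.00);

        payments
            // Keep only "large" payments; the threshold is arbitrary.
            .filter((FilterFunction<Double>) amount -> amount >= 100.0)
            // Arbitrary enrichment step: format an alert message. In practice this
            // is where you might call an external service or apply custom logic.
            .map((MapFunction<Double, String>) amount -> "large payment: " + amount)
            .print();

        env.execute("payment-filter-sketch");
    }
}
```

The same filter-and-map pipeline is trivial in SQL, of course; the point is that each operator is arbitrary Java, so anything you can express in code, you can put in the stream.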

Conclusion

We’re all about stream processing here at Decodable, and stream processing is here today to meet your needs for ETL, ELT, and data replication. BYOC extends cloud-native, managed stream processing to teams who have not historically been able to use these services. The addition of custom pipelines means you can have a single managed platform that handles all of your real-time use cases. Our job is to bring you the most feature-rich, intuitive, and powerful stream processing platform in the cloud. This is one more step (ok, two) in that direction.

And we’re not even close to done. Without spilling the beans, we’ve got some really exciting new features in the works that will let you do even more with less (in a good way).

To learn more about Decodable, talk to a stream processing expert, or get a demo, give us a shout.

Prefer to get hands on? We like your style. Create a free Decodable account now.

Eric Sammer

Eric Sammer is a data analytics industry veteran who has started two companies, Rocana (acquired by Splunk in 2017), and Decodable. He is an author, engineer, and leader on a mission to help companies move and transform data to achieve new and useful business results. Eric is a speaker on topics including data engineering, ML/AI, real-time data processing, entrepreneurship, and open source. He has spoken at events including the RTA Summit and Current, on podcasts with Software Engineering Daily and Sam Ramji, and has appeared in various industry publications.


Let's Get Decoding
