
dbt and Decodable - Batch and Stream Transformation Together at Last

Joel McKelvey
Decodable

Analytics engineers recognize the value of managing data transformation in a consistent, unified, trackable, testable manner. Keeping track of transformations that occur throughout the data stack, and the SQL that drives them, can be complex. This complexity often results in duplicated effort, confusion, and – worse – bad data that breaks downstream workflows. The open source project dbt has emerged as a key tool for bringing order to what were once difficult, chaotic transformation processes.

How does dbt work?

dbt was originally designed for the batch transformation that occurs once data is loaded into a data warehouse or data lake. To date, the vast majority of transformations managed by dbt are batch, transforming data-at-rest. However, as data teams increasingly embrace the Kappa architecture to simplify their data stacks (and reduce costs), real-time stream processing is becoming the norm. Transforming data-in-flight at scale, in a way that is simple, reliable, and consistent, is why Decodable was created. Decodable’s dbt adapter brings real-time streaming transformation, powered by Apache Flink®, and batch transformation together in a single dbt environment.

Decodable’s adapter is available today, and it is designed to let you manage Flink-powered streaming data transformation the way you already manage batch transformations. Our adapter makes Decodable the first managed streaming platform with dbt support. Importantly, Decodable is also releasing the dbt adapter as open source, available for contributions and feedback from the community. Documentation and source for the dbt-decodable project are available on GitHub, in the dbt documentation, and on the Decodable website.

Why dbt for Flink-powered transformation?

With dbt, rather than developing and maintaining Flink SQL as an unmanaged collection of files, you store queries and data assumptions in model files (a minimal example follows the list below). These files can then be managed via git or another version control system. This unlocks several interesting features, including:

  • Enables cloning an entire project with hundreds of queries and testing them with dbt run.
  • Facilitates cross-team collaboration on projects by leveraging the capabilities of version control, including making changes via pull requests or working on experimental branches.
  • Provides the ability to define assumptions about the data, which makes it safer to implement and validate changes.
  • Allows teams to run dbt in their CI/CD workflows for automated deployments.
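
To make that concrete, here is a minimal sketch of what a model file and its accompanying schema.yml might look like. The stream, column, and model names are hypothetical; only the overall shape reflects a typical dbt project.

-- models/filtered_orders.sql
-- A model is just a SELECT statement; dbt takes care of materializing it.
select
    order_id,
    customer_id,
    order_total
from {{ source('commerce', 'orders') }}  -- hypothetical upstream stream
where order_total > 100

# models/schema.yml
version: 2

sources:
  - name: commerce
    tables:
      - name: orders            # the hypothetical upstream stream referenced above

models:
  - name: filtered_orders
    columns:
      - name: order_id
        tests:
          - not_null            # assumption about the data: every record carries an order_id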

While dbt was designed primarily for traditional batch processing, it works very well with streaming SQL. As an example, a team could be working in a Decodable development account on a number of streaming SQL queries. Using the dbt adapter, they can share and collaborate on queries and stream definitions. Once these have been developed and tested, dbt can be used to quickly and easily apply them to a production account.
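
One way to model that promotion from development to production is with two targets in profiles.yml, the standard dbt mechanism for switching environments. The account and profile names below are placeholders, and the Decodable-specific keys are illustrative rather than exhaustive; the dbt-decodable README documents the full set of profile options.

# ~/.dbt/profiles.yml
my_streaming_project:
  target: dev                        # default target used by plain `dbt run`
  outputs:
    dev:
      type: decodable
      database: dev                  # required by dbt's profile schema
      schema: dev
      account_name: my-dev-account   # placeholder Decodable account
      profile_name: default          # placeholder credentials profile
    prod:
      type: decodable
      database: prod
      schema: prod
      account_name: my-prod-account
      profile_name: prod

Running dbt run builds against the dev target by default; dbt run --target prod applies the same models to the production account.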

Getting started with Decodable and dbt

If you’re ready to try the Decodable dbt adapter, you can navigate to the open source project dbt-decodable, available on GitHub. You’ll also need Decodable, available as a free trial account, and, of course, dbt itself (pip install dbt-decodable will also pull dbt into your environment). To install the latest version of the Decodable adapter via pip (optionally using a virtual environment), run:
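
# optional: create and activate a virtual environment first
python3 -m venv .venv
source .venv/bin/activate

# install the adapter; this also pulls dbt itself into your environment
pip install dbt-decodable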

You can find additional information and documentation on GitHub. The readme and related documentation can help you configure appropriate dbt profiles, understand the adapter’s currently supported features, and more. If you’d like a tour of Decodable with one of our streaming experts, you can always request a personalized 1:1 demo of the product and see how simple we’ve made it to take advantage of real-time data streaming.
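
Once the adapter is installed and a profile is configured, the day-to-day workflow is plain dbt. A first session might look something like this:

dbt debug    # confirm dbt can load your profile and connect to Decodable
dbt run      # build the project's models on your Decodable account
dbt test     # run the tests and data assumptions defined alongside the models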


Tags: announcements, Company News, Feature, Partners, Open Source, Product, dbt
