
Publishing Data Products with AsyncAPI

Hubert Dulay

What are data products?

Data products are like produce in a grocery store. When shopping for produce, I want assurance that it's clean and not moldy, and that it's wrapped and protected from dirt and from kids' germy little fingers. I may look for the organic label so I can trust the product and consume it, or serve it to others, without anxiety. Data products are no different.

At their core, data products are data with high quality and trustworthiness. They give consumers confidence that what they are consuming is correct, secure, and will not break their applications.

In AI/ML, data scientists want to use data products because they provide confidence that the data they consume will yield sound insights into the questions their analytics are trying to solve. Trusting data is especially critical when those analytics inform medical decisions. Data products provide that trust.

Decodable packages data sources into ready-to-consume data products

Why data products?

When data engineers think of ETL, we often think of building a batch data pipeline from source to sink, with all the required transformations in between. It's usually requested by data scientists or other LOB stakeholders who just need your data. A lot of the time you're not even exposed to the end use case (or worse, the SLAs the use case requires), so you're left to guess the incremental cadence when configuring your Airflow DAG. But what if you could just build data products? You wouldn't need to worry about the sink at all. You would just provide the self-service tools that give your data customers the ability to consume the data into their own domain.

And what if you provide these data products as real-time streaming data? You wouldn’t have to worry about building and scheduling your Airflow DAG. It would be a continuous real-time feed that would meet most required SLAs. The only part data engineers would be responsible for is sourcing the data and transforming it so that it meets the high quality and trust your data consumers require. You would only do this work once before publishing it for others to consume.

Decodable enables easy publishing of data products as streams. We also enable subscribers to easily consume and bring that streaming data into their own domain.

In the following use case (see the GitHub repo), we'll start by generating some mock input data. We'll parse and transform it into a format generalized for consumers, and we'll assign a schema to it. Once the data is parsed, cleaned, and formatted, we can consider it a data product.
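A rough sketch of that parse-and-clean step in Python (the field names and envelope shape here are hypothetical, not taken from the repo):

```python
import json

# Hypothetical raw event as it might arrive from a mock source:
# an envelope with a stringified payload and loosely typed fields.
raw_event = '{"ts": "2023-01-05T12:00:00Z", "payload": "{\\"user_id\\": \\"42\\", \\"amt\\": \\"19.99\\"}"}'

def to_data_product(raw: str) -> dict:
    """Parse, clean, and reshape a raw event to match a published schema."""
    envelope = json.loads(raw)
    payload = json.loads(envelope["payload"])
    # Cast loosely typed fields into the types the schema promises.
    return {
        "event_time": envelope["ts"],
        "user_id": int(payload["user_id"]),
        "amount": float(payload["amt"]),
    }

record = to_data_product(raw_event)
print(record)
```

Only after records consistently pass this kind of cleaning and conform to the published schema would we call the stream a data product.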

Streaming data products - as easy as REST APIs?

AsyncAPI is an open source initiative with the goal of making streaming architectures as easy and as common as REST APIs.

AsyncAPI provides a standard way of describing asynchronous (streaming) data, much as OpenAPI describes REST APIs. You can extend AsyncAPI to add self-service capabilities that enable easy integration and consumption of data products published in Decodable.
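For illustration, an AsyncAPI document describing a published stream might look something like this (the channel name, server address, and payload schema are hypothetical):

```yaml
asyncapi: '2.6.0'
info:
  title: Orders Data Product
  version: '1.0.0'
  description: Cleaned, schema-validated order events.
servers:
  production:
    url: example-broker:9092   # hypothetical broker address
    protocol: kafka
channels:
  orders:
    subscribe:
      message:
        payload:
          type: object
          properties:
            event_time:
              type: string
              format: date-time
            user_id:
              type: integer
            amount:
              type: number
```

The document gives consumers everything they need to subscribe: where the stream lives, over which protocol, and what shape each message takes.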

With AsyncAPI tools, developers can parse an AsyncAPI YAML document and generate client code such as Spring Boot applications, or even HTML documentation. You can also parse an AsyncAPI document to call REST endpoints in Decodable that create sink connections to pull data products into your domain. These tools paired with Decodable make for an easy, low-to-no-code experience when consuming your data products.
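A minimal sketch of that parsing step, assuming a hypothetical AsyncAPI document in JSON form (JSON is also valid AsyncAPI). The sink-connection request body built here is illustrative only, not the actual Decodable API shape:

```python
import json

# A minimal, hypothetical AsyncAPI document.
asyncapi_doc = json.loads("""
{
  "asyncapi": "2.6.0",
  "info": {"title": "Orders Data Product", "version": "1.0.0"},
  "channels": {
    "orders": {
      "subscribe": {
        "message": {"payload": {"type": "object"}}
      }
    }
  }
}
""")

def sink_connection_requests(doc: dict) -> list[dict]:
    """Turn each subscribable channel into an illustrative sink-connection
    request body; a real client would POST these to the platform's API."""
    requests = []
    for channel, spec in doc["channels"].items():
        if "subscribe" in spec:
            requests.append({
                "name": f"{channel}-sink",
                "stream": channel,
                "schema": spec["subscribe"]["message"]["payload"],
            })
    return requests

bodies = sink_connection_requests(asyncapi_doc)
print(bodies)
```

Everything a consumer needs to wire up a sink is already in the AsyncAPI document, which is what makes the low-to-no-code experience possible.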

To try this for yourself, check out the samples in the Decodable repository.

Watch me demo AsyncAPI with Decodable from our recent Demo Day:

You can get started with Decodable for free - our developer account includes enough for you to build a useful pipeline and - unlike a trial - it never expires.

Learn more:

Demo Day: Fraud Detection using SQL for ML Feature Extraction

In this SQL-packed demo, see how Moonsense uses Decodable SQL transformations in multiple pipelines to convert streaming device data into features to populate a fraud detection machine learning model.

Learn more

The Top 5 Streaming ETL Patterns

ETL and ELT are traditionally scheduled batch operations, but as the need for always-on, always-current data services becomes the norm, real-time ELT operating on streams of data is the goal of many organizations, if not yet the reality. In real-world usage, the 'T' in ETL represents a wide range of patterns assembled from primitive operations. In this blog we'll explore these operations and see examples of how they're implemented as SQL statements.

Learn more

Flink Deployments At Decodable

Decodable’s platform uses Apache Flink to run our customers’ real-time data processing jobs. This blog post explores how we securely, reliably and efficiently manage the underlying Flink deployments at Decodable in a multi-tenant environment.

Learn more



Start using Decodable today.