With nearly 60% of the global population online, we are truly living in a data-driven economy. Even so, many businesses struggle to capture most of the data generated every day by websites, mobile apps, social media, and other digital channels. A Customer Data Platform (CDP) collects and puts to use data from the users of your digital properties, including profile data and real-time interaction data (behavioral, demographic, transactional), as well as data from marketing campaigns, customer support, point of sale, IoT, and more. By creating a comprehensive view of potential customers, businesses can generate more insightful customer analytics, identify new audience segments, and optimize marketing campaigns.
In this example, we’ll walk through how the Decodable data service is used to clean, transform, and aggregate data from multiple data sources.
Data streams that feed Customer Data Platforms come in many forms from many sources, including call logs, clickstream data, ecommerce activity, geolocation, point-of-sale terminals, NPS systems, and social media feeds. For this example, we will look at transforming two different data sources into a consistent schema which can then be sent to the same sink connection to be used for analysis, regardless of the original source or form of the data.
Below we can see examples of raw geolocation and point-of-sale data. Each data source has its own data format and uses different field names for similar data. By using one or more Decodable pipelines, which are streaming SQL queries that process data, we can transform the raw data into the form best suited to how it will be consumed.
Point of Sale Records
For this example, a single pipeline is used to process each of the two raw incoming data streams into the desired form. Depending on the complexity of the processing required, it is also possible to use multiple pipelines in a series of stages, with the output of each one used as the input for the next. In more complex cases, it can be helpful to break the processing down into smaller, more manageable steps, which results in pipelines that are easier to test and maintain. Each stage in the sequence uses SQL queries to bring the data closer to its final desired form.
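The staging idea can be sketched locally in Python, with each stage consuming the output of the previous one. This is only an illustration of the composition pattern, not Decodable code: the stage functions, the `user_id` field, and the sample records are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Stage = Callable[[Iterable[Record]], Iterator[Record]]


def drop_incomplete(records: Iterable[Record]) -> Iterator[Record]:
    """Stage 1 (hypothetical): discard records that have no user id."""
    for record in records:
        if record.get("user_id") is not None:
            yield record


def normalize_ids(records: Iterable[Record]) -> Iterator[Record]:
    """Stage 2 (hypothetical): represent every user id as a trimmed string."""
    for record in records:
        yield {**record, "user_id": str(record["user_id"]).strip()}


def run_stages(source: Iterable[Record], stages: List[Stage]) -> Iterable[Record]:
    """Chain the stages so each one reads the previous stage's output."""
    stream: Iterable[Record] = source
    for stage in stages:
        stream = stage(stream)
    return stream


raw = [{"user_id": " 42 "}, {"user_id": None}, {"user_id": 7}]
result = list(run_stages(raw, [drop_incomplete, normalize_ids]))
print(result)
```

Because each stage is small and has a single job, it can be tested on its own before being chained with the others.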
Decodable processes data using SQL, which should feel familiar to anyone who has used relational database systems. The primary differences you’ll notice are that:
- You activate a pipeline to start it, and deactivate a pipeline to stop it
- All pipeline queries specify a source and a sink
- Certain operations, notably JOINs and aggregations, must include windows
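Windows are needed because a stream has no end, so an aggregate over the whole stream would never finish. A window bounds the computation to a finite slice of time. The Python sketch below illustrates one common kind, a fixed-size (tumbling) window. It is not Decodable syntax, and the event tuples, store names, and 60-second window size are made up for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_counts(
    events: Iterable[Tuple[int, str]], window_seconds: int = 60
) -> Dict[Tuple[int, str], int]:
    """Count events per (window_start, key) using non-overlapping windows.

    Each event is a (timestamp_in_seconds, key) pair. An event belongs to
    the window whose start is its timestamp rounded down to a multiple of
    window_seconds.
    """
    counts: Counter = Counter()
    for timestamp, key in events:
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(0, "store-1"), (59, "store-1"), (60, "store-1"), (61, "store-2")]
window_counts = tumbling_counts(events)
print(window_counts)
```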
Unlike queries in a relational database, every pipeline writes its results into an output data stream (or sink). As a result, every pipeline is a single statement of the form INSERT INTO <sink> SELECT ... FROM <source>, where sink and source are streams you’ve defined.
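The shape of such a statement can be tried out locally with Python's built-in sqlite3 module. In this sketch, ordinary SQLite tables stand in for streams, so it only shows the statement form and not streaming behavior. The names `raw_events` and `clean_events` and their columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two tables stand in for the source and sink streams (hypothetical names).
conn.execute("CREATE TABLE raw_events (uid TEXT, amount_cents INTEGER)")
conn.execute("CREATE TABLE clean_events (user_id TEXT, amount_cents INTEGER)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [("u1", 1250), ("u2", 399), ("u3", -5)],
)

# One statement: read from the source, reshape, write to the sink.
conn.execute(
    """
    INSERT INTO clean_events
    SELECT uid AS user_id, amount_cents
    FROM raw_events
    WHERE amount_cents > 0
    """
)

rows = conn.execute(
    "SELECT user_id, amount_cents FROM clean_events ORDER BY user_id"
).fetchall()
print(rows)
conn.close()
```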
Transform geolocation records
For this example, a combination of customer device locations and where they have made purchases will be used to inform marketing campaign activities. This first pipeline will transform the raw geolocation data stream into a standardized schema.
Pipeline: Standardize geolocation data stream
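The kind of reshaping this pipeline performs can be sketched in Python. This is an illustration only, not the pipeline's SQL: the raw field names (`deviceOwner`, `coords`, `ts_ms`), the epoch-millisecond timestamp, and the target schema are all assumptions made for the example.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def standardize_geolocation(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map a raw geolocation record (hypothetical fields) to a shared schema."""
    return {
        "source": "geolocation",
        "customer_id": str(raw["deviceOwner"]),
        # Flatten the nested coordinates and make them numeric.
        "latitude": float(raw["coords"]["lat"]),
        "longitude": float(raw["coords"]["lng"]),
        # Assumed input: milliseconds since the Unix epoch, in UTC.
        "event_time": datetime.fromtimestamp(raw["ts_ms"] / 1000, tz=timezone.utc),
    }


sample = {
    "deviceOwner": 1001,
    "coords": {"lat": "40.7128", "lng": "-74.0060"},
    "ts_ms": 1_650_000_000_000,
}
standardized = standardize_geolocation(sample)
print(standardized)
```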
After creating a new pipeline and entering the SQL query, clicking the Run Preview button will verify its syntax and then fire up a new executable environment to process the next 10 records coming in from the source stream and display the results. Decodable handles all the heavy lifting on the backend, allowing you to focus on working directly with your data streams to ensure that you are getting the results you need.
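Conceptually, a preview applies the query to a small, bounded sample taken from an unbounded stream. A rough Python analogy is shown below. It says nothing about how Decodable implements the feature, and the `preview` helper and the generated records are hypothetical.

```python
from itertools import count, islice
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def preview(
    stream: Iterable[Record],
    transform: Callable[[Record], Record],
    limit: int = 10,
) -> List[Record]:
    """Apply transform to at most `limit` records taken from the stream."""
    return [transform(record) for record in islice(stream, limit)]


# An endless generator stands in for a live source stream.
endless = ({"seq": n} for n in count())
sample = preview(endless, lambda r: {**r, "seq_squared": r["seq"] ** 2})
print(len(sample), sample[3])
```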
Transform point-of-sale records
For the point-of-sale terminal data, the transformations required for this example are fairly minimal: the field names are changed to match the desired schema for a standardized data stream, and the created_at field is converted to a timestamp.
Pipeline: Standardize pos-terminal data stream
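The two steps described above, renaming fields and converting created_at to a timestamp, can be sketched in Python. This is an illustration rather than the pipeline's SQL. Only created_at comes from the text. The other raw field names, the ISO 8601 input format, and the target schema are assumptions.

```python
from datetime import datetime
from typing import Any, Dict

# Hypothetical mapping from raw point-of-sale field names to the shared schema.
FIELD_MAP = {"cust": "customer_id", "store": "store_id", "total": "amount"}


def standardize_pos(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Rename fields and parse created_at into a datetime value."""
    out: Dict[str, Any] = {new: raw[old] for old, new in FIELD_MAP.items()}
    out["source"] = "pos-terminal"
    # Assumed input: an ISO 8601 string with an explicit UTC offset.
    out["event_time"] = datetime.fromisoformat(raw["created_at"])
    return out


sample = {
    "cust": "1001",
    "store": "nyc-04",
    "total": 23.5,
    "created_at": "2022-04-15T05:20:00+00:00",
}
standardized = standardize_pos(sample)
print(standardized)
```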
At this point, a sink connection (one that writes a stream to an external system, such as AWS S3, Kafka, Kinesis, Postgres, Pulsar, or Redpanda) can be created to allow the results to be consumed by your own applications and services.
As this example shows, a sophisticated business problem can be addressed in a very straightforward way using Decodable pipelines. There are no Docker containers to create and no SQL server infrastructure to set up or maintain. All that is needed is a working familiarity with writing the SQL queries themselves.
You can watch demonstrations of several examples on the Decodable YouTube channel.
Additional documentation for all of Decodable’s services is available in the Decodable documentation.
Please consider joining us on our community Slack.