Customer Data Solution

Solution Overview

With nearly 60% of the global population connected online, we are truly living in a data-driven economy. Even so, many businesses struggle to capture most of the data generated every day by websites, mobile apps, social media, and other digital channels. A Customer Data Platform (CDP) collects and unifies data from the users of your digital properties: profile data, real-time interaction data (behavioral, demographic, and transactional), and data from marketing campaigns, customer support, point-of-sale systems, IoT devices, and many more sources. By creating a comprehensive view of potential customers, businesses can generate more insightful customer analytics, identify new audience segments, and optimize marketing campaigns.

In this example, we’ll walk through how the Decodable data service is used to clean, transform, and aggregate data from multiple data sources.

Pipeline Architecture

Data streams that feed Customer Data Platforms come in many forms from many sources, including call logs, clickstream data, ecommerce activity, geolocation, point-of-sale terminals, NPS systems, and social media feeds. For this example, we will transform two different data sources into a consistent schema that can then be sent to the same sink connection for analysis, regardless of the original source or form of the data.

Below we can see examples of raw geolocation and point of sale data. Each data source is in a unique data format and uses different field names for similar data. By using one or more Decodable pipelines, which are streaming SQL queries that process data, we can transform the raw data into a form that is best suited for how it will be consumed.

Geolocation Records


Point of Sale Records


For this example, a single pipeline processes each of the two raw incoming data streams into the desired form. Depending on the complexity of the processing required, you can also chain multiple pipelines in a series of stages, with the output of each one used as the input for the next. In more complex cases, breaking the processing down into smaller, more manageable steps produces pipelines that are easier to test and maintain. Each stage in the sequence uses a SQL query to bring the data closer to its final desired form.
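To make the staging idea concrete, here is a minimal Python sketch (not Decodable code) in which each stage is a function from one record stream to another, and each stage's output feeds the next. The stage functions and the field names (USER_ID, user_id, Lat) are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Stage = Callable[[Iterable[Record]], Iterator[Record]]


def lowercase_keys(records: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical stage: normalize field names to lower case."""
    for record in records:
        yield {key.lower(): value for key, value in record.items()}


def drop_incomplete(records: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical stage: discard records that have no user_id."""
    for record in records:
        if record.get("user_id") is not None:
            yield record


def run_stages(source: Iterable[Record], stages: List[Stage]) -> List[Record]:
    """Chain the stages so each one consumes the previous stage's output."""
    stream: Iterable[Record] = source
    for stage in stages:
        stream = stage(stream)
    return list(stream)


raw = [{"USER_ID": 1, "Lat": 37.7}, {"USER_ID": None, "Lat": 40.7}]
result = run_stages(raw, [lowercase_keys, drop_incomplete])
print(result)  # [{'user_id': 1, 'lat': 37.7}]
```

Because each stage only sees its input stream, it can be tested on its own. Order still matters: drop_incomplete expects the lower-case keys that lowercase_keys produces.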

Decodable processes data with SQL that should feel familiar to anyone who has used a relational database system. The primary differences you'll notice are that:

You activate a pipeline to start it, and deactivate a pipeline to stop it
All pipeline queries specify a source and a sink
Certain operations, notably JOINs and aggregations, must include windows
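Windows are needed because a stream never ends: a count over the entire stream would never be final, so an aggregation (or join) has to be bounded to a slice of time. The plain-Python sketch below illustrates a tumbling (fixed-size, non-overlapping) window count for the aggregation case. It is not Decodable's SQL syntax or implementation, and the (epoch_seconds, key) event format is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_counts(
    events: Iterable[Tuple[int, str]], window_seconds: int
) -> Dict[Tuple[int, str], int]:
    """Count events per (window_start, key) in fixed, non-overlapping windows.

    Each event is an (epoch_seconds, key) pair. Illustrative only.
    """
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for ts, key in events:
        # Align the event to the start of the window it falls in.
        window_start = ts - (ts % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(0, "a"), (30, "a"), (65, "a"), (70, "b")]
print(tumbling_counts(events, 60))
# {(0, 'a'): 2, (60, 'a'): 1, (60, 'b'): 1}
```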

Unlike queries in a relational database, every pipeline writes its results into an output data stream (or sink). As a result, every pipeline is a single statement of the form INSERT INTO <sink> SELECT ... FROM <source>, where sink and source are streams you've defined.
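As a rough analogy only, that single-statement form maps onto a function that reads each record from a source, optionally filters it, projects it, and appends the result to a sink. The function name insert_into_select and the sample fields (id, amount, order_id) are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, object]


def insert_into_select(
    sink: List[Record],
    source: Iterable[Record],
    project: Callable[[Record], Record],
    where: Optional[Callable[[Record], bool]] = None,
) -> None:
    """Analogue of INSERT INTO <sink> SELECT <project> FROM <source> WHERE <where>."""
    for record in source:
        if where is None or where(record):
            sink.append(project(record))


orders = [{"id": 1, "amount": 5}, {"id": 2, "amount": 0}]
paid: List[Record] = []
insert_into_select(paid, orders, lambda r: {"order_id": r["id"]}, lambda r: r["amount"] > 0)
print(paid)  # [{'order_id': 1}]
```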

Transform geolocation records

For this example, a combination of customer device locations and where they have made purchases will be used to inform marketing campaign activities. This first pipeline will transform the raw geolocation data stream into a standardized schema.
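The real pipeline is a SQL query, but the shape of the mapping can be sketched in Python. Because the raw records are not reproduced here, every field name below is an assumption: raw records are taken to carry user, lat, lng, and an epoch-millisecond ts, and the standardized schema (customer_id, latitude, longitude, event_time, source) is hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict

Record = Dict[str, object]


def standardize_geolocation(raw: Record) -> Record:
    """Map a raw geolocation record onto a hypothetical standardized schema.

    Input fields (user, lat, lng, ts) and output fields are assumptions.
    """
    return {
        "customer_id": str(raw["user"]),
        "latitude": float(raw["lat"]),
        "longitude": float(raw["lng"]),
        # Assumes ts is epoch milliseconds in UTC.
        "event_time": datetime.fromtimestamp(int(raw["ts"]) / 1000, tz=timezone.utc),
        "source": "geolocation",
    }


print(standardize_geolocation({"user": 42, "lat": "37.77", "lng": "-122.42", "ts": 1600000000000}))
```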

Pipeline: Standardize geolocation data stream


After you create a new pipeline and enter the SQL query, clicking the Run Preview button verifies the query's syntax, then starts a new execution environment that processes the next 10 records arriving from the source stream and displays the results. Decodable handles the heavy lifting on the backend, so you can focus on working directly with your data streams and confirming that you are getting the results you need.
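The point of previewing only a fixed number of records is that the source stream is unbounded. A minimal Python sketch of that idea, with a hypothetical preview helper that is not how Decodable runs previews, takes just the first 10 records from an endless generator:

```python
from itertools import count, islice
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def preview(
    source: Iterable[Record], transform: Callable[[Record], Record], limit: int = 10
) -> List[Record]:
    """Transform at most `limit` records from a (possibly unbounded) source."""
    # islice stops after `limit` items, so an endless source is safe to preview.
    return [transform(record) for record in islice(source, limit)]


endless = ({"n": i} for i in count())
sample = preview(endless, lambda r: {"doubled": r["n"] * 2})
print(len(sample), sample[-1])  # 10 {'doubled': 18}
```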

Transform point-of-sale records

For the point-of-sale terminal data, the transformations required for this example are fairly minimal: the field names are changed to match the standardized schema, and the created_at field is converted to a timestamp.
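Those two steps, renaming fields and parsing created_at, can be sketched in Python as follows. The raw field names (cust_id, store_lat, store_lng), the created_at string format, and the standardized output names are all assumptions, since the raw point-of-sale records are not reproduced here.

```python
from datetime import datetime
from typing import Dict

Record = Dict[str, object]

# Hypothetical mapping from raw POS field names to standardized names.
FIELD_MAP = {"cust_id": "customer_id", "store_lat": "latitude", "store_lng": "longitude"}


def standardize_pos(raw: Record) -> Record:
    """Rename fields per FIELD_MAP and convert created_at to a datetime."""
    out: Record = {new: raw[old] for old, new in FIELD_MAP.items()}
    # Assumes created_at arrives as a string such as "2022-03-01 14:05:00".
    out["event_time"] = datetime.strptime(str(raw["created_at"]), "%Y-%m-%d %H:%M:%S")
    out["source"] = "pos-terminal"
    return out


sample_sale = {"cust_id": "c-7", "store_lat": 40.71, "store_lng": -74.0, "created_at": "2022-03-01 14:05:00"}
print(standardize_pos(sample_sale))
```

Keeping the renames in a single mapping makes the schema contract easy to review in one place.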

Pipeline: Standardize pos-terminal data stream


Conclusion

At this point, a sink connection (one that writes a stream to an external system, such as AWS S3, Kafka, Kinesis, Postgres, Pulsar, or Redpanda) can be created to allow the results to be consumed by your own applications and services.
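Because both standardized streams share one schema, they can feed the same sink. The Python sketch below stands in for that step under stated assumptions: the sink is just an in-memory list rather than an external system, each input stream is already sorted by event_time, and the STANDARD_FIELDS set is hypothetical. Records that do not match the schema raise an error instead of being written.

```python
import heapq
from typing import Dict, Iterable, List

Record = Dict[str, object]

# Hypothetical field set shared by every standardized stream.
STANDARD_FIELDS = {"customer_id", "event_time", "source"}


def write_to_sink(sink: List[Record], *streams: Iterable[Record]) -> None:
    """Interleave time-ordered standardized streams into one sink by event_time.

    Each input stream must already be sorted by event_time. A record whose
    fields differ from STANDARD_FIELDS raises ValueError.
    """
    for record in heapq.merge(*streams, key=lambda r: r["event_time"]):
        if set(record) != STANDARD_FIELDS:
            raise ValueError(f"record does not match the standard schema: {sorted(record)}")
        sink.append(record)


geo = [
    {"customer_id": "a", "event_time": 1, "source": "geolocation"},
    {"customer_id": "a", "event_time": 5, "source": "geolocation"},
]
pos = [{"customer_id": "b", "event_time": 3, "source": "pos-terminal"}]
combined: List[Record] = []
write_to_sink(combined, geo, pos)
print([r["source"] for r in combined])  # ['geolocation', 'pos-terminal', 'geolocation']
```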


As this example shows, a sophisticated business problem can be addressed in a very straightforward way using Decodable pipelines. There are no Docker containers to create and no SQL server infrastructure to set up or maintain; all that is needed is a working familiarity with writing the SQL queries themselves.

You can watch demonstrations of several examples on the Decodable YouTube channel.

Additional documentation for all of Decodable’s services is available here.

Please consider joining us on our community Slack.
