Food Delivery

Solution Overview

How the world eats is changing dramatically. Twenty years ago, restaurant-quality meal delivery was largely limited to pizza. Today, food delivery has become a global market worth more than $150 billion, having more than tripled in the last 5 years. The advent of appealing, user-friendly apps and tech-enabled driver networks, coupled with changing consumer expectations, has unlocked ready-to-eat food delivery as a major category.

In this example, we’ll walk through how the Decodable data service is used to clean, transform, and enrich real-time food delivery data. The processed data can then be used to send customers SMS text messages with progress status updates.

Pipeline Architecture

For this example, the source of the data is a legacy system that produces an XML object for each delivery update.
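
Based on the XPath expressions used in the first pipeline below, a representative record might look like the following (the structure is inferred from those expressions, and all field values are illustrative):

<order id="1280152" state_human_readable="Delivery in progress">
  <notification>Your order is on its way!</notification>
  <estimations>
    <earliest_completed_at>2022-03-01 18:42:00</earliest_completed_at>
    <latest_completed_at>2022-03-01 18:58:00</latest_completed_at>
  </estimations>
  <timestamps>
    <dispatched_at>2022-03-01 18:21:00</dispatched_at>
    <completed_at/>
  </timestamps>
  <branch id="42">
    <location>
      <latitude>51.5136</latitude>
      <longitude>-0.1365</longitude>
    </location>
  </branch>
  <customer id="90245">
    <location>
      <latitude>51.5237</latitude>
      <longitude>-0.1585</longitude>
    </location>
  </customer>
  <driver>
    <location>
      <latitude>51.5181</latitude>
      <longitude>-0.1472</longitude>
    </location>
  </driver>
</order>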

Even in its structured form, the XML contains far more complexity and detail than customers want to know about their delivery, or than is needed to send them status updates. By using one or more Decodable pipelines, which are streaming SQL queries that process data, we can transform the raw data into the form best suited to how it will be consumed.

For this example, two separate pipelines are used in series, with the output of each serving as the input to the next. While it is possible to perform all the desired processing in a single large, complex pipeline, it is usually better to split the work into smaller, more manageable steps, which results in pipelines that are easier to test and maintain. Each stage in the sequence brings the data closer to its final desired form.

Decodable pipelines are written in SQL that should feel familiar to anyone who has used a relational database. The primary differences you’ll notice are that:

  • You activate a pipeline to start it, and deactivate a pipeline to stop it
  • All pipeline queries specify a source and a sink
  • Certain operations, notably JOINs and aggregations, must include windows

Unlike queries in a relational database, every pipeline writes its results into an output data stream (or sink). As a result, every pipeline is a single statement of the form INSERT INTO <sink> SELECT ... FROM <source>, where sink and source are streams you’ve defined.
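
For example, a minimal pipeline (the stream names here are purely illustrative) copies selected fields from one stream into another:

insert into `orders-clean`
select order_id, customer_id
from `orders-raw`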

Parse XML Object

As with most data pipelines, the first step is to apply a variety of transformations to clean up and simplify the input data. For this example, the first pipeline is used to parse and restructure the raw data as follows:

  • parse the XML blob with the xpaths function and extract the desired fields
  • convert numeric fields from strings to integers or floats as needed
  • convert time-based fields from strings to timestamp data types, enabling more sophisticated processing in subsequent pipelines

Pipeline: Extract delivery update data

insert into parsed
select
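  -- cast the extracted string values to their target types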
  cast(delivery_update.order_id as bigint) as order_id,
  cast(delivery_update.branch_id as bigint) as branch_id,
  cast(delivery_update.customer_id as bigint) as customer_id,
  delivery_update.state_human_readable as state_human_readable,
  delivery_update.notification as notification,
  to_timestamp(delivery_update.earliest) as earliest,
  to_timestamp(delivery_update.latest) as latest,
  to_timestamp(delivery_update.dispatched_at) as dispatched_at,
  to_timestamp(delivery_update.completed_at) as completed_at,
  cast(delivery_update.branch_lat as float) as branch_lat,
  cast(delivery_update.branch_lon as float) as branch_lon,
  cast(delivery_update.customer_lat as float) as customer_lat,
  cast(delivery_update.customer_lon as float) as customer_lon,
  cast(delivery_update.driver_lat as float) as driver_lat,
  cast(delivery_update.driver_lon as float) as driver_lon
from (
  select
    -- parse XML to a DOM and extract fields using XPath expressions
    xpaths(xml,
      'order_id', '//order/@id',
      'branch_id', '//order/branch/@id',
      'customer_id', '//order/customer/@id',
      'state_human_readable', '//order/@state_human_readable',
      'notification', '//order/notification',
      'earliest', '//order/estimations/earliest_completed_at',
      'latest', '//order/estimations/latest_completed_at',
      'dispatched_at', '//order/timestamps/dispatched_at',
      'completed_at', '//order/timestamps/completed_at',
      'branch_lat', '//order/branch/location/latitude',
      'branch_lon', '//order/branch/location/longitude',
      'customer_lat', '//order/customer/location/latitude',
      'customer_lon', '//order/customer/location/longitude',
      'driver_lat', '//order/driver/location/latitude',
      'driver_lon', '//order/driver/location/longitude'
    ) as delivery_update
  from `delivery-raw`
)

Enrich Data Stream

For this example, we want to enrich the data stream with the delivery driver’s progress toward the customer’s destination. SQL provides a comprehensive set of math functions, such as power and sqrt, that can be used to perform calculations useful for subsequent processing. The somewhat complex query below could be broken into two smaller, simpler pipelines, but it is also possible to create pipelines of arbitrary complexity based on your requirements.

An inner nested select query calculates distances using the latitude and longitude of the origination point (i.e., the branch location), the driver location, and the customer location. A SQL case statement lets us skip the expensive haversine distance calculation when the driver has not yet left the branch or has already arrived at the customer’s location. Taking care to reduce the computational complexity of a pipeline in this way increases stream processing throughput.

Once the distances have been calculated, the surrounding query computes the delivery progress as a percentage, based on how far the driver still is from the customer relative to the overall distance between the branch and the customer.

Pipeline: Calculate delivery progress
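
A sketch of this pipeline is shown below. It assumes the parsed stream produced by the first pipeline, an output stream named progress, and standard SQL math functions (power, sqrt, sin, cos, asin, radians); 6371 is the Earth’s mean radius in kilometers, and the exact guard conditions and output fields are illustrative rather than definitive.

insert into progress
select
  order_id,
  customer_id,
  notification,
  earliest,
  latest,
  -- convert the remaining distance into a percentage of the total route
  case
    when completed_at is not null then 100
    when dispatched_at is null then 0
    else greatest(0, round(100 * (1 - driver_to_customer / branch_to_customer), 0))
  end as progress_pct
from (
  select
    order_id,
    customer_id,
    notification,
    earliest,
    latest,
    dispatched_at,
    completed_at,
    -- haversine distance (km) from branch to customer,
    -- computed only while the delivery is in transit
    case
      when dispatched_at is not null and completed_at is null then
        2 * 6371 * asin(sqrt(
          power(sin(radians(customer_lat - branch_lat) / 2), 2) +
          cos(radians(branch_lat)) * cos(radians(customer_lat)) *
          power(sin(radians(customer_lon - branch_lon) / 2), 2)))
    end as branch_to_customer,
    -- haversine distance (km) from driver to customer,
    -- computed only while the delivery is in transit
    case
      when dispatched_at is not null and completed_at is null then
        2 * 6371 * asin(sqrt(
          power(sin(radians(customer_lat - driver_lat) / 2), 2) +
          cos(radians(driver_lat)) * cos(radians(customer_lat)) *
          power(sin(radians(customer_lon - driver_lon) / 2), 2)))
    end as driver_to_customer
  from parsed
)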

Conclusion

At this point, a sink connection (one that writes a stream to an external system, such as AWS S3, Kafka, Kinesis, Postgres, Pulsar, or Redpanda) can be created to allow the results to be consumed by your own applications and services.

As this example shows, a sophisticated business problem can be addressed in a very straightforward way using Decodable pipelines. There are no Docker containers to build and no SQL server infrastructure to set up or maintain; all that is needed is a working familiarity with writing the SQL queries themselves.

You can watch demonstrations of several examples on the Decodable YouTube channel.

Additional documentation for all of Decodable’s services is available here.

Please consider joining us on our community Slack.

