
Transforming XML on the Stream

Hubert Dulay

XML Endures

XML is still used extensively today in banking services, online retail, industrial systems integration, and many other legacy systems. XML-based Security Assertion Markup Language (SAML) remains a heavily used authentication and authorization protocol, and Salesforce's API is also based on XML. XML's endurance cannot be ignored, and as a result it should be part of any data integration strategy and toolset, enabling seamless integration with newer, more performant formats.

Integrating XML with newer technologies should be simple, but it isn't: as we continue to modernize, XML becomes less interoperable. This should be no surprise, since many modern systems like schema registries support formats such as JSON, Protobuf, and Avro, but not XML. This leaves businesses with a lot of XML-related technical debt, left on their own to find solutions for transforming their XML data into these modern formats. Those solutions typically require work in an imperative programming language; work that is routinely underestimated, expensive to maintain, and draws developers away from more strategic tasks.

In this blog, we will transform XML to JSON using SQL. XML data is read from a Kafka topic, transformed into JSON, and written to another Kafka topic while preserving its structure. This approach provides a simple solution that takes minutes to implement.

Transforming XML to JSON on the Stream

In this demonstration, we will be publishing the XML document below to Decodable to be transformed into JSON.
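The original sample document did not survive extraction. A minimal sketch of the kind of payload used in this demo, assuming a single employee record with the `fname` and `lname` elements that the pipeline's XPath queries extract later, might look like:

```xml
<employee>
  <fname>Hubert</fname>
  <lname>Dulay</lname>
</employee>
```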


The diagram below illustrates the flow of the XML. Decodable reads XML messages from a Kafka topic and brings them into a Decodable stream; the XML Parser Pipeline then converts the XML into JSON before writing it out to the JSON topic.


First, create a .env file to place all of your credentials for connectivity to Kafka and Decodable.
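The exact variable names from the original post aren't shown here; an illustrative `.env` file for this setup might look like the following (all values are placeholders):

```shell
# Confluent Cloud (Kafka) connection details -- placeholders only
BOOTSTRAP_SERVERS=pkc-xxxxx.us-west-2.aws.confluent.cloud:9092
KAFKA_API_KEY=<your-confluent-api-key>
KAFKA_API_SECRET=<your-confluent-api-secret>

# Decodable account
DECODABLE_ACCOUNT=<your-decodable-account>
```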

We will use these properties to populate commands that will construct the flow we previously illustrated.

Stream Definitions

First, we define the streams for the raw and parsed data. The pipeline converts the XML into Decodable's internal format so that when the Kafka sink processes it, it is transformed into JSON.

Below are the contents of a Makefile that generates the two streams within Decodable: demo_day_xml_raw and demo_day_parsed.
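The original Makefile wasn't preserved; a sketch of what it might contain, assuming the Decodable CLI's `decodable stream create` command (the flag names, and the parsed stream's schema, are assumptions based on the field renames described later):

```makefile
# Assumed Decodable CLI invocations -- check your CLI version's flags.

# Raw stream: a single string field holding the unparsed XML payload
create-raw-stream:
	decodable stream create \
		--name demo_day_xml_raw \
		--field xml=string

# Parsed stream: the renamed fields produced by the pipeline
create-parsed-stream:
	decodable stream create \
		--name demo_day_parsed \
		--field first_name=string \
		--field last_name=string

streams: create-raw-stream create-parsed-stream
```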

Run these tasks from the command line.

Kafka Source Connection

The command below creates a Decodable connection to a Kafka topic in Confluent. It’s important to remember two things when building this connection:

  1. The format should be raw.
  2. There should only be one field: --field xml=string

Since the format is raw, you cannot apply a schema, and therefore cannot define more than one field. That one field contains the raw message; in this case, our XML message.

This connection subscribes to the Kafka topic and sends its messages to a Decodable stream for raw XML messages.
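The command itself isn't reproduced above; a sketch of what the source connection might look like, assuming the Decodable CLI's `decodable connection create` syntax (the property names are illustrative, with credentials drawn from the `.env` file):

```shell
# Assumed CLI flags and property names -- verify against your CLI version.
decodable connection create \
    --name demo_day_xml_source \
    --connector kafka \
    --type source \
    --prop bootstrap.servers=$BOOTSTRAP_SERVERS \
    --prop topic=demo_day_xml \
    --prop format=raw \
    --field xml=string
```

Note the two points called out above: `format=raw` and the single `--field xml=string`.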

Pipeline SQL

Next, we create a Decodable pipeline to parse the raw XML in demo_day_xml_raw and write the parsed data into the demo_day_parsed stream. The SQL selects the XML field from the raw stream and passes it to the xpaths() function, which builds a map, aliased as employee, whose keys (fname and lname) hold the results of the corresponding XPath queries.

Notice that the pipeline also renames the fields:

  • fname to first_name
  • lname to last_name
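The pipeline SQL itself wasn't preserved in this copy of the post; based on the description, it might look something like the sketch below. The exact signature of xpaths() is an assumption: key/path pairs producing a map aliased as employee, with the keys carrying the renamed field names.

```sql
-- Sketch only: the xpaths() signature shown here is an assumption.
INSERT INTO demo_day_parsed
SELECT xpaths(xml,
    'first_name', '//fname/text()',
    'last_name',  '//lname/text()'
) AS employee
FROM demo_day_xml_raw
```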

Kafka Sink Connection

Lastly, the sink connection publishes the output of the pipeline to a Kafka topic, reading from the demo_day_parsed stream. Notice the format is json, which tells the sink connection to convert the internal format to a JSON message before writing it to the topic.
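A sketch of the sink connection, mirroring the source command above (again, the flag and property names are assumptions to be checked against the Decodable CLI documentation):

```shell
# Assumed CLI flags and property names -- verify against your CLI version.
decodable connection create \
    --name demo_day_json_sink \
    --connector kafka \
    --type sink \
    --prop bootstrap.servers=$BOOTSTRAP_SERVERS \
    --prop topic=demo_day_json \
    --prop format=json
```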


Using XML shouldn't be difficult, but many current technologies no longer support it, making it harder to integrate existing applications. Decodable makes working with legacy technologies easier as you modernize your business. If you have any questions or need assistance getting data out of your legacy systems, please contact us.

Watch a video of this demo:

You can get started with Decodable for free. Our developer account includes enough for you to build a useful pipeline and, unlike a trial, it never expires.


