XML is still used extensively today: in banking services, online retail, industrial system integration, and many other legacy systems. The XML-based Security Assertion Markup Language (SAML) remains a heavily used authorization protocol, and Salesforce's API is also based on XML. XML's endurance is obvious and cannot be ignored; as a result, it should be part of any data integration strategy and toolset, enabling seamless integration with newer, more performant formats.
Integrating XML with newer technologies should be simple, but it isn't: as we continue to modernize, XML becomes less interoperable. This is no surprise, since many modern systems, such as schema registries, support formats like JSON, Protobuf, and Avro, but not XML. This leaves many businesses carrying significant XML-related technical debt, left on their own to find ways of transforming their XML data into these modern formats. Solutions typically require work in an imperative programming language; work that is routinely underestimated, expensive to maintain, and draws developers away from more strategic tasks.
In this blog, we will transform XML to JSON using SQL. XML data is read from a Kafka topic, transformed into JSON, and written to another Kafka topic, all while preserving its structure. This transformation is a simple solution that takes minutes to implement.
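Before diving in, here is what the transformation amounts to conceptually, sketched locally in Python with a hypothetical employee record (the field names match the ones used later in this post; in the actual demo this work is done by Decodable's SQL, not a script):

```python
import json
import xml.etree.ElementTree as ET

# A hypothetical XML message as it might arrive on the source Kafka topic.
xml_msg = "<employee><fname>Ada</fname><lname>Lovelace</lname></employee>"

# Parse the XML and pull out fields with path queries,
# renaming fname/lname to first_name/last_name along the way.
root = ET.fromstring(xml_msg)
record = {
    "employee": {
        "first_name": root.findtext("fname"),
        "last_name": root.findtext("lname"),
    }
}

# Serialize to JSON, as the sink connection will do for each message.
json_msg = json.dumps(record)
print(json_msg)  # {"employee": {"first_name": "Ada", "last_name": "Lovelace"}}
```

The structure of the document is preserved; only the serialization format changes.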
Transforming XML to JSON on the Stream
In this demonstration, we will be publishing the XML document below to Decodable to be transformed into JSON.
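As an illustrative stand-in, assume an employee record shaped like this (the fname and lname elements are the fields parsed later in this post; the values are hypothetical):

```xml
<employee>
  <fname>Ada</fname>
  <lname>Lovelace</lname>
</employee>
```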
The diagram below illustrates the flow of the XML. Decodable reads XML messages from a Kafka topic; the XML Parser pipeline then converts the XML into JSON before writing it out to the JSON topic.
First, create a .env file to place all of your credentials for connectivity to Kafka and Decodable.
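A minimal .env might look like the following. All values are placeholders, and the variable names are illustrative; use whatever names your commands reference:

```shell
KAFKA_BOOTSTRAP_SERVERS=<your-bootstrap-servers>
KAFKA_API_KEY=<your-confluent-api-key>
KAFKA_API_SECRET=<your-confluent-api-secret>
KAFKA_TOPIC_XML=<xml-source-topic>
KAFKA_TOPIC_JSON=<json-sink-topic>
DECODABLE_ACCOUNT=<your-decodable-account>
```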
We will use these properties to populate commands that will construct the flow we previously illustrated.
First, we define two streams: one for the raw XML and one for the parsed data. The pipeline converts the XML into Decodable's internal format so that when the Kafka sink processes it, it is written out as JSON.
Below are the contents of a Makefile that generates the two streams within Decodable: demo_day_xml_raw and demo_day_parsed.
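A sketch of such a Makefile, assuming the Decodable CLI's stream create command; the parsed stream's field type is an assumption and should match your pipeline's output:

```makefile
.PHONY: streams

streams:
	decodable stream create \
		--name demo_day_xml_raw \
		--field xml=string
	decodable stream create \
		--name demo_day_parsed \
		--field employee="MAP<STRING, STRING>"
```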
Run these tasks from the command line.
Kafka Source Connection
The command below creates a Decodable connection to a Kafka topic in Confluent. It’s important to remember two things when building this connection:
- The format should be raw.
- There should only be one field: --field xml=string
Since the format is raw, you cannot apply a schema, and therefore cannot define more than one field. That single field contains the raw message: in this case, our XML message.
This connection subscribes to the Kafka topic and sends its messages to a Decodable stream of raw XML.
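The exact command depends on your environment, but a sketch, assuming the Decodable CLI's connection create syntax and Confluent Cloud's SASL properties (the property names are illustrative; the values come from your .env file), looks like this:

```shell
decodable connection create \
  --name demo_day_kafka_source \
  --connector kafka \
  --type source \
  --field xml=string \
  --prop format=raw \
  --prop topic="$KAFKA_TOPIC_XML" \
  --prop bootstrap.servers="$KAFKA_BOOTSTRAP_SERVERS" \
  --prop security.protocol=SASL_SSL \
  --prop sasl.mechanism=PLAIN \
  --prop sasl.username="$KAFKA_API_KEY" \
  --prop sasl.password="$KAFKA_API_SECRET"
```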
Next, we create a Decodable pipeline to parse the raw XML in demo_day_xml_raw and write the parsed data into the demo_day_parsed stream. The SQL selects the XML field from the raw stream and passes it to the xpaths() function, which builds a dictionary, aliased as employee, from key/value pairs: the keys correspond to fname and lname, and the values are the results of the XPath queries.
Notice that the pipeline also renames the fields:
- fname to first_name
- lname to last_name
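Putting that together, the pipeline SQL might look like the following. This is a sketch that assumes xpaths() takes alternating key/XPath-expression arguments and that the source document is an employee element with fname and lname children; the renaming happens through the map keys:

```sql
INSERT INTO demo_day_parsed
SELECT
  xpaths(
    xml,
    'first_name', '/employee/fname/text()',
    'last_name',  '/employee/lname/text()'
  ) AS employee
FROM demo_day_xml_raw
```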
Kafka Sink Connection
Lastly, the sink connection publishes the output of the pipeline to a Kafka topic reading from the demo_day_parsed stream. Notice the format is json which tells the sink connection to convert the internal format to a JSON message before writing it to the topic.
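A sketch of the sink connection, again with illustrative property names whose values come from your .env file; consult the Decodable CLI documentation for the exact flag that attaches the connection to the demo_day_parsed stream:

```shell
decodable connection create \
  --name demo_day_kafka_sink \
  --connector kafka \
  --type sink \
  --prop format=json \
  --prop topic="$KAFKA_TOPIC_JSON" \
  --prop bootstrap.servers="$KAFKA_BOOTSTRAP_SERVERS" \
  --prop security.protocol=SASL_SSL \
  --prop sasl.mechanism=PLAIN \
  --prop sasl.username="$KAFKA_API_KEY" \
  --prop sasl.password="$KAFKA_API_SECRET"
```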
Using XML shouldn’t be difficult, but many current technologies no longer support it, making it harder to integrate existing applications. Decodable makes working with legacy technologies easier as you modernize your business. If you have any questions or need assistance getting data out of your legacy systems, please contact us at email@example.com.
Watch a video of this demo:
You can get started with Decodable for free - our developer account includes enough for you to build a useful pipeline and - unlike a trial - it never expires.