Stream Processing Lags Streaming Adoption
Why hasn’t stream processing seen the hockey-stick growth that streaming itself has? In our experience, the majority of data (by some estimates, as much as 80%) passing through real-time streaming platforms (Kafka, Kinesis, Pulsar, etc.) is not transformed in flight, just passed along. Google Trends provides solid evidence that stream processing is not enjoying the widespread adoption it deserves:
Culture and Skills are a Drag
We see a few main reasons why real-time processing isn’t growing proportionately, all of which are culture-based:
1) Transforming data in flight is a lot harder than simply writing to or reading from a stream. Most engineers don’t have the streaming expertise to intercept messages in flight, and many don’t know Java/Scala/Python well enough to perform the transformations reliably. Business demand for in-flight data processing talent far outstrips the supply, throttling adoption of the technique.
2) Data engineers have muscle memory around batch ELT, so they’ll use Kafka/Kinesis/etc. to move the data to a database and then transform it after it lands (a very common antipattern). Because of batch-based architectures, nearly everything in the world has become instant since the rise of the internet except our data centers: customers purchase in real time, but inventory systems, dashboards, and sales software operate with hours or days of lag. So even when companies use Kafka, users often cram a square peg into a round hole because that’s what they’re used to. As a result, your real-time app or dashboard becomes…well, very not real-time, and your Event-Driven Architecture becomes just an Architecture.
3) Streaming transformation platforms are historically hard to operate. Don’t believe this? Put yourself in the shoes of a data platform team leader and check your reaction as your users ask for an open source solution (say, Storm or Flink) so they can run transform jobs. Now your team is supporting an open source technology with no support backstop, a technology for which there is hot competition for talent and experience in the market. What’s worse, you’re supporting users, many of whom (as stated in #1) are not skilled at streaming OR Java/Scala/Python. Your users are essentially chaos monkeys…and guess who’s going to get pulled into rewriting jobs constantly? Supporting odd workloads in Kafka can be tough, but supporting odd workloads in a stream processing platform is untenable.
Eliminate the Friction
The world needs a platform that makes streaming transformation easy. Easy for the people writing jobs and easy for people supporting the platform.
Make it easy for the people writing transformation jobs:
- An intuitive UI, CLI, and API
- Jobs in SQL, not general-purpose programming languages
- Crafted guardrails to prevent mistakes and anti-patterns in the first place
- Clear error messages and documentation
- Easy to connect sources and deliver to destinations
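To make "jobs in SQL" concrete, here is a hypothetical in-flight transformation: everything below (the stream names `http_events` and `http_events_clean`, and all column names) is invented for illustration, not taken from any real schema.

```sql
-- Hypothetical pipeline: cleanse web events in flight, before they
-- ever land in a warehouse. Stream and column names are invented.
INSERT INTO http_events_clean
SELECT
  event_time,
  user_id,
  '0.0.0.0'          AS client_ip,   -- redact PII in flight
  UPPER(http_method) AS http_method, -- normalize casing
  status_code
FROM http_events
WHERE user_agent NOT LIKE '%bot%';   -- drop crawler traffic
```

The point is the skill set: anyone who can write a SELECT statement can express a filter, mask, or enrichment, with no Java/Scala consumer code and no stream-processing internals to learn.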
Make it easy for the team operating the platform:
- Autoscaling performance: gigabytes per second at latencies in the tens of milliseconds
- No clusters, no CSU saturation, no AZ considerations, no CPU/Disk monitoring
- Predictable pricing, not pay-per-event, so you don’t need a forensic accountant to decipher your bill
- Robust telemetry and monitoring
- Simplified CSP (Connection, Stream, Pipeline) paradigm. With other tools, you have to look up properties and create connections in DDL. We’ve already boxed up the config parameters and tuned them correctly. We won’t ever ask about delivery modes. We won’t ask about consumer groups or offsets.
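To see the DDL burden the CSP model removes, consider what a comparable Flink SQL job requires: the user declares a table over a Kafka topic and must know and maintain every connector property themselves. The sketch below uses standard Flink Kafka connector options, though exact property keys vary by connector version, and the topic, broker, and column names are invented.

```sql
-- Flink SQL: connection details live in user-maintained DDL.
-- Every WITH option (broker address, consumer group, startup
-- offsets, serialization format) is the user's problem to get right.
CREATE TABLE orders (
  order_id   BIGINT,
  amount     DECIMAL(10, 2),
  order_time TIMESTAMP(3)
) WITH (
  'connector'                    = 'kafka',
  'topic'                        = 'orders',
  'properties.bootstrap.servers' = 'broker:9092',
  'properties.group.id'          = 'orders-consumer',
  'scan.startup.mode'            = 'earliest-offset',
  'format'                       = 'json'
);
```

Under a Connection/Stream/Pipeline model, that entire block disappears: the connection is configured once, the stream is just a named object, and the pipeline author writes only the SELECT.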
At Decodable we’re on a mission to make stream processing intuitive, safe, correct and fast.
Try our quickstart walkthrough and have a pipeline up in 3 minutes.