July 14, 2022
60 min read

The Interview: Machine Learning With Flink

By Robert Metzger

Apache Flink is a robust big data processing framework that handles both stream and batch processing and is the “heir apparent” to Hadoop and Spark. Apache Flink ML is a library that provides machine learning APIs and infrastructure to simplify building ML pipelines. Flink ML supports use cases such as predictive intelligence, customer segmentation, and many more.

Robert Metzger, Software Engineer at Decodable and PMC member of Apache Flink, recently spoke with Dong Lin, a Flink committer and one of the driving forces behind Flink ML, on the eve of the July release of Apache Flink ML 2.1.0. They discussed the status of the project and the plans for its future.

If you’re looking for an introduction to the machine learning space in general and what Flink ML brings to the space, this video of Robert & Dong’s conversation is a great place to start.

What you’ll learn:

  • What kind of machine learning tasks are suitable for Flink? What features of Flink make it well suited for machine learning?
  • What are the main competitors to Flink as an overall solution, and what are the competitors of Flink ML in the machine learning space?
  • Where in the machine learning space does Flink ML fit?
  • Flink itself handles joins and aggregations quite well through its various APIs. What does Flink ML provide on top of that?
  • What is feature engineering and why does Flink excel in this?
  • Are there any plans for Flink ML to use other language ecosystems?
  • Are there any examples of Flink ML integration with TensorFlow or other common popular frameworks?
  • What is the Flink-Extended organization on GitHub, and what are projects like Clink, Deep Learning, and AI Flow all about?
  • What kind of training can you do with Flink ML and what use cases can you actually implement using these algorithms?
  • Can Flink ML be used for model inference?
  • What are the new features of Flink ML 2.1?
  • What are the plans for the next release?
  • What's the long-term vision for Flink ML?
  • If someone is interested in contributing to Flink ML, where can they start?

Watch the interview:

Robert Metzger

Robert oversees the core Apache Flink-based data platform at Decodable, which powers the company's SaaS stream processing platform. Beyond this role, he is a committer and the PMC Chair of the Apache Flink project. He co-created Flink and has contributed many of its core components over the years. He previously co-founded and successfully exited data Artisans (now Ververica), the company that created and commercialized Flink.