Orchestrating Data/ML Workflows at Scale With Netflix Maestro

At Netflix, data and machine learning (ML) pipelines are central to the business, representing diverse use cases that go beyond recommendations, predictions, and data transformations. In this blog post, we introduce Maestro, a workflow orchestrator that can schedule and manage workflows at massive scale, and share what we have learned. A scalable data workflow orchestrator has become essential to Netflix’s business: it has to schedule hundreds of thousands of workflows and millions of jobs every day, and it has to keep scheduler-introduced delay within a strict SLO of less than 1 minute, even during traffic spikes. Maestro addresses the key challenges we faced with Meson, Netflix’s previous orchestrator, and achieves operational excellence. We discuss Maestro’s design and architecture around scalability, fault tolerance, and usability, which let it provide workflow as a service to hundreds of Netflix developers, data scientists, and analysts.