Serdar Yegulalp
Senior Writer

Netflix open sources data science management tool

Dec 6, 2019

Metaflow manages Python data science projects end-to-end, works with any machine learning library, and integrates with AWS cloud services


Netflix has open sourced Metaflow, an internally developed tool for building and managing Python-based data science projects. Metaflow addresses the entire data science workflow, from prototype to model deployment, and provides built-in integrations with AWS cloud services.

Machine learning and data science projects need mechanisms to track the development of the code, data, and models. Doing all of that manually is error-prone, and tools for source code management, like Git, aren’t well-suited to all of these tasks.

Metaflow provides Python APIs to the entire stack of technologies in a data science workflow, from data access through compute resources, versioning, model training, scheduling, and model deployment.

According to Metaflow’s introductory documentation, Netflix built Metaflow to provide its own data scientists and developers with “a unified API to the infrastructure stack that is required to execute data science projects, from prototype to production,” and to “focus on the widest variety of ML use cases, many of which are small or medium-sized, which many companies face on a day to day basis.”

Metaflow does not favor any particular machine learning framework or data science library. Metaflow projects are just Python code, with each step of a project's data flow expressed through common Python programming idioms. Each time a Metaflow project runs, the run and the data it generates are given a unique ID. This lets you access every run, and every step of that run, by referring to its ID or to user-assigned metadata.
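The run-tracking idea above can be illustrated with a short, stdlib-only Python sketch. This is not Metaflow's API: `RunStore`, `execute`, `get`, `find`, and the `load`/`train` steps are hypothetical names, and the in-memory dictionaries stand in for a persistent datastore. Each call to `execute` assigns a fresh run ID, records the artifacts every step produced, and attaches user-supplied tags, so a past run can be looked up later by ID or by tag.

```python
import uuid
from typing import Any, Callable, Dict, Iterable, List, Set

# Hypothetical types: an artifact map, and a step that reads earlier
# artifacts and returns the new artifacts it produced.
Artifacts = Dict[str, Any]
Step = Callable[[Artifacts], Artifacts]


class RunStore:
    """Hypothetical in-memory run tracker (illustration only, not Metaflow's API)."""

    def __init__(self) -> None:
        # run ID -> step name -> artifacts produced by that step
        self.runs: Dict[str, Dict[str, Artifacts]] = {}
        # run ID -> user-assigned tags (the "metadata" used for lookup)
        self.tags: Dict[str, Set[str]] = {}

    def execute(self, steps: List[Step], tags: Iterable[str] = ()) -> str:
        run_id = uuid.uuid4().hex  # fresh ID for every run
        state: Artifacts = {}
        record: Dict[str, Artifacts] = {}
        for step in steps:
            produced = step(dict(state))  # step sees a copy of earlier artifacts
            state.update(produced)
            record[step.__name__] = dict(produced)  # snapshot what this step produced
        self.runs[run_id] = record
        self.tags[run_id] = set(tags)
        return run_id

    def get(self, run_id: str, step_name: str) -> Artifacts:
        # Look up what one step of one run produced.
        return self.runs[run_id][step_name]

    def find(self, tag: str) -> List[str]:
        # Return IDs of runs carrying the given tag, in execution order.
        return [rid for rid, t in self.tags.items() if tag in t]


def load(state: Artifacts) -> Artifacts:
    return {"data": [1, 2, 3, 4]}


def train(state: Artifacts) -> Artifacts:
    return {"mean": sum(state["data"]) / len(state["data"])}


store = RunStore()
first = store.execute([load, train], tags=["baseline"])
second = store.execute([load, train], tags=["experiment"])

print(first != second)                    # True: each run gets its own ID
print(store.get(first, "train"))          # {'mean': 2.5}
print(store.find("baseline") == [first])  # True
```

In Metaflow itself, steps are methods decorated with `@step` on a `FlowSpec` subclass, and past runs are inspected through its Client API; the Metaflow documentation describes the exact calls.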

Netflix recommends running Metaflow on AWS. The company offers a sandboxed version of Metaflow there (with restrictions on storage and data lifetime) for developers to experiment with the framework. 

The first public release of Metaflow, Metaflow 2.0, lacks some of the features Netflix uses internally, such as support for the R language or in-memory processing of large datasets by way of DataFrames. But Netflix is willing to make those features available if the corresponding GitHub issues attract enough support.


Serdar Yegulalp is a senior writer at InfoWorld. A veteran technology journalist, Serdar has been writing about computers, operating systems, databases, programming, and other information technology topics for 30 years. Before joining InfoWorld in 2013, Serdar wrote for Windows Magazine, InformationWeek, Byte, and a slew of other publications. At InfoWorld, Serdar has covered software development, devops, containerization, machine learning, and artificial intelligence, winning several B2B journalism awards including a 2024 Neal Award and a 2025 Azbee Award for best instructional content and best how-to article, respectively. He currently focuses on software development tools and technologies and major programming languages including Python, Rust, Go, Zig, and Wasm. Tune in to his weekly Dev with Serdar videos for programming tips and techniques and close looks at programming libraries and tools.
