Serdar Yegulalp
Senior Writer

PyText builds on PyTorch for language recognition

Dec 14, 2018

A Facebook project for natural language processing is now open source, and it promises better ways to mine texts for meaning

Facebook has open-sourced its PyText project, a machine learning library for natural language processing (NLP) intended to make it easier to put together both experimental projects and production systems.

PyText, built with Facebook’s existing PyTorch library for machine learning and used internally by the company, was created to address a shortcoming of machine learning libraries that use neural networks (such as for NLP). Such libraries typically were “a trade-off between frameworks optimized for experimentation and those optimized for production,” Facebook’s engineers said in a post.

Frameworks built for experimentation allowed fast prototyping, but suffered from “increased latency and memory use in production,” Facebook’s engineers wrote. On the other hand, frameworks built for production worked better under load, but were tougher to develop quickly with.

PyText’s main touted difference is its workflow, which Facebook claims can be optimized for either experiments or production use. The framework’s components can be stitched together to create an entire NLP pipeline, or individual pieces can be broken out and reused in other contexts.
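
As a rough illustration of that idea, and not PyText’s actual API, the sketch below chains three small components into a pipeline and then reuses one of them on its own. Every name here (`tokenize`, `featurize`, `classify`, `make_pipeline`) is a hypothetical stand-in invented for the example.

```python
from typing import Callable, Dict, List

# Hypothetical components; none of these names come from PyText.

def tokenize(text: str) -> List[str]:
    """Split text into lowercase word tokens."""
    return text.lower().split()


def featurize(tokens: List[str]) -> Dict[str, int]:
    """Count how often each token appears."""
    counts: Dict[str, int] = {}
    for token in tokens:
        counts[token] = counts.get(token, 0) + 1
    return counts


def classify(features: Dict[str, int]) -> str:
    """Toy rule: call the text a question if it contains a question word."""
    question_words = ("what", "when", "where", "who", "why", "how")
    if any(word in features for word in question_words):
        return "question"
    return "statement"


def make_pipeline(*stages: Callable) -> Callable:
    """Chain stages so that each stage's output feeds the next one."""
    def run(value):
        for stage in stages:
            value = stage(value)
        return value
    return run


# Stitch the pieces into one pipeline...
pipeline = make_pipeline(tokenize, featurize, classify)
label = pipeline("How does PyText export models")

# ...or break one piece out and reuse it by itself.
tokens_only = tokenize("Reuse just one piece")
```

Because each stage is an ordinary function with a plain input and output, the same `tokenize` step can serve the full pipeline or some unrelated task. A real NLP framework would swap the toy rule for a trained model.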

Training new models can be distributed across multiple nodes, and multiple models can be trained at the same time. PyText can also use many existing models for text classification, skipping the need for training entirely in those cases.

PyText also improves comprehension via contextual models, a way to enrich the model’s understanding of a text from previous inputs. A chatbot, for example, could reuse information from earlier messages in a discussion to shape its answers.
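
To make the chatbot example concrete, here is a minimal sketch, not PyText’s implementation: a toy bot that remembers a topic named in an earlier message and uses it to interpret a later one. The `ContextualBot` class and its keyword matching are hypothetical and stand in for what a learned contextual model would do.

```python
from typing import List, Optional


class ContextualBot:
    """Toy chatbot (hypothetical) that keeps earlier messages as context."""

    KNOWN_TOPICS = ("weather", "traffic")

    def __init__(self) -> None:
        self.history: List[str] = []

    def _topic_from(self, message: str) -> Optional[str]:
        """Return the first known topic mentioned in the message, if any."""
        lowered = message.lower()
        for topic in self.KNOWN_TOPICS:
            if topic in lowered:
                return topic
        return None

    def reply(self, message: str) -> str:
        topic = self._topic_from(message)
        if topic is None:
            # No topic in this message: look back through earlier
            # messages, most recent first, for one that named a topic.
            for earlier in reversed(self.history):
                topic = self._topic_from(earlier)
                if topic is not None:
                    break
        self.history.append(message)
        if topic is None:
            return "Which topic do you mean?"
        return f"Here is the {topic} report."


bot = ContextualBot()
first = bot.reply("What's the weather in Paris?")
second = bot.reply("And tomorrow?")  # resolved using the earlier message
```

Without the stored history, the second message would be ambiguous. A fresh `ContextualBot` that receives only "And tomorrow?" has nothing to go on and asks for clarification.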

One feature in PyText shows how machine learning systems driven by Python find ways to avoid the performance issues that can crop up with the language. PyText models can be exported in the optimized ONNX format for fast inferencing with Caffe2. This way, the inferencing process isn’t limited by Python’s runtime, but Python is still used to assemble the pipeline and orchestrate model training.

PyTorch itself was recently given a formal Version 1.0 release, with its own share of features intended to speed training and inference without being limited by Python. One of them, TorchScript, compiles Python code just in time to speed its execution, but it works only with a subset of the language.
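
A minimal sketch of that compilation step, assuming a reasonably recent PyTorch is installed: `torch.jit.script` takes a type-annotated Python function, including its loop and branch, and compiles it. The function `sum_positive` is a made-up example and does not come from PyText or the PyTorch release.

```python
import torch


def sum_positive(x: torch.Tensor) -> torch.Tensor:
    """Add up the positive entries of a 1-D tensor using explicit control flow."""
    total = torch.zeros([1])
    for i in range(x.size(0)):
        if bool(x[i] > 0):
            total = total + x[i]
    return total


# Compile the function; the result is callable like the original.
scripted = torch.jit.script(sum_positive)

values = torch.tensor([1.0, -2.0, 3.0])
eager_result = sum_positive(values)   # runs in the Python interpreter
scripted_result = scripted(values)    # runs the compiled version
```

Both calls should give the same answer. The difference is that the compiled version no longer depends on the Python interpreter to step through the loop. Code that relies on Python features outside the supported subset will fail to compile rather than run slowly.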

Near-term plans for PyText include “supporting multilingual modeling and other modeling capabilities, making models easier to debug, and adding further optimizations for distributed training,” Facebook’s engineers say.

Serdar Yegulalp is a senior writer at InfoWorld. A veteran technology journalist, Serdar has been writing about computers, operating systems, databases, programming, and other information technology topics for 30 years. Before joining InfoWorld in 2013, Serdar wrote for Windows Magazine, InformationWeek, Byte, and a slew of other publications. At InfoWorld, Serdar has covered software development, devops, containerization, machine learning, and artificial intelligence, winning several B2B journalism awards including a 2024 Neal Award and a 2025 Azbee Award for best instructional content and best how-to article, respectively. He currently focuses on software development tools and technologies and major programming languages including Python, Rust, Go, Zig, and Wasm. Tune into his weekly Dev with Serdar videos for programming tips and techniques and close looks at programming libraries and tools.
