Paul Krill
Editor at Large

Microsoft previews text classification API for ML.NET

Jun 15, 2022

New text classification API for Microsoft’s open source machine learning framework streamlines model training by using your data to fine-tune an existing model.


Microsoft has unveiled a preview of the ML.NET Text Classification API, an API intended to make it easier to train custom text classification models using the open source ML.NET machine learning framework.

Introduced June 14, the ML.NET Text Classification API uses “state-of-the-art” deep learning techniques, Microsoft said. ML.NET allows developers to integrate custom machine learning models into .NET apps. Text classification is the process of applying labels or categories to text. Common use cases include categorizing email as spam or not spam, determining whether customer reviews express positive or negative sentiment, and applying labels to support tickets.

The ML.NET Text Classification API is powered by the TorchSharp .NET library, which provides access to the libtorch library that powers the PyTorch machine learning framework. TorchSharp has low-level capabilities for training neural networks from scratch in .NET. For ML.NET, some of the complexity of TorchSharp has been abstracted to make this training easier.

In collaboration with Microsoft Research, Microsoft took the TorchSharp implementation of NAS-BERT, a variant of BERT (Bidirectional Encoder Representations from Transformers) obtained through neural architecture search, and added it to ML.NET. Starting from a pre-trained version of this model, the Text Classification API uses the user’s data to fine-tune the existing model rather than building a new model from scratch.

The Text Classification API is part of the 2.0.0 and 0.20.0 preview versions of ML.NET. In addition to the Microsoft.ML package, it requires Microsoft.ML.TorchSharp and either TorchSharp-cpu (for training on a CPU) or TorchSharp-cuda-windows or TorchSharp-cuda-linux (for training on a GPU).

Developers can use the NuGet package manager in Visual Studio or the .NET CLI to install the packages. Code samples of the API can be found in the Text Classification API Notebook.

Microsoft noted that the API still has limitations, such as the inability to use the Evaluate method to calculate evaluation metrics. The company plans to improve the API and to introduce other scenario-based APIs.


Paul Krill is editor at large at InfoWorld. Paul has been covering computer technology as a news and feature reporter for more than 35 years, including 30 years at InfoWorld. He has specialized in coverage of software development tools and technologies since the 1990s, and he continues to lead InfoWorld’s news coverage of software development platforms including Java and .NET and programming languages including JavaScript, TypeScript, PHP, Python, Ruby, Rust, and Go. Long trusted as a reporter who prioritizes accuracy, integrity, and the best interests of readers, Paul is sought out by technology companies and industry organizations who want to reach InfoWorld’s audience of software developers and other information technology professionals. Paul has won a “Best Technology News Coverage” award from IDG.
