ML.NET 2.0 Release Contains New NLP APIs and AutoML Updates
Published on Dec 28, 2022
ML.NET 2.0, Microsoft’s open-source machine learning framework for .NET, has been released. The release includes several new natural language processing (NLP) APIs, including tokenizers, text classification, and sentence similarity, as well as improved automated machine learning (AutoML).
The release was announced by Luis Quintanilla at the recent .NET Conf 2022. TorchSharp, a .NET wrapper for the popular PyTorch deep learning framework, powers the updated NLP APIs. The release includes an EnglishRoberta tokenization model and a TorchSharp implementation of NAS-BERT, which is used by the Text Classification and Sentence Similarity APIs. An API for automated data pre-processing has been added to AutoML, as well as APIs for running experiments to identify the most effective models and hyperparameters. In addition, Quintanilla announced a new version of the Model Builder tool for Visual Studio, which includes a new text classification scenario and advanced training options.
The Text Classification API was previewed earlier this year and is based on the NAS-BERT model published by Microsoft Research in 2021. The model was developed using neural architecture search (NAS), resulting in smaller models than the standard BERT model while maintaining accuracy. By fine-tuning the pre-trained NAS-BERT model with their own data, users can tailor it to fit their specific requirements. The Sentence Similarity API uses the same pre-trained model, but instead of classifying an input string, it takes two strings and calculates the degree of similarity between their meanings.
The AutoML APIs are based on Microsoft’s Fast Library for Automated Machine Learning & Tuning (FLAML). The Featurizer API handles data pre-processing, whereas the other APIs work together to find the optimal set of hyperparameters. The Experiment API uses a Tuner to coordinate the optimization of a Sweepable pipeline over a Search Space. Developers can use the Sweepable API to define the training pipeline for hyperparameter optimization; the Search Space API to define the range of hyperparameter values to search; and the Tuner API to select a search algorithm. Several tuner algorithms are included in this release, including basic grid and random searches as well as Bayesian and Frugal optimizers.
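The way these pieces fit together can be illustrated with a small, language-neutral sketch. The Python code below is not the ML.NET API: every name in it (search_space, sample, train_and_score, run_experiment) is hypothetical, the scoring function is a made-up stand-in for training a real pipeline, and the tuner is a plain random search, the simplest of the algorithm families mentioned above.

```python
import random

# Hypothetical search space: each hyperparameter maps to a (low, high) range.
search_space = {"learning_rate": (0.01, 1.0), "num_leaves": (4, 64)}


def sample(space, rng):
    """Random-search 'tuner': draw one candidate configuration from the space."""
    return {
        "learning_rate": rng.uniform(*space["learning_rate"]),
        "num_leaves": rng.randint(*space["num_leaves"]),  # inclusive bounds
    }


def train_and_score(params):
    """Stand-in for a sweepable pipeline: returns a made-up loss (lower is better).

    A real pipeline would train a model with these hyperparameters and
    evaluate it on validation data.
    """
    lr_term = (params["learning_rate"] - 0.1) ** 2
    leaves_term = (params["num_leaves"] - 31) ** 2 / 1000.0
    return lr_term + leaves_term


def run_experiment(space, trials, seed=0):
    """'Experiment': ask the tuner for candidates and keep the best one seen."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(trials):
        params = sample(space, rng)
        loss = train_and_score(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = run_experiment(search_space, trials=50)
print(best_params, best_loss)
```

Swapping the sample function for a grid, Bayesian, or cost-aware strategy changes the tuner without touching the pipeline or the search space, which is the separation of concerns the article describes.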
Quintanilla also provided a preview of the ML.NET roadmap. Future deep learning features will include new scenarios and APIs for question answering, named entity recognition, and object detection. TorchSharp integrations are also planned for custom scenarios, as well as improvements to the ONNX integration. The team also intends to upgrade the LightGBM implementation and the implementation of the IDataView interface, and to improve the AutoML API.
Quintanilla answered questions from the audience following his presentation. A viewer asked about support for different GPUs and accelerator libraries from different vendors. Quintanilla replied that only NVIDIA’s CUDA accelerator is currently supported.