Paul Krill
Editor at Large

Oracle open-sources Java machine learning library

Sep 15, 2020

Tribuo offers tools for building and deploying classification, clustering, and regression models in Java, along with interfaces to TensorFlow, XGBoost, and ONNX

Looking to meet enterprise needs in the machine learning space, Oracle is making its Tribuo Java machine learning library available free under an open source license.

With Tribuo, Oracle aims to make it easier to build and deploy machine learning models in Java, much as has already happened with Python. Released under an Apache 2.0 license and developed by Oracle Labs, Tribuo is available from GitHub and Maven Central.

Tribuo provides standard machine learning functionality including algorithms for classification, clustering, anomaly detection, and regression. Tribuo also includes pipelines for loading and transforming data and provides a suite of evaluations for supported prediction tasks. Because Tribuo collects statistics on its inputs, it can describe, for example, the range of each input. It also names features, managing feature IDs and output IDs under the hood to avoid ID conflicts and confusion when chaining models, loading data, and featurizing inputs.
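
To make the idea of named features and input statistics concrete, here is a minimal sketch in plain Java. It is not Tribuo's API: the `FeatureStats` and `FeatureRegistry` classes and the feature names are hypothetical, and the sketch only assumes that a library in this style assigns each feature name an internal ID on first sight and keeps a running minimum and maximum for it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Running statistics for one named feature (hypothetical type, not Tribuo's API). */
final class FeatureStats {
    final int id;
    long count = 0;
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;

    FeatureStats(int id) {
        this.id = id;
    }

    void observe(double value) {
        count++;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }
}

/** Maps feature names to stable integer IDs and tracks the range of each feature. */
final class FeatureRegistry {
    private final Map<String, FeatureStats> features = new LinkedHashMap<>();

    /** Records one value; a name seen for the first time gets the next free ID. */
    void observe(String name, double value) {
        FeatureStats stats = features.get(name);
        if (stats == null) {
            stats = new FeatureStats(features.size());
            features.put(name, stats);
        }
        stats.observe(value);
    }

    /** Returns the ID assigned to a name, or -1 if the name was never observed. */
    int idOf(String name) {
        FeatureStats stats = features.get(name);
        return stats == null ? -1 : stats.id;
    }

    /** Describes the observed range of a feature in human-readable form. */
    String describe(String name) {
        FeatureStats stats = features.get(name);
        if (stats == null) {
            return name + ": never observed";
        }
        return name + " (id=" + stats.id + ", count=" + stats.count
                + ", min=" + stats.min + ", max=" + stats.max + ")";
    }
}

final class FeatureRegistryDemo {
    public static void main(String[] args) {
        FeatureRegistry registry = new FeatureRegistry();
        registry.observe("petalLength", 1.4);
        registry.observe("petalWidth", 0.2);
        registry.observe("petalLength", 5.1);
        System.out.println(registry.describe("petalLength"));
        System.out.println(registry.describe("petalWidth"));
    }
}
```

Because callers only ever refer to features by name, the integer IDs stay an internal detail, which is the property that avoids ID conflicts when data sets or models are combined.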

A Tribuo model knows when it sees a feature for the first time, which is particularly useful in natural language processing. Outputs are strongly typed, so a model also knows what its outputs are. Developers do not need to wonder whether a float is a probability, a regressed value, or a cluster ID. With Tribuo, each of these is a separate type, and the model can describe the types and ranges it knows about. Strongly typed inputs and outputs also let Tribuo track the model construction process, from the point data is loaded through train/test splits or dataset transformations to model training and evaluation. This tracking data is baked into all models and evaluations.
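
The two ideas in this paragraph, distinct output types and awareness of unseen features, can be sketched in a few lines of plain Java. The types below (`Output`, `LabelOutput`, `RegressionOutput`, `ClusterOutput`, `TypedModel`) are simplified, hypothetical stand-ins written for this illustration and are not Tribuo's classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Marker for anything a model can predict (hypothetical stand-in type). */
interface Output {}

/** A class label together with its probability. */
final class LabelOutput implements Output {
    final String label;
    final double probability;

    LabelOutput(String label, double probability) {
        this.label = label;
        this.probability = probability;
    }
}

/** A regressed numeric value. */
final class RegressionOutput implements Output {
    final double value;

    RegressionOutput(double value) {
        this.value = value;
    }
}

/** The ID of the cluster an example was assigned to. */
final class ClusterOutput implements Output {
    final int clusterId;

    ClusterOutput(int clusterId) {
        this.clusterId = clusterId;
    }
}

/** A model that remembers its output type and its training-time feature names. */
final class TypedModel<T extends Output> {
    private final Class<T> outputType;
    private final Set<String> knownFeatures;

    TypedModel(Class<T> outputType, String... trainingFeatures) {
        this.outputType = outputType;
        this.knownFeatures = new LinkedHashSet<>(Arrays.asList(trainingFeatures));
    }

    String outputTypeName() {
        return outputType.getSimpleName();
    }

    /** Returns the feature names in an example that were never seen during training. */
    List<String> unseenFeatures(String... exampleFeatures) {
        List<String> unseen = new ArrayList<>();
        for (String feature : exampleFeatures) {
            if (!knownFeatures.contains(feature)) {
                unseen.add(feature);
            }
        }
        return unseen;
    }
}

final class TypedOutputDemo {
    /** Accepts only labels; passing a RegressionOutput here would not compile. */
    static String summarize(LabelOutput out) {
        return out.label + " with probability " + out.probability;
    }

    public static void main(String[] args) {
        TypedModel<LabelOutput> sentiment =
                new TypedModel<>(LabelOutput.class, "great", "terrible", "movie");
        System.out.println(sentiment.outputTypeName());
        System.out.println(sentiment.unseenFeatures("great", "soundtrack"));
        System.out.println(summarize(new LabelOutput("positive", 0.9)));
    }
}
```

In this sketch the compiler, rather than a naming convention, keeps a probability from being mistaken for a regressed value or a cluster ID, and the model can report which words in a new document it has never encountered.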

The Tribuo provenance system can generate a configuration that rebuilds the training pipeline to reproduce the model or evaluation. That configuration can also be tweaked to build a model on new data or with different hyperparameters. Thus users always know what a Tribuo model is, where it came from, and how to re-create it.
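
As a rough illustration of what a provenance record could look like, the following plain-Java sketch stores the data source, split settings, trainer name, and hyperparameters, renders them as a configuration, and derives a tweaked copy. The `TrainingProvenance` class, its fields, and the property-style output format are all assumptions made for this example; Tribuo's own provenance classes and configuration format are different and richer.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Immutable record of how a model was trained (hypothetical, not Tribuo's API). */
final class TrainingProvenance {
    final String dataSource;
    final double trainFraction;
    final long splitSeed;
    final String trainerName;
    final Map<String, String> hyperparameters;

    TrainingProvenance(String dataSource, double trainFraction, long splitSeed,
                       String trainerName, Map<String, String> hyperparameters) {
        this.dataSource = dataSource;
        this.trainFraction = trainFraction;
        this.splitSeed = splitSeed;
        this.trainerName = trainerName;
        // Sorted copy so the rendered configuration is deterministic.
        this.hyperparameters = Collections.unmodifiableMap(new TreeMap<>(hyperparameters));
    }

    /** Returns a copy with one hyperparameter changed; the original is untouched. */
    TrainingProvenance withHyperparameter(String key, String value) {
        Map<String, String> copy = new TreeMap<>(hyperparameters);
        copy.put(key, value);
        return new TrainingProvenance(dataSource, trainFraction, splitSeed, trainerName, copy);
    }

    /** Renders the record as key=value lines that a pipeline could be rebuilt from. */
    String toConfig() {
        StringBuilder sb = new StringBuilder();
        sb.append("data.source=").append(dataSource).append('\n');
        sb.append("split.trainFraction=").append(trainFraction).append('\n');
        sb.append("split.seed=").append(splitSeed).append('\n');
        sb.append("trainer.name=").append(trainerName).append('\n');
        for (Map.Entry<String, String> entry : hyperparameters.entrySet()) {
            sb.append("trainer.").append(entry.getKey())
              .append('=').append(entry.getValue()).append('\n');
        }
        return sb.toString();
    }
}

final class ProvenanceDemo {
    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("epochs", "5");
        params.put("learningRate", "0.1");
        TrainingProvenance original =
                new TrainingProvenance("iris.csv", 0.7, 1L, "logistic-regression", params);
        System.out.print(original.toConfig());

        // Same pipeline, one changed hyperparameter.
        TrainingProvenance tweaked = original.withHyperparameter("epochs", "10");
        System.out.print(tweaked.toConfig());
    }
}
```

Keeping the record immutable means a tweaked run never overwrites the description of the model that was actually trained, which is what makes the original reproducible.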

Oracle sees Tribuo filling a gap in the marketplace for machine learning in enterprise applications. For example, whereas the Google-built TensorFlow library provides core algorithms for deep learning, Tribuo provides several machine learning algorithms, some of which are in TensorFlow and some of which are not, and also offers an interface to TensorFlow, said Adam Pocock, principal member of the Oracle Labs technical staff. And whereas the Apache Spark analytics engine is built for large, distributed systems, Tribuo targets smaller computations that can fit on a single machine, Pocock said.

In addition to TensorFlow, Tribuo provides interfaces to XGBoost and the ONNX runtime, allowing models stored in the ONNX format or trained in TensorFlow and XGBoost to be deployed alongside native Tribuo models. Support for the ONNX model format allows deployment in Java of models trained using popular Python libraries such as PyTorch.
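
Deploying external models "alongside" native ones generally comes down to putting both behind one interface. The sketch below shows that adapter pattern in plain Java under stated assumptions: the `Scorer` interface and both implementations are hypothetical, and the lambda merely stands in for a call into an external runtime. It does not use Tribuo, ONNX Runtime, TensorFlow, or XGBoost.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Common scoring interface (hypothetical; not Tribuo's model type). */
interface Scorer {
    String name();

    double score(double[] features);
}

/** A "native" linear model implemented directly in Java. */
final class NativeLinearScorer implements Scorer {
    private final double[] weights;

    NativeLinearScorer(double... weights) {
        this.weights = weights.clone();
    }

    @Override
    public String name() {
        return "native-linear";
    }

    @Override
    public double score(double[] features) {
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * features[i];
        }
        return sum;
    }
}

/** Adapter around an externally trained model; the function stands in for a runtime call. */
final class ExternalScorer implements Scorer {
    private final String name;
    private final ToDoubleFunction<double[]> runtimeCall;

    ExternalScorer(String name, ToDoubleFunction<double[]> runtimeCall) {
        this.name = name;
        this.runtimeCall = runtimeCall;
    }

    @Override
    public String name() {
        return name;
    }

    @Override
    public double score(double[] features) {
        return runtimeCall.applyAsDouble(features);
    }
}

final class ScorerDemo {
    public static void main(String[] args) {
        List<Scorer> deployed = Arrays.asList(
                new NativeLinearScorer(0.5, 2.0),
                new ExternalScorer("external-stub", f -> f[0] - f[1]));
        double[] example = {2.0, 1.0};
        // Calling code treats native and external models identically.
        for (Scorer scorer : deployed) {
            System.out.println(scorer.name() + " -> " + scorer.score(example));
        }
    }
}
```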

Tribuo runs on Java 8 or later. Oracle accepts code contributions to Tribuo under the Oracle Contributor Agreement. Tribuo has already been used internally at Oracle, for example in the Fusion Cloud ERP product for intelligent document recognition.

Paul Krill is editor at large at InfoWorld. Paul has been covering computer technology as a news and feature reporter for more than 35 years, including 30 years at InfoWorld. He has specialized in coverage of software development tools and technologies since the 1990s, and he continues to lead InfoWorld’s news coverage of software development platforms including Java and .NET and programming languages including JavaScript, TypeScript, PHP, Python, Ruby, Rust, and Go. Long trusted as a reporter who prioritizes accuracy, integrity, and the best interests of readers, Paul is sought out by technology companies and industry organizations who want to reach InfoWorld’s audience of software developers and other information technology professionals. Paul has won a “Best Technology News Coverage” award from IDG.
