Nvidia’s new TensorRT speeds machine learning predictions

news

Jun 27, 2017

2 mins

Serving predictions from a GPU is also more power-efficient and delivers results with lower latency, Nvidia claims

Nvidia has released a new version of TensorRT, a runtime system for serving inferences using deep learning models through Nvidia’s own GPUs.

Inferences, or predictions made from a trained model, can be served from either CPUs or GPUs. Serving inferences from GPUs is part of Nvidia’s strategy to drive wider adoption of its processors and to counter AMD’s efforts to break Nvidia’s stranglehold on the machine learning GPU market.

Nvidia claims the GPU-based TensorRT is better across the board for inferencing than CPU-only approaches. In one of Nvidia’s proffered benchmarks, the AlexNet image classification test under the Caffe framework, TensorRT running on Nvidia’s Tesla P40 processor is claimed to be 42 times faster than a CPU-only version of the same test (16,041 images per second vs. 374). (Always take industry benchmarks with a grain of salt.)
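
As a rough sanity check on the quoted figures, the speedup implied by the two throughput numbers can be computed directly. This is a minimal sketch in Python. The function name `speedup` and the constant names are hypothetical, and the only inputs are the two images-per-second figures quoted in the benchmark claim.

```python
def speedup(gpu_rate: float, cpu_rate: float) -> float:
    """Return how many times faster the GPU rate is than the CPU rate."""
    if cpu_rate <= 0:
        raise ValueError("cpu_rate must be positive")
    return gpu_rate / cpu_rate


# Throughput figures quoted from Nvidia's AlexNet benchmark (images per second).
GPU_IMAGES_PER_SEC = 16_041
CPU_IMAGES_PER_SEC = 374

ratio = speedup(GPU_IMAGES_PER_SEC, CPU_IMAGES_PER_SEC)
print(f"Implied speedup: {ratio:.1f}x")  # prints "Implied speedup: 42.9x"
```

The ratio comes out just under 43, so the "42 times" in the claim appears to be that figure rounded down.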

Serving predictions from a GPU is also more power-efficient and delivers results with lower latency, Nvidia claims.

TensorRT doesn’t work with anything other than Nvidia’s own GPU lineup, and is a proprietary, closed-source offering. AMD, by contrast, has been promising a more open-ended approach to how its GPUs can be used for machine learning applications, by way of the ROCm open source hardware-independent library for accelerating machine learning.

Machine Learning, Technology Industry, Data Management

by Serdar Yegulalp

Senior Writer
