Serdar Yegulalp
Senior Writer

OpenAI debuts Python-based Triton for GPU-powered machine learning

news
Jul 29, 2021 | 2 mins

Triton uses Python’s syntax to compile to GPU-native code, without the complexities of GPU programming.

OpenAI, the venture whose professed mission is the ethical advancement of AI, has released the first version of the Triton language, an open source project that lets researchers write GPU-powered deep learning code without needing to know the intricacies of GPU programming.

Triton 1.0 uses Python (3.6 and up) as its base. The developer writes Python code using Triton’s libraries, and that code is then JIT-compiled to run on the GPU. This allows integration with the rest of the Python ecosystem, currently the biggest destination for developing machine learning solutions. It also leverages the Python language itself, instead of reinventing the wheel with a new domain-specific language.

Triton’s libraries provide a set of primitives reminiscent of NumPy: a variety of matrix operations, for instance, or functions that perform reductions on arrays according to some criterion. The user combines these primitives in their own code and adds the @triton.jit decorator to mark a function for compilation to the GPU. In this sense Triton also resembles Numba, the project that JIT-compiles numerically intensive Python code to machine-native assembly for speed.

Simple examples of Triton at work include a vector addition kernel and a fused softmax operation. The latter example, it’s claimed, can run many times faster than the native PyTorch fused softmax for operations that can be done entirely in GPU memory.
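To give a flavor of the model, here is a sketch in the style of the project’s vector-addition tutorial. It is illustrative rather than definitive — exact kernel signatures have shifted between Triton versions, and running it requires a CUDA-capable GPU with PyTorch and Triton installed — but it shows the basic shape: a decorated kernel built from Triton primitives, launched over a grid of program instances.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements          # guard against out-of-range lanes
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Launch enough program instances to cover every element.
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

Note that the kernel body is ordinary Python syntax; the block size, masking, and grid launch are the only concessions to the GPU’s execution model.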

Triton is a young project and currently available for Linux only. Its documentation is still minimal, so early-adopting developers may have to examine the source and examples closely. For instance, the triton.autotune function, which can be used to define parameters for optimizing JIT compilation of a function, is not yet documented in the Python API section for the library. However, triton.autotune is demonstrated in Triton’s matrix multiplication example.
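For those willing to read the examples, the autotuner’s usage pattern looks roughly like the following sketch, modeled on the matrix multiplication example. The specific configuration values here are made up for illustration; the decorator stacks on top of @triton.jit and benchmarks the listed configs, re-tuning whenever the arguments named in `key` change.

```python
import triton
import triton.language as tl

@triton.autotune(
    configs=[
        # Candidate launch configurations; Triton benchmarks each one.
        triton.Config({"BLOCK_SIZE": 128}, num_warps=4),
        triton.Config({"BLOCK_SIZE": 1024}, num_warps=8),
    ],
    key=["n_elements"],  # re-run autotuning when this argument's value changes
)
@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    tl.store(out_ptr + offsets,
             tl.load(x_ptr + offsets, mask=mask) +
             tl.load(y_ptr + offsets, mask=mask),
             mask=mask)
```

The caller then launches the kernel without passing BLOCK_SIZE explicitly, since the autotuner supplies the winning configuration.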


Serdar Yegulalp is a senior writer at InfoWorld. A veteran technology journalist, Serdar has been writing about computers, operating systems, databases, programming, and other information technology topics for 30 years. Before joining InfoWorld in 2013, Serdar wrote for Windows Magazine, InformationWeek, Byte, and a slew of other publications. At InfoWorld, Serdar has covered software development, devops, containerization, machine learning, and artificial intelligence, winning several B2B journalism awards including a 2024 Neal Award and a 2025 Azbee Award for best instructional content and best how-to article, respectively. He currently focuses on software development tools and technologies and major programming languages including Python, Rust, Go, Zig, and Wasm. Tune into his weekly Dev with Serdar videos for programming tips and techniques and close looks at programming libraries and tools.
