The ONNX Runtime

A shared model format

Different frameworks save models in their own formats, which complicates deployment. ONNX (Open Neural Network Exchange) is an open standard that represents a model as a graph of standard operators, so a model trained in one framework can be exported once and run by any tool or runtime that supports the format.
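To make "a graph of standard operators" concrete, here is a minimal toy sketch in plain Python. The dictionary layout, the `run_graph` helper, and the weights are hypothetical illustrations, not the real `onnx` protobuf schema or API. The layout only loosely mirrors a real ONNX graph, which also has nodes with an `op_type`, named inputs and outputs, and initializers that hold the weights.

```python
# Toy stand-in for an ONNX graph (hypothetical layout, not the real protobuf schema).
from typing import Callable, Dict, List

Vector = List[float]
Matrix = List[List[float]]

def matmul(x: Vector, w: Matrix) -> Vector:
    # Row vector x (length n) times matrix w (n rows, m columns).
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def add(a: Vector, b: Vector) -> Vector:
    return [ai + bi for ai, bi in zip(a, b)]

def relu(a: Vector) -> Vector:
    return [max(0.0, ai) for ai in a]

# The shared operator vocabulary that every exporter targets.
OPS: Dict[str, Callable] = {"MatMul": matmul, "Add": add, "Relu": relu}

# The model is pure data: constant weights plus nodes listed in topological order.
GRAPH = {
    "initializers": {"W": [[1.0, -1.0], [2.0, 0.5]], "B": [0.5, -3.0]},
    "nodes": [
        {"op_type": "MatMul", "inputs": ["X", "W"], "outputs": ["XW"]},
        {"op_type": "Add", "inputs": ["XW", "B"], "outputs": ["Z"]},
        {"op_type": "Relu", "inputs": ["Z"], "outputs": ["Y"]},
    ],
    "output": "Y",
}

def run_graph(graph: dict, feeds: Dict[str, Vector]) -> Vector:
    """Evaluate nodes in order, storing every intermediate tensor by name."""
    values = {**graph["initializers"], **feeds}
    for node in graph["nodes"]:
        args = [values[name] for name in node["inputs"]]
        values[node["outputs"][0]] = OPS[node["op_type"]](*args)
    return values[graph["output"]]

print(run_graph(GRAPH, {"X": [1.0, 2.0]}))  # → [5.5, 0.0]
```

Because the graph is only data that refers to operators by name, any engine that implements `MatMul`, `Add`, and `Relu` can execute it, regardless of which framework produced it.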

The runtime

ONNX Runtime is an engine that loads an ONNX graph and executes it efficiently. Its key feature is execution providers: pluggable backends for different hardware such as CPUs, GPUs, or specialized accelerators.
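The idea of pluggable backends can be sketched as a registry of providers, each declaring the operators it supports, plus a priority list chosen by the user. Everything below (`ExecutionProvider`, `REGISTRY`, `resolve_providers`, and the `Toy*` provider names) is a hypothetical illustration. In the real Python package, the corresponding step is passing a priority-ordered list such as `providers=["CUDAExecutionProvider", "CPUExecutionProvider"]` to `onnxruntime.InferenceSession`, and `onnxruntime.get_available_providers()` reports which providers the installed build includes.

```python
# Toy model of pluggable execution providers (names and op sets are hypothetical).
from dataclasses import dataclass
from typing import Dict, List, Set

@dataclass
class ExecutionProvider:
    name: str
    supported_ops: Set[str]
    available: bool = True  # e.g. False when the machine lacks the matching hardware

# Backends "plugged in" to this toy runtime.
REGISTRY: Dict[str, ExecutionProvider] = {
    "ToyGpuProvider": ExecutionProvider("ToyGpuProvider", {"MatMul", "Add"}, available=False),
    "ToyNpuProvider": ExecutionProvider("ToyNpuProvider", {"MatMul"}),
    "ToyCpuProvider": ExecutionProvider("ToyCpuProvider", {"MatMul", "Add", "Relu"}),
}

def resolve_providers(requested: List[str]) -> List[ExecutionProvider]:
    """Keep requested providers that exist and are usable, in priority order,
    and make sure the CPU provider is present as a universal fallback."""
    chosen = [REGISTRY[n] for n in requested if n in REGISTRY and REGISTRY[n].available]
    cpu = REGISTRY["ToyCpuProvider"]
    if cpu not in chosen:
        chosen.append(cpu)
    return chosen

# The GPU is requested first but unavailable here, so it is skipped.
print([p.name for p in resolve_providers(["ToyGpuProvider", "ToyNpuProvider"])])
# → ['ToyNpuProvider', 'ToyCpuProvider']
```

Keeping a CPU provider that supports every operator at the end of the list means the model always runs, even when no accelerator is present.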

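Before execution, the runtime applies graph optimizations such as operator fusion (merging adjacent nodes into a single, more efficient node) and constant folding (precomputing any part of the graph that depends only on constant inputs). It then partitions the graph across the available execution providers: each node goes to the highest-priority provider that supports it, and any node that no accelerator claims falls back to the default CPU provider.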
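The load-time work described here can be sketched with two toy passes: constant folding, then partitioning by provider priority. Operator fusion is not shown. The node layout, `fold_constants`, `partition`, and the `ToyGpu`/`ToyCpu` providers are hypothetical; real ONNX Runtime passes are considerably more elaborate.

```python
# Toy load-time passes: constant folding, then partitioning (all names hypothetical).
from itertools import groupby
from typing import Dict, List, Tuple

OPS = {
    "Add": lambda a, b: [x + y for x, y in zip(a, b)],
    "Mul": lambda a, b: [x * y for x, y in zip(a, b)],
    "Relu": lambda a: [max(0.0, x) for x in a],
}

def fold_constants(nodes: List[dict], consts: Dict[str, list]) -> List[dict]:
    """Precompute nodes whose inputs are all constants and drop them from the graph.
    Assumes topological order; mutates `consts` by adding the folded results."""
    kept = []
    for node in nodes:
        if all(name in consts for name in node["inputs"]):
            args = [consts[name] for name in node["inputs"]]
            consts[node["output"]] = OPS[node["op"]](*args)  # result becomes a constant
        else:
            kept.append(node)
    return kept

def partition(nodes: List[dict], providers: List[Tuple[str, set]]) -> List[Tuple[str, List[str]]]:
    """Assign each node to the first provider (priority order) supporting its op,
    then group consecutive nodes on the same provider into one subgraph.
    Assumes the last provider supports every op, so `next` always finds one."""
    assigned = []
    for node in nodes:
        owner = next(name for name, ops in providers if node["op"] in ops)
        assigned.append((owner, node["output"]))
    return [(owner, [out for _, out in group])
            for owner, group in groupby(assigned, key=lambda t: t[0])]

consts = {"scale": [2.0, 2.0], "offset": [1.0, 1.0]}
nodes = [
    {"op": "Add", "inputs": ["scale", "offset"], "output": "k"},  # constants only: folded
    {"op": "Mul", "inputs": ["X", "k"], "output": "h"},
    {"op": "Add", "inputs": ["h", "k"], "output": "z"},
    {"op": "Relu", "inputs": ["z"], "output": "y"},
]
providers = [("ToyGpu", {"Mul", "Add"}), ("ToyCpu", {"Add", "Mul", "Relu"})]

remaining = fold_constants(nodes, consts)
print(consts["k"])                      # → [3.0, 3.0]
print(partition(remaining, providers))  # → [('ToyGpu', ['h', 'z']), ('ToyCpu', ['y'])]
```

Folding runs first so that the partitioner never has to schedule work whose result is already known, and grouping consecutive nodes keeps data on one device for as long as possible.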

From training to deployment

Why teams use it

ONNX decouples the training framework from the serving stack. You can train in whichever framework you prefer and then deploy the same exported graph on many targets without rewriting the model, while the runtime handles backend-specific optimization.
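A minimal sketch of this decoupling: the "training side" writes a neutral file once, and two independently written "serving" backends load the same file and agree on the result. JSON stands in for the real `.onnx` protobuf file here, and `exported`, `serve`, and the two kernel tables are hypothetical names for illustration only.

```python
import json
import operator

# Training side: whichever framework trained the model only has to emit this
# neutral description (JSON is a stand-in for a real .onnx file).
exported = json.dumps({
    "consts": {"w": [0.5, -2.0], "b": [1.0, 1.0]},
    "nodes": [
        {"op": "Mul", "inputs": ["x", "w"], "output": "h"},
        {"op": "Add", "inputs": ["h", "b"], "output": "y"},
    ],
    "output": "y",
})

# Serving side: two separately written backends implementing the same operator set.
LOOP_KERNELS = {
    "Mul": lambda a, b: [a[i] * b[i] for i in range(len(a))],
    "Add": lambda a, b: [a[i] + b[i] for i in range(len(a))],
}
OPERATOR_KERNELS = {
    "Mul": lambda a, b: list(map(operator.mul, a, b)),
    "Add": lambda a, b: list(map(operator.add, a, b)),
}

def serve(model_json: str, kernels: dict, x: list) -> list:
    """Load the exported graph and run its nodes with the given backend's kernels."""
    model = json.loads(model_json)
    values = {**model["consts"], "x": x}
    for node in model["nodes"]:
        values[node["output"]] = kernels[node["op"]](*(values[n] for n in node["inputs"]))
    return values[model["output"]]

x = [4.0, 1.0]
print(serve(exported, LOOP_KERNELS, x), serve(exported, OPERATOR_KERNELS, x))
# → [3.0, -1.0] [3.0, -1.0]
```

Neither backend knows anything about the code that produced `exported`; the operator names in the file are the only contract between them.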

Key idea

ONNX is a portable model graph format, and ONNX Runtime executes it across pluggable execution providers, separating the training framework from the deployment hardware.