A shared model format
Different frameworks save models in their own formats, which complicates deployment. ONNX (Open Neural Network Exchange) is an open standard that represents a model as a graph of standard operators, so a model trained in one framework can be exported once and run by any runtime that implements those operators.
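To make "a graph of standard operators" concrete, here is a minimal sketch in plain Python. It is a toy model of the idea, not the ONNX file format or its API: the names OPS, GRAPH, and run_graph are made up for illustration, and the three operators stand in for the much larger operator set the ONNX specification defines.

```python
# Toy operator set: each operator name maps to a plain Python function.
# (Illustrative only; real ONNX operators are defined by the ONNX spec.)
OPS = {
    "Add": lambda a, b: a + b,
    "Mul": lambda a, b: a * b,
    "Relu": lambda a: max(a, 0.0),
}

# The "model" is just data: a topologically ordered list of nodes.
# Each node names an operator, its input values, and its output value.
GRAPH = [
    {"op": "Mul", "inputs": ["x", "w"], "output": "xw"},
    {"op": "Add", "inputs": ["xw", "b"], "output": "z"},
    {"op": "Relu", "inputs": ["z"], "output": "y"},
]


def run_graph(graph, feeds):
    """Evaluate nodes in order, storing each result under its output name."""
    values = dict(feeds)
    for node in graph:
        args = [values[name] for name in node["inputs"]]
        values[node["output"]] = OPS[node["op"]](*args)
    return values


result = run_graph(GRAPH, {"x": 2.0, "w": 3.0, "b": -1.0})
print(result["y"])  # 2*3 + (-1) = 5.0, and Relu leaves a positive value unchanged
```

Because the graph is only data plus operator names, any program that implements the same operators can execute it, which is the property that makes a shared format portable.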
The runtime
ONNX Runtime is an engine that loads an ONNX graph and executes it efficiently. Its key feature is execution providers: pluggable backends for different hardware such as CPUs, GPUs, or specialized accelerators.
- The runtime applies graph optimizations such as operator fusion and constant folding.
- It then partitions the graph across available execution providers.
- Each provider runs the nodes assigned to it; nodes that no higher-priority provider supports fall back to the default CPU provider.
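The steps above can be sketched with a toy example in plain Python. This is a simplified model under stated assumptions, not ONNX Runtime's implementation: the provider names ToyAccelerator and ToyCPU, the operator table, and the helper functions are all hypothetical, only constant folding is shown (fusion is omitted), and partitioning is reduced to "first provider in priority order that supports the operator".

```python
# Hypothetical toy graph: "c" depends only on constants, so it can be folded.
CONSTANTS = {"two": 2.0, "three": 3.0}
GRAPH = [
    {"op": "Mul", "inputs": ["two", "three"], "output": "c"},
    {"op": "Add", "inputs": ["x", "c"], "output": "s"},
    {"op": "Relu", "inputs": ["s"], "output": "y"},
]
OPS = {
    "Add": lambda a, b: a + b,
    "Mul": lambda a, b: a * b,
    "Relu": lambda a: max(a, 0.0),
}


def fold_constants(graph, constants):
    """Precompute nodes whose inputs are all known constants."""
    known = dict(constants)
    remaining = []
    for node in graph:
        if all(name in known for name in node["inputs"]):
            args = [known[name] for name in node["inputs"]]
            known[node["output"]] = OPS[node["op"]](*args)
        else:
            remaining.append(node)
    return remaining, known


# Made-up providers in priority order, each with the operators it supports.
PROVIDERS = [
    ("ToyAccelerator", {"Add", "Mul"}),
    ("ToyCPU", {"Add", "Mul", "Relu"}),  # fallback that supports everything here
]


def partition(graph, providers):
    """Assign each node to the first provider in the list that supports its op."""
    assignment = {}
    for node in graph:
        for name, supported in providers:
            if node["op"] in supported:
                assignment[node["output"]] = name
                break
        else:
            raise ValueError(f"no provider supports {node['op']}")
    return assignment


folded, known = fold_constants(GRAPH, CONSTANTS)
placement = partition(folded, PROVIDERS)
print(len(folded), known["c"], placement)
```

In this sketch the Mul node disappears at load time because its result is already known, the Add node goes to the higher-priority provider, and Relu falls back to the provider that supports it.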
From training to deployment
Why teams use it
ONNX decouples the training framework from the serving stack. You can train in whichever framework suits the work and then deploy the same exported graph on many targets without rewriting the model, while the runtime handles backend-specific optimization.
Key idea
ONNX is a portable model graph format, and ONNX Runtime executes it across pluggable execution providers, separating the training framework from the deployment hardware.