IMPORTANT: The Python API is still not completely stable, and minor but breaking changes may be made in future versions.
The Polygraphy API consists broadly of two major components: `Backend`s and the `Comparator`.
NOTE: To help you get started with the API, you can use the `run` tool to auto-generate template scripts that use the Polygraphy API.
A Polygraphy backend provides an interface for a deep learning framework. Backends are comprised of two components: Loaders and Runners.
A `Loader` is used to load models for runners (see `BaseLoadModel`). A `Loader` can be any `Callable` that takes no arguments and returns a model. "Model" here is a generic term, and the specifics depend on the runner for which the loader has been implemented.
Moreover, existing `Loader`s can be composed for more advanced behaviors. For example, we can implement a conversion like `TensorFlow Frozen Model -> ONNX -> TensorRT Network -> TensorRT Engine`:
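Since Polygraphy loaders are just zero-argument callables, the composition pattern can be illustrated with plain-Python stand-ins. The stage functions below are placeholders, not real Polygraphy APIs:

```python
# Each "loader" is a zero-argument callable that returns a model artifact.
# These stand-ins only mimic how Polygraphy loaders wrap one another.

def load_frozen_graph():
    return "tf_frozen_graph"

def make_onnx_loader(graph_loader):
    # Returns a new loader that converts the loaded graph to ONNX.
    def load():
        graph = graph_loader()  # the inner loader is invoked lazily
        return graph + " -> onnx"
    return load

def make_network_loader(onnx_loader):
    def load():
        return onnx_loader() + " -> trt_network"
    return load

def make_engine_loader(network_loader):
    def load():
        return network_loader() + " -> trt_engine"
    return load

# Compose: TF Frozen Model -> ONNX -> TensorRT Network -> TensorRT Engine.
build_engine = make_engine_loader(
    make_network_loader(make_onnx_loader(load_frozen_graph)))

# Nothing is loaded until the outermost loader is actually called:
result = build_engine()  # "tf_frozen_graph -> onnx -> trt_network -> trt_engine"
```

The key design point is laziness: composing loaders builds a pipeline description, and no work happens until the final callable is invoked.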
We can now provide `build_engine` directly to a `TrtRunner`.
A `Runner` uses a loader to load a model and can then run inference (see `BaseRunner`).
IMPORTANT: Runners may reuse their output buffers. Thus, if you need to save outputs from multiple inferences, you should make a copy of the outputs with `copy.deepcopy(outputs)`.
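The aliasing hazard can be demonstrated with a hypothetical buffer-reusing runner (not a real Polygraphy class):

```python
import copy
import numpy as np

# Hypothetical runner that reuses its output buffer between calls,
# the way real runners are allowed to.
class BufferReusingRunner:
    def __init__(self):
        self._out = np.zeros(3, dtype=np.float32)

    def infer(self, feed):
        self._out[:] = feed["x"] * 2  # writes into the SAME buffer each call
        return {"y": self._out}

runner = BufferReusingRunner()
first = runner.infer({"x": np.array([1, 2, 3], dtype=np.float32)})
saved = copy.deepcopy(first)  # snapshot before the next inference
second = runner.infer({"x": np.array([4, 5, 6], dtype=np.float32)})

# Without the copy, first["y"] aliases the buffer that was just overwritten:
assert np.array_equal(first["y"], second["y"])
# The deep copy preserved the original results:
assert np.array_equal(saved["y"], [2.0, 4.0, 6.0])
```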
To use a runner, you just need to activate it, and then call `infer()`. Note that activating a runner can be very expensive, so you should minimize the number of times you activate a runner - ideally do not do this more than once.
It is recommended to use a context manager to activate and deactivate the runner rather than calling the functions manually:
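The pattern can be sketched with a toy runner (hypothetical names, not Polygraphy's real `BaseRunner`) whose context manager wraps activation and deactivation:

```python
import numpy as np

# Toy runner illustrating the activate/deactivate lifecycle.
class ToyRunner:
    def __init__(self):
        self.active = False

    def __enter__(self):
        self.activate()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.deactivate()

    def activate(self):
        # Expensive setup (e.g. building an engine) would happen here.
        self.active = True

    def deactivate(self):
        # Teardown; releases whatever activate() acquired.
        self.active = False

    def infer(self, feed_dict):
        assert self.active, "Runner must be activated before inference"
        return {"y": feed_dict["x"] * 2}

# The context manager guarantees deactivation even if inference raises.
with ToyRunner() as runner:
    outputs = runner.infer({"x": np.array([1.0, 2.0])})
```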
Generally, you do not need to write custom runners unless you want to support a new backend. If you do, then in the simplest case, you only need to implement two functions:

- `infer_impl`: Accepts a dictionary of numpy buffers, runs inference, and finally returns a dictionary containing the outputs.
- `get_input_metadata`: Returns a `TensorMetadata` mapping input names to their shapes and data types. You may use `None` to indicate dynamic dimensions.

For more advanced runners, where some setup is required, you may also need to implement the `activate_impl()` and `deactivate_impl()` functions.
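A minimal sketch of those two methods, with the base-class plumbing omitted (a real runner would subclass Polygraphy's `BaseRunner`; the metadata layout below is illustrative):

```python
import numpy as np

# Toy custom runner showing the two required methods.
class IdentityRunner:
    def get_input_metadata(self):
        # Maps input names to (dtype, shape); None marks a dynamic dimension.
        return {"x": (np.float32, (None, 3))}

    def infer_impl(self, feed_dict):
        # Accepts a dict of numpy buffers and returns a dict of outputs.
        return {"y": feed_dict["x"].copy()}

runner = IdentityRunner()
meta = runner.get_input_metadata()
out = runner.infer_impl({"x": np.ones((2, 3), dtype=np.float32)})
```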
For example, in the `TrtRunner`, engines are created in `activate_impl()` and destroyed in `deactivate_impl()`. Importantly, the GPU is not used at all until these functions have been called (notice, for example, that in the `TrtRunner`, the CUDA runtime library is only loaded in the `activate_impl()` function). This allows the `Comparator` to optionally provide each runner with exclusive access to the GPU, to prevent any interference between runners.
The `Comparator` is used to run inference for runners, and then compare accuracy (see Comparator.py). This process is divided into two phases:

1. Running inference: `Comparator.run()` accepts a list of runners and returns a `RunResults` object (see Comparator.py) containing information about the outputs of each run. It also accepts an optional `data_loader` argument to control the input data; if not provided, it will use the default data loader. `Comparator.run()` continues until inputs from the data loader are exhausted.
2. Comparing results: `Comparator.compare_accuracy()` accepts the results returned by `Comparator.run()` and compares them between runners.

A data loader is used by the `Comparator` to load input data to feed to each runner (see DataLoader.py).
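The two-phase Comparator workflow can be illustrated with a plain-Python stand-in (the real `Comparator` API differs; this only shows the flow of run-everything-then-compare):

```python
import numpy as np

# Phase 1 stand-in: run every runner over every input from the data loader.
def run(runners, data_loader):
    results = {}
    for name, infer in runners:
        results[name] = [infer(feed) for feed in data_loader]
    return results

# Phase 2 stand-in: compare outputs between the first two runners.
def compare(results, rtol=1e-5):
    names = list(results)
    ref, other = results[names[0]], results[names[1]]
    return all(np.allclose(a["y"], b["y"], rtol=rtol)
               for a, b in zip(ref, other))

data_loader = [{"x": np.full((2,), float(i))} for i in range(3)]
runners = [
    ("runner_a", lambda feed: {"y": feed["x"] * 2}),
    ("runner_b", lambda feed: {"y": feed["x"] + feed["x"]}),
]

results = run(runners, data_loader)  # inference for every runner
match = compare(results)             # then compare outputs between runners
```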
A data loader can be any generator or iterable that yields a dictionary of input buffers. In the simplest case, this can just be a `list` of `dict`s.
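For example, the list-of-dicts form looks like this (the input name `"x"` and its shape are made up for illustration):

```python
import numpy as np

# The simplest possible data loader: a list of feed dicts, one per iteration.
data_loader = [
    {"x": np.zeros((1, 3), dtype=np.float32)},
    {"x": np.ones((1, 3), dtype=np.float32)},
]

# Each element maps input names to numpy buffers.
for feed_dict in data_loader:
    assert set(feed_dict) == {"x"}
```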
In case you don't know details about the inputs ahead of time, you can access the `input_metadata` property in your data loader, which will be set to a `TensorMetadata` instance by the `Comparator`.
NOTE: Polygraphy provides a default `DataLoader` class that uses numpy to generate random input buffers. The input data can be bounded via parameters to the constructor.
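The idea behind the default data loader can be sketched as follows; the function and parameter names here are illustrative, not the real constructor's:

```python
import numpy as np

# Sketch of what a default data loader does: generate random inputs,
# bounded to a configurable range, for each input described in the metadata.
def make_random_feed(input_metadata, val_range=(0.0, 1.0), seed=0):
    rng = np.random.RandomState(seed)
    lo, hi = val_range
    return {
        name: rng.uniform(lo, hi, size=shape).astype(dtype)
        for name, (dtype, shape) in input_metadata.items()
    }

feed = make_random_feed({"x": (np.float32, (1, 3))}, val_range=(0.0, 2.0))
```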
IMPORTANT: Data loaders are designed for scenarios where you need to compare a small number of inputs across multiple runners. It is not a good idea to use a custom data loader to validate a model with an entire dataset! Instead, runners should be used directly for such cases (see the example).
Now that you know the basic components of Polygraphy, let's take a look at how they fit together.
In this example, we will write a script that exercises these components end-to-end, using input data bounded to the range [0, 2].
You can find complete code examples that use the Polygraphy Python API here.
In order to enable PyTorch, you need to provide three things to the `PytRunner`:
- `BaseLoadPyt`: In the simplest case, this can be a callable that returns a `torch.nn.Module`.
- `input_metadata`: A `TensorMetadata` describing the inputs of the model. This maps input names to their shapes and data types. As with other runners, `None` may be used to indicate dynamic dimensions.

  NOTE: Other runners are able to automatically determine input metadata by inspecting the model definition, but because of the way PyTorch is implemented, it is difficult to write a generic function to determine model inputs from a `torch.nn.Module`.

- `output_names`: A list of output names. This is used by the `Comparator` to match `PytRunner` outputs to those of other runners.
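The three pieces can be sketched without importing torch; a string stands in for the `torch.nn.Module`, and the input name, shape, and constructor arguments below are all hypothetical:

```python
import numpy as np

# 1. Model loader: a callable that would return a torch.nn.Module.
#    (A placeholder string stands in for the module here.)
def load_model():
    return "torch_module_placeholder"

# 2. Input metadata: name -> (dtype, shape); None marks a dynamic dimension.
input_metadata = {"input0": (np.float32, (None, 3, 224, 224))}

# 3. Output names, so the Comparator can match outputs across runners.
output_names = ["output0"]

# A hypothetical constructor call (argument names are illustrative):
# runner = PytRunner(load_model, input_metadata=input_metadata,
#                    output_names=output_names)
```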