Dates are in YYYY-MM-DD format.
- `TrtRunner` can now optionally accept a context directly instead of an engine.
- `basic_compare_func` will now show mismatched indices in addition to mismatched values.
- Added a new `surgeon` subtool, `insert`, which can insert new nodes into an ONNX model.
- Added a new `surgeon` subtool, `sanitize`, which can remove unused nodes and fold constants in an ONNX model.
- Added `--load-inputs` and `--save-inputs` to provide a mechanism to supply custom input data on the command line.
- Added `func.invoke()`, a function that calls a provided callable. This can be useful to make it more obvious that a loader is being immediately evaluated. For example: `EngineFromNetwork(...)()` vs. `func.invoke(EngineFromNetwork(...))`.
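The lazy-loader pattern behind `func.invoke()` can be sketched in plain Python (illustrative only; `invoke` and `engine_from_network` here are stand-ins, not the actual Polygraphy API):

```python
def invoke(loader):
    """Call the provided callable immediately; reads more clearly than a trailing ()."""
    return loader()

def engine_from_network(network):
    # A "loader": a callable that defers engine building until it is called.
    def build_engine():
        return "engine({})".format(network)
    return build_engine

# Trailing-call style -- the second pair of parentheses is easy to miss:
engine_a = engine_from_network("net")()
# invoke() style -- the immediate evaluation is explicit:
engine_b = invoke(engine_from_network("net"))
assert engine_a == engine_b == "engine(net)"
```

Both forms evaluate the loader immediately; `invoke()` just makes that intent obvious at the call site.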
- Added per-output tolerance support to `basic_compare_func`.
- Added per-output tolerance support to the `--atol` and `--rtol` command-line options.
- Renamed `inspect results` to `inspect data`, since it can now also be used to inspect input data, not just results.
- `Comparator.compare_accuracy` now supports comparing a single runner against itself.
- Removed the `surgeon` subtools `prepare` and `operate`, as they were difficult to maintain and not very useful.
- Fixed a bug where `IBuilderConfig` was not being properly freed in the `EngineFromNetwork` loader.
- Fixed a bug in `inspect model` where `dim_param`s in ONNX models would show up as `-1`.
- Shapes in `TensorMetadata` can now be strings to indicate dynamic dimensions.
- `TRT_LOGGER` is now exported under `polygraphy.backend.trt`.
- Fixed a bug in `surgeon extract` where ONNX models using `dim_param` would be rejected.
- Added an `--input-shapes` alias for the `--inputs` option in `run` to better reflect its purpose.
- `inspect model` will no longer show `dtype`/`shape` as `None` if the information is not present in the model. Instead, these are now omitted.
- Fixed a bug in `basic_compare_func` involving empty tensors.
- Fixed a bug where `TrtRunner` would use the wrong shapes for empty tensor outputs.
- Fixed a bug where `Calibrator` would not re-check the cache when `reset()` was called.
- Added a `-v`/`--version` flag to `polygraphy`.
- Added an option to `inspect model` to control whether to show weights in the model.
- Added a `-s`/`--show-values` option to `inspect results` to display output values.
- Added a `--top-k` flag to `run`, which will apply a Top-K before comparing outputs.
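A Top-K reduction like the one applied before comparison can be sketched as follows (illustrative; `top_k` is a hypothetical helper, not the actual implementation):

```python
def top_k(values, k):
    """Return the indices of the k largest values, highest first."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]

# Comparing Top-K indices rather than raw scores makes classification
# comparisons tolerant of small numeric differences between frameworks.
print(top_k([0.1, 0.7, 0.2], 2))  # -> [1, 2]
```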
- Added `exclude_outputs` to `ModifyOnnx` and `ModifyNetwork`.
- Added `--onnx-exclude-outputs` and `--trt-exclude-outputs` to selectively unmark outputs.
- Fixed a bug in `inspect model` for ONNX models containing nodes with Tensor attributes.
- Fixed a bug where `DeviceBuffer.copy_from` would segfault in rare cases.
- Fixed a bug where `DataLoader` would use a shape provided by the user even for static shapes in the model.
- Fixed a bug where `DataLoader` would incorrectly report certain tensors as shape tensors.
- Fixed a bug where `DataLoaderCache` would stop checking the cache after the first miss.
- Added an `extend` decorator, which makes it easier to extend existing loaders.
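The idea of an `extend`-style decorator can be sketched as follows (a minimal sketch with hypothetical names, assuming the extension may modify the loader's result in place):

```python
def extend(loader):
    """Decorator: run `loader`, pass its result to the wrapped function,
    and return the (possibly modified) result."""
    def decorator(postprocess):
        def extended():
            result = loader()
            ret = postprocess(result)
            # If the extension returns nothing, pass the original through.
            return result if ret is None else ret
        return extended
    return decorator

def load_model():
    return {"nodes": ["A", "B"]}

@extend(load_model)
def load_and_annotate(model):
    model["nodes"].append("C")  # modify the loaded model in place

assert load_and_annotate() == {"nodes": ["A", "B", "C"]}
```

The extended function is itself a loader (a plain callable), so extensions compose naturally with other loaders.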
- `Comparator.compare_accuracy` will now display an accuracy summary after processing all iterations.
- Added a `CreateNetwork` loader to create new TensorRT networks.
- Added a `--network-api` option that works with `--gen` to allow manually defining a TensorRT network.
- `Calibrator` can now accept a file-like object for `cache` instead of just a file path.
- `EngineFromBytes` will now call `trt.init_libnvinfer_plugins` before attempting to deserialize the engine.
- Fixed a bug that prevented setting `int_min` equal to `int_max` when bounding input data.
- Added support for `dim_param` for dynamic dimensions.
- `CreateConfig` now accepts a `strict_types` argument.
- Added a `polygraphy` binary, which includes several tools.
- Added a `precision` tool, which can be used to figure out what layers to run in higher precision in TensorRT to achieve the desired accuracy.
- Added a `bisect` subtool that does binary search.
- Added a `linear` subtool that does a linear search.
- Added a `worst-first` subtool that marks the layers that introduce the most error first.
- Added an `inspect` tool to inspect supported files.
- Added `model`, which displays information about models.
- Added `results`, which displays information about saved `RunResults`.
- Re-added `subprocess_polling_interval` to `Comparator.run()`, as this is still required in certain rare cases.
- Graph optimization is now enabled by default in `OnnxFromTfGraph`, and can be disabled by setting `optimize=False` in the constructor.
- Added an `is_active` property to runners, which indicates whether the runner is currently activated.
- Added a `surgeon` tool, which can be used to modify ONNX models more easily than using ONNX-GraphSurgeon.
- Added `prepare` and `operate`, which can be used to modify an ONNX model using a JSON configuration.
- Added `extract`, which can extract ONNX subgraphs with a single command.
- Added `--onnx-outputs` and `--trt-outputs` to set outputs in the corresponding loaders.
- Added a new loader, `LoadPlugins`, that can wrap any other loader and load plugins.
- `EngineFromNetwork` will no longer free the builder, network, and parser if they are provided directly (as opposed to via a loader).
- `TrtRunner` will no longer free the engine if it is provided directly (as opposed to via a loader).
- `compare_func` in `Comparator.compare_accuracy` now accepts a function that returns anything convertible to a boolean, rather than requiring a boolean.
- `basic_compare_func` will now return information about required tolerances after `Comparator.compare_accuracy`.
- `Calibrator` can now be configured to inherit from a different TensorRT calibrator base class.
- `TrtLegacyRunner` no longer depends on `pycuda`.
- `TrtRunner` will now only reset context shapes if the shapes changed. This should improve performance.
- `DataLoader` now takes `int_range` and `float_range` parameters, so min/max values can be provided more concisely.
- `Loader`s and `Runner`s were renamed to better reflect their purpose and to improve readability.
- Renamed `warm_up_runs` to `warm_up`.
- `Calibrator`'s `data_loader` parameter now accepts any generator or iterable instead of requiring a special type.
- `Comparator.run`'s `data_loader` parameter now accepts any generator or iterable instead of requiring a special type.
- `DataLoader` can now be used as an iterable, and its iteration length can be controlled via the `iterations` parameter.
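An iterable data loader of this kind can be sketched as follows (hypothetical class, not the Polygraphy implementation):

```python
import random

class DataLoader:
    """Iterable feed-dict generator: `iterations` controls its length, and a
    fixed seed keeps generated inputs identical from run to run."""
    def __init__(self, iterations=5, seed=1):
        self.iterations = iterations
        self.seed = seed

    def __iter__(self):
        rng = random.Random(self.seed)  # fresh generator => repeatable feeds
        for _ in range(self.iterations):
            yield {"input": [rng.uniform(0.0, 1.0) for _ in range(4)]}

feeds = list(DataLoader(iterations=2))
assert len(feeds) == 2
# Re-iterating produces identical data thanks to the fixed seed.
assert feeds == list(DataLoader(iterations=2))
```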
- Renamed `--input-shape` to `--inputs`.
- Renamed `--min-shape`/`--opt-shape`/`--max-shape` to `--trt-min-shapes`/`--trt-opt-shapes`/`--trt-max-shapes`.
- `DataLoader` now accepts an `input_metadata` parameter which can be used to override shapes and data types.
- Moved the `layerwise` and `outputs` functionality into separate `Modify` loaders.
- Moved saving functionality into separate `Save` loaders.
- Renamed `--read` options to `--load`, and `--write` options to `--save`.
- Renamed `--read-outputs`/`--write-outputs` to `--load-results`/`--save-results`.
- `Calibrator` no longer requires `input_metadata` to be set if the data loader does not need it.
- `TfRunner` now uses a `CreateConfig` loader to supply configuration.
- `TfRunner` and `OnnxrtRunner` now take a `BuildSession`, so that custom sessions can be used.
- Removed the iteration-control arguments from `Comparator.run()` and `Calibrator`. Instead, these now iterate the provided data loader until it runs out of data.
- Removed the `--load-engine` option from `polygraphy`. Engines can now be provided as models directly, e.g. `polygraphy run example.engine --trt`.
- `polygraphy_exec` and `polygraphy_gen` were removed. They are superseded by the `run` subtool of `polygraphy`.
- The `--layerwise` and `layerwise` options have been removed. Layerwise behavior is now possible with `outputs=constants.MARK_ALL` or `--<framework>-outputs mark all`.
- Fixed a bug in `Comparator.validate` that would cause it not to correctly display non-finite values.
- `Calibrator` will now warn if a cache exists but is empty.
- `DataLoader` will now use a fixed seed value unless otherwise specified. This ensures consistent run-to-run behavior.
- `find_output_func` will no longer compare outputs whose names don't match if there is another output that does match.
- Fixed a bug in `CreateConfig`.
- `basic_compare_func` now accepts a `find_output_func` parameter, allowing users to control which outputs are compared between results.
- The `--load-outputs` argument can now accept multiple different files. Outputs from each of these will be read in order.
- Added a `RunResults` class, which replaces the `OrderedDict` that `Comparator.run` previously returned (structure is unchanged).
- `layerwise` mode will no longer mark constants as outputs.
- `compare_func` in `Comparator.compare_accuracy` will now always iterate over the output names in the first `IterationResult` and attempt to find them in the second. The order of the `IterationResult`s provided to this function can be modified either by setting `comparisons` in `Comparator.compare_accuracy`, or by changing the order of runners in `Comparator.run`.
- Fixed `polygraphy_gen` output formatting.
- Renamed `RunResult` to `IterationResult` to better reflect its purpose.
- `graphsurgeon` is no longer a dependency of Polygraphy.
- Logging severity options in `polygraphy_exec`/`polygraphy_gen` are now set prior to any logging output.
- Fixed an issue with `bytes` objects sent over the queue when using subprocesses.
- Added `OnnxExtWeightsNetworkLoader` to support loading ONNX models with externally stored weights into TensorRT.
- Added a `TensorMetadata` class to replace the dictionaries that were used across Polygraphy.
- Added `CaffeNetworkLoader` for the `TrtLegacyRunner`.
- `polygraphy_exec` and `polygraphy_gen` will no longer use subprocesses by default. To revert to the old behavior, the `--use-subprocess` flag must now be explicitly provided.
- `SerializedEngineLoader` now accepts a `buffer_loader`, so that a function that loads a serialized engine may be provided instead of the serialized engine itself.
- The default opset for `OnnxFromTfGraph` has been updated to 11.
- `polygraphy_exec` and `polygraphy_gen` now correctly handle cases where no model file is provided.
- Added a `PolygraphyException` class to serve as a base class for exceptions raised by Polygraphy.
- `ConfigLoader` now accepts a list of `Profile`s to support multiple optimization profiles.
- Moved the `outputs` argument from `TfRunner` to the TensorFlow loaders.
- Added a `ctypes` wrapper around the CUDA runtime library, accessible in `util/cuda.py`.
- `TrtRunner` no longer depends on `pycuda`, and instead uses the included CUDA wrapper.
- `basic_compare_func` will now preserve output ordering in the results.
- Made `EngineFromNetwork` compatible with TensorRT 7.0.
- Added `--timestamp` and `--line-info` options to `polygraphy_exec` to enable logging of timestamps and line numbers respectively.
- Added a `--no-letter` option to disable severity letter prefixes in log messages.
- Added `register_callback` to the logger, which registers a callback that will be called whenever the severity changes.
- Added `Logger.verbosity()`, which returns a context manager that can be used to temporarily change logging severity.
- Added `keras` and `ckpt` model types to `--model-type` in `polygraphy_exec`, and renamed `tf` to `frozen`.
- Added a `ConfigLoader` which can be passed to `EngineFromNetwork` to customize the build configuration prior to building.
- Timestamps and line numbers can now be enabled in the logger by setting its `timestamp`/`line_info` properties respectively to `True`.
- Added a `colored` module to provide colored output.
- `polygraphy_exec` now runs runners in the order in which they were specified.
- Simplified runner options by removing `_runner` suffixes and shortening framework names (e.g. `tensorflow_runner` -> `tf`).
- The `runners` submodule has been renamed to `backend`.
- `TrtRunner` has been renamed to `TrtLegacyRunner`.
- `TrtRunnerV2` has been renamed to `TrtRunner`.
- `polygraphy_gen` is now at parity with `polygraphy_exec`.
- Removed `--tftrt` as a separate runner in `polygraphy_exec`; instead, it is now an option for the `--tf` runner.
- Removed `--tftrt-gpu-memory-fraction` and renamed `--tf-gpu-memory-fraction` to `--gpu-memory-fraction` in `polygraphy_exec`.
- Removed `--tfonnx`; this functionality is now part of `--onnxrt` when using a TensorFlow model in `polygraphy_exec`.
- Removed the `Experimental` argument section in `polygraphy_exec`. All functionality has now been integrated into non-experimental arguments.
- Removed the `preprocess_network` argument from `EngineFromNetwork`. This functionality can be achieved by wrapping the network loaders instead.
- `Comparator.run` will now forcefully terminate the subprocess if it does not exit on its own.
- Reworked `BaseRunner` so that runners can now implement `activate()`/`deactivate()` instead of `__enter__()`/`__exit__()`.
- `polygraphy_exec` now defaults to running just a single iteration of inference.
- The `--accuracy` flag has been removed from `polygraphy_exec`, as this is now the default behavior.
- Fixed a bug in `try_match_shape`.
- Added a `tf32` parameter as well as a `--tf32` flag for TensorRT.
- Added support for `dim_param` in ONNX.
- The `fp16_mode` and `int8_mode` parameters have been renamed to `fp16` and `int8` respectively.
- `polygraphy_exec` will now use the runtime shapes specified rather than always using `OPT` shapes from the TensorRT profile.
- Fixed a bug in `DataLoaderCache`.
- Added `start_index` and `end_index` to `Comparator.run` to make it easy to skip over inputs from the data loader.
- Added `CompareFunc` to provide built-in comparison functions.
- Added `PostprocessFunc` to provide built-in post-processing functions.
- `Comparator.compare_accuracy` now returns an `AccuracyResult` object, which contains much more information about the results of the comparisons.
- Added a `percentage()` function to `AccuracyResult` to provide an easy way to figure out the percentage of passed iterations.
- Replaced `RunInfo` with `IterationResult`. The latter only stores information about a single iteration for a single runner.
- `compare_func` in `Comparator.compare_accuracy` is now a `Callable(IterationResult, IterationResult) -> Dict[str, bool]`.
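A custom comparison function matching that signature can be sketched like this (illustrative; plain dicts stand in for `IterationResult`):

```python
def custom_compare(iter_result0, iter_result1, atol=1e-5):
    """Map each output name in the first result to a pass/fail boolean."""
    status = {}
    for name, out0 in iter_result0.items():
        out1 = iter_result1.get(name)
        if out1 is None:
            continue  # skip outputs missing from the second result
        status[name] = all(abs(a - b) <= atol for a, b in zip(out0, out1))
    return status

print(custom_compare({"out": [1.0, 2.0]}, {"out": [1.0, 2.5]}))  # -> {'out': False}
```

Returning a per-output dictionary lets the caller report exactly which outputs diverged rather than a single overall verdict.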
- `warm_up_runs` now defaults to `0`, and `end_index` to `1`.
- The default comparison function is now `CompareFunc.basic_compare_func`.
- `use_subprocess` now defaults to `False` in `Comparator.run()` (it still defaults to `True` in `polygraphy_exec`).
- `Calibrator` now takes `start_index` and `end_index` arguments instead of `max_items`.
- Removed the `Comparator.compare` function, since `Comparator.compare_accuracy` includes all of its functionality.
- `iterations` in `Comparator.run` has been removed and replaced by `start_index` and `end_index`.
- Removed the `subprocess_polling_interval` argument, as `Comparator` can now properly detect when the subprocess terminates.
- `Comparator.run()` will no longer hang if there is a segfault in the subprocess.
- Added `--int-min`, `--int-max`, `--float-min`, and `--float-max` arguments to `polygraphy_exec`.
- Added an `--explicit-precision` option to `polygraphy_exec` to enable QAT models in TRT.
- When `--load-outputs` or `--save-outputs` is specified to `polygraphy_exec`, `seed` will default to `1` to ensure consistent inputs across runs.
- Added a `--calibration-cache` option to `polygraphy_exec` to enable supplying a calibration cache.
- Added a `--no-color` option to disable color logging.
- Added `GraphOptimizerLoader` for freezing TensorFlow graphs and a `--freeze-graph` option to `polygraphy_exec`.
- Added `--load-outputs` and `--save-outputs` to `polygraphy_exec` for comparing across executions.
- Added `KerasLoader` for loading models stored in `hdf5` format.
- Added `GraphOptimizerLoader` for TensorFlow graphs.
- Fixed `Calibrator` so that it will now use the opt dimension of a profile for networks with dynamic shapes.
- Added `Loaders` for easier UFF debugging.
- `Calibrator` will no longer allocate buffers if a calibration cache was provided.
- Fixed a bug in `polygraphy_gen`.
- Cleaned up `BaseRunner` methods.
- Added `last_inference_time()` to `BaseRunner`, so that `infer()` now only needs to return outputs.
- Added `Calibrator` for int8 calibration, along with additional parameters to `EngineFromNetwork`.
- `DataLoaderCache` will now warn loudly when a set of inputs needs to be regenerated.
- Cleaned up the `Comparator` `run()` function.
- Moved `save_*` options into loaders rather than runners.
- Changed `BaseDataLoader.next()` to take an index as an argument. This way, inputs can be reliably repeated across runners.
- Moved `layerwise` parameters into loaders rather than runners.
- `Loader`s are now interchangeable with Python `Callable`s.
- `DataLoader`s are now interchangeable with Python `Callable`s.
- `DataLoader` no longer generates all `True` values for boolean types.
- Fixed bugs in `polygraphy_gen`.
- `DataLoaderCache` is now sent over the queue when runners are run in subprocesses. This resolves an issue where the cache was not being updated correctly.
- `Comparator` now updates runners correctly when using a subprocess.
- Added a `--no-fold-constant` option to prevent `OnnxFromTfGraph` from doing constant folding in the TensorFlow graph.
- Added a `polygraphy_gen` script that enables generation of template Python scripts for running Polygraphy.
- Fixed a bug related to `GraphSurgeon`.
- Added a `name` parameter to `CheckpointLoader` in case the checkpoint does not include a `checkpoint` file.
- `TFTRTLoader` now accepts any kind of TensorFlow graph loader.
- Fixed `TrtRunner`'s `Buffers` so that no-op reshapes (no reallocation) are handled correctly.
- Added `check_finite`, `check_nan`, and `fail_fast` options to `Comparator.validate()`.
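The kind of check these options enable can be sketched as follows (hypothetical helper, not the Polygraphy implementation):

```python
import math

def validate(outputs, check_finite=True, check_nan=True, fail_fast=False):
    """Return True only if no output contains NaN (check_nan) or
    infinite (check_finite) values; fail_fast stops at the first failure."""
    ok = True
    for name, values in outputs.items():
        for v in values:
            if (check_nan and math.isnan(v)) or (check_finite and math.isinf(v)):
                ok = False
                if fail_fast:
                    return False  # stop at the first bad value
    return ok

assert validate({"out": [1.0, 2.0]})
assert not validate({"bad": [float("nan")], "good": [0.0]})
```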
- Reworked the `Buffers` implementation for `TrtRunner`; this eliminates an unnecessary copy that was happening on the host input.
- Added `misc.find_in_dict()`.
- `TrtRunner` will no longer call `context`'s shape-setting functions on non-dynamic inputs.
- Fixed `DataLoader` to handle scalars correctly, and added several tests.
- Added helpers for `TrtRunner`, e.g. a `create_network` function to simplify TensorRT's network flags.
- `EngineFromNetwork` will now mark network outputs when `layerwise=True`.
- Fixed handling of `bool` outputs in `Comparator`.
- Replaced `OnnxEngineLoader` with `OnnxNetworkLoader` and `EngineFromNetwork`. This allows for more flexibility in building engines from TensorRT networks.
- Added an `allow_growth` option to `TfRunner` to work around `CUDNN_STATUS_INTERNAL_ERROR`. When `allow_growth` is enabled, the error disappears.
- `DataLoaderCache` will now attempt to permute inputs in cases where shapes do not match exactly (e.g. NCHW vs. NHWC inputs).
- Fixed a bug in `polygraphy_exec` which caused it to ignore user-defined profiles.
- Added support for `int8` and explicit precision mode in `TrtRunner`.
- Added a `preprocess_network` parameter to `OnnxEngineLoader` so that the network can be modified before it is used for building.
- `TrtRunner` will now attempt to generate sane default shapes in cases with dynamic shapes where no profiles are provided.
- `DataLoader` no longer overrides static shapes in the model, but issues a warning if an override is requested.
- `DataLoader` now accepts shape tensor inputs in its `default_shapes` parameter.
- `Comparator` can now catch segfaults in runners properly.
- Added options to `DataLoader` to be able to specify input bounds.
- Fixed a bug in `DataLoaderCache`.
- The default `subprocess_polling_interval` is now 30 seconds.
- `Comparator` now attempts to partially match output names when no exact matches are found.
- Added a `subprocess_timeout` parameter to `Comparator.run` to prevent hangs when a subprocess does not terminate.
- Added a `subprocess_polling_interval` parameter to allow the process to be polled, so that failing processes can be terminated before the full `subprocess_timeout`.
- `OnnxEngineLoader` now accepts an `onnx_loader` for better flexibility in loading models.
- `polygraphy_exec` now supports running TF models in TRT via the tf2onnx converter.
- `TrtLegacyRunner` now only supports UFF models.
- Added `BaseModelLoader`, which can be used to load models. This allows for reuse of existing runners with different import paths. For example, `OnnxrtRunner` can be used with `OnnxFromTfGraph` in order to run a TensorFlow frozen graph via ONNX Runtime.
- Added `ModelLoader`s for `TfRunner`, including a frozen model loader, checkpoint loader, and TF-TRT loader.
- `OnnxFromTfGraph` now accepts a TensorFlow `ModelLoader` to support a wider variety of input formats.
- Updated `TrtLegacyRunner` to use the `get_input_metadata` API, so it is usable for UFF models.
- `TrtRunner` will no longer mark layers within the loop body as network outputs in `layerwise` mode.
- `DataLoaderCache` now falls back to reusing inputs based on order if names do not match exactly.
- `DataLoader` now accepts a `default_shapes` parameter to override dynamic shapes.
- Added a `get_input_metadata` API to `BaseRunner`. Overhauled runners so they no longer need to handle dynamic input shapes individually.
- Added a `DataLoader` class which can be used to feed data to the `Comparator`.
- Added a `DataLoaderCache` so that the data loader does not have to load inputs multiple times for each runner.
- `Comparator.compare_accuracy` now fails if no outputs were compared.
- Removed ONNX support from `TrtLegacyRunner`. You should use `TrtRunner` for ONNX models instead.
- Removed `python2` support.
- Fixed a bug in `polygraphy_exec` when using the legacy `TrtLegacyRunner`.
- Fixed `TrtRunner` for cases with multiple outputs.
- Fixed an issue with sending large objects to the `Comparator` process. This is because `Pipe`s and `Queue`s can only send objects smaller than 2GB.
- Added a `--fail-fast` option to `polygraphy_exec` and a corresponding `fail_fast` option to `Comparator.compare()`. This is useful for determining the first layer at which two models diverge.
- Added a `TrtRunner` that can be used to run TRT networks with dynamic shapes. It currently only supports ONNX.
- Resources are now only allocated when `__enter__` is called. This greatly simplifies much of the logic in several runners.
- `RunInfo` no longer contains data about the inputs used.
- `TFOnnxrtRunner` now accepts an `opset` option when converting graphs to ONNX.
- Renamed runner submodules to end with `_runner` to disambiguate them from system packages.
- Added a `--uff-order` option in case the automatically determined order is wrong.
- Added a `--build-only` option to `polygraphy_exec`.
- Fixed a bug that occurred when `check_shapes` is disabled.
- Added `TFOnnxrtRunner` and a `--tfonnx` option to `polygraphy_exec`.
- Added `OnnxrtRunner` and moved `TFOnnxrtRunner` into `onnx_runner.py`.
- Added a `--save-onnx` option for `OnnxrtRunner`.
- Renamed the `--onnx` `polygraphy_exec` option to `--onnxtf` to disambiguate it from `--onnxrt`.
- Added `CNTKRunner` and a `--cntk` option to `polygraphy_exec`.
- Added a `--tf-outputs` argument to `polygraphy_exec`.
- Added a `--plugins` option to `polygraphy_exec` for loading TRT plugins.
- Fixed a bug in `polygraphy_exec`.
- `polygraphy_exec` will no longer fail if the extension for the model file is unrecognized.
- Added an `fp16_mode` option to `TfRunner` for TF-TRT.
- `polygraphy_exec` now exits when unknown arguments are encountered.
- `polygraphy_exec` now emits warnings when unknown command-line parameters are used.