This sample, sampleINT8API, performs INT8 inference without using the INT8 calibrator; instead, it uses user-provided per-activation-tensor dynamic ranges. INT8 inference is available only on GPUs with compute capability 6.1 or 7.x and supports image classification ONNX models such as ResNet-50, VGG19, and MobileNet.
Specifically, this sample demonstrates how to use:
- nvinfer1::ITensor::setDynamicRange to set the per-tensor dynamic range
- nvinfer1::ILayer::setPrecision to set the computation precision of a layer
- nvinfer1::ILayer::setOutputType to set the output tensor data type of a layer

In order to perform INT8 inference, you need to provide TensorRT with the dynamic range for each network tensor, including the network input and output tensors. One way to obtain the dynamic ranges is to use the TensorRT INT8 calibrator. If you don't want to go that route (for example, you used quantization-aware training, or you simply want to use the minimum and maximum tensor values seen during training), you can skip INT8 calibration and set custom per-tensor dynamic ranges. This sample implements INT8 inference for the ONNX ResNet-50 model using per-network-tensor dynamic ranges specified in an input file.
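For illustration, the following is a minimal sketch of this approach (not the sample's exact code): it walks the network and applies a symmetric dynamic range, looked up in a hypothetical name-to-range map, to every network input and layer output tensor.

```
#include <string>
#include <unordered_map>
#include "NvInfer.h"

// Applies user-provided per-tensor dynamic ranges. `ranges` maps a tensor
// name to its absolute-maximum value (a hypothetical container filled from
// the input ranges file).
void setTensorDynamicRanges(nvinfer1::INetworkDefinition* network,
                            const std::unordered_map<std::string, float>& ranges)
{
    // Network input tensors.
    for (int i = 0; i < network->getNbInputs(); ++i)
    {
        nvinfer1::ITensor* input = network->getInput(i);
        auto it = ranges.find(input->getName());
        if (it != ranges.end())
            input->setDynamicRange(-it->second, it->second);
    }
    // Outputs of every layer (covers activations and network outputs).
    for (int i = 0; i < network->getNbLayers(); ++i)
    {
        nvinfer1::ILayer* layer = network->getLayer(i);
        for (int j = 0; j < layer->getNbOutputs(); ++j)
        {
            nvinfer1::ITensor* output = layer->getOutput(j);
            auto it = ranges.find(output->getName());
            if (it != ranges.end())
                output->setDynamicRange(-it->second, it->second);
        }
    }
}
```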
This sample uses the ONNX ResNet-50 model.
Specifically, this sample performs the following steps:
Ensure that INT8 inference is supported on the platform: if (!builder->platformHasFastInt8()) return false;
Enable INT8 mode by setting the builder flag: builder->setInt8Mode(true);
You can choose not to provide the INT8 calibrator. builder->setInt8Calibrator(nullptr);
If you do provide the calibrator, the manually set dynamic ranges will override the dynamic ranges/scales generated by calibration. See sampleINT8 for how to set up the INT8 calibrator.
Optionally, and for debugging purposes, the following flag configures the builder to choose a type-conformant layer implementation, if one exists. For example, in the case of DataType::kINT8, the types are requested by setInt8Mode(true). Setting this flag ensures that only a conformant layer implementation (with kINT8 input and output types) is chosen, even if a higher-performance non-conformant implementation is available. If no conformant implementation exists, TensorRT will choose a non-conformant one, if available, regardless of the setting for this flag.
builder->setStrictTypeConstraints(true);
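Putting the builder configuration steps above together, a minimal sketch (assuming builder is an nvinfer1::IBuilder* created with createInferBuilder, inside a function that returns false on failure) looks like this:

```
// Builder configuration for INT8 without a calibrator, as described above.
if (!builder->platformHasFastInt8())
    return false;                        // INT8 not supported on this GPU

builder->setInt8Mode(true);              // request INT8 kernels
builder->setInt8Calibrator(nullptr);     // no calibrator; dynamic ranges are set manually
builder->setStrictTypeConstraints(true); // optional: prefer type-conformant implementations
```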
Optional: This sample also showcases the layer precision APIs. Using these APIs, you can selectively run layers with user-configurable precision and type constraints. This may not result in optimal inference performance, but can be helpful when debugging mixed-precision inference.
Iterate through the network to set the precision of each layer:
```
auto layer = network->getLayer(i);
layer->setPrecision(nvinfer1::DataType::kINT8);
```
This gives the layer’s inputs and outputs a preferred type (for example, DataType::kINT8). You can choose a different preferred type for an input or output of a layer using:
```
for (int j = 0; j < layer->getNbOutputs(); ++j)
{
    layer->setOutputType(j, nvinfer1::DataType::kFLOAT);
}
```
Using the layer precision APIs with builder->setStrictTypeConstraints(true) set ensures that the requested layer precisions are obeyed by the builder, irrespective of performance. If no implementation is available with the requested precision constraints, the builder will choose the fastest implementation irrespective of precision and type constraints. For more information on using mixed precision APIs, see Setting The Layer Precision Using C++.
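For illustration, a minimal sketch that combines the two calls above into a single pass over the network (not the sample's exact code, which applies these calls selectively):

```
// Request INT8 computation for every layer while keeping each layer's
// outputs in FP32, as described in the text above.
for (int i = 0; i < network->getNbLayers(); ++i)
{
    nvinfer1::ILayer* layer = network->getLayer(i);
    layer->setPrecision(nvinfer1::DataType::kINT8);
    for (int j = 0; j < layer->getNbOutputs(); ++j)
    {
        layer->setOutputType(j, nvinfer1::DataType::kFLOAT);
    }
}
```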
After we configure the builder for INT8 mode, we can build the engine just as we would any FP32 engine.
ICudaEngine* engine = builder->buildCudaEngine(*network);
After the engine has been built, it can be used just like an FP32 engine. For example, inputs and outputs remain in 32-bit floating point.
```
buffers.copyInputToDeviceAsync(stream);
buffers.copyOutputToHostAsync(stream);
outputCorrect = verifyOutput(buffers);
```
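For context, here is a minimal sketch of that inference flow (not the sample's exact code). It assumes the BufferManager helper from the TensorRT samples' common code, already filled with the FP32 input image, and a verifyOutput() helper like the one referenced above.

```
#include <cuda_runtime_api.h>
#include "NvInfer.h"
#include "buffers.h" // samplesCommon::BufferManager from the samples' common code

bool verifyOutput(const samplesCommon::BufferManager& buffers); // declared elsewhere in the sample

bool runInt8Inference(nvinfer1::ICudaEngine* engine, samplesCommon::BufferManager& buffers)
{
    nvinfer1::IExecutionContext* context = engine->createExecutionContext();
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    buffers.copyInputToDeviceAsync(stream); // FP32 host input -> device
    context->enqueue(1 /*batchSize*/, buffers.getDeviceBindings().data(), stream, nullptr);
    buffers.copyOutputToHostAsync(stream);  // device output -> FP32 host
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    context->destroy();
    return verifyOutput(buffers);           // compare top-K results against the reference labels
}
```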
This sample demonstrates how you can enable INT8 inference using the following mixed precision APIs.
ITensor::setDynamicRange Set the dynamic range for the tensor. Currently, only symmetric ranges are supported; therefore, the larger of the absolute values of the provided bounds is used.
ILayer::setPrecision Set the computational precision of this layer. Setting the precision forces TensorRT to choose implementations which run at this precision. If the precision is not set, TensorRT will select the computational precision based on performance considerations and the flags specified to the builder.
ILayer::setOutputType Set the output type of this layer. Setting the output type forces TensorRT to choose implementations which generate output data with the given type. If the output type is not set, TensorRT will select the implementation based on performance considerations and the flags specified to the builder.
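As a concrete illustration of the symmetric-range behavior (a hypothetical snippet, assuming tensor is an nvinfer1::ITensor*):

```
// Only the larger absolute bound matters; both calls request the range [-5, 5].
tensor->setDynamicRange(-5.0f, 5.0f);
tensor->setDynamicRange(-2.0f, 5.0f); // asymmetric bounds; |5.0| is used
```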
In addition to the model file and the input image, you will need the per-tensor dynamic ranges stored in a text file, along with the ImageNet label reference file.
The following required files are included in the package and are located in the data/int8_api
directory.
reference_labels.txt
The ImageNet reference label file.
resnet50_per_tensor_dynamic_range.txt
The ResNet-50 per tensor dynamic ranges file.
airliner.ppm
The image to be inferred.
Download and unpack the ONNX ResNet-50 model:
```
wget https://s3.amazonaws.com/download.onnx/models/opset_9/resnet50.tar.gz
tar -xvzf resnet50.tar.gz
```
Copy resnet50/model.onnx to the data/int8_api/resnet50.onnx directory.

Compile this sample by running make in the <TensorRT root directory>/samples/sampleINT8API directory. The binary named sample_int8_api will be created in the <TensorRT root directory>/bin directory.
```
cd <TensorRT root directory>/samples/sampleINT8API
make
```
Where <TensorRT root directory>
is where you installed TensorRT.
Run the sample to perform INT8 inference on a classification network, for example, ResNet-50:
```
./sample_int8_api [-v or --verbose]
```
To run INT8 inference with your dynamic ranges:
```
./sample_int8_api [--model=model_file] [--ranges=per_tensor_dynamic_range_file] [--image=image_file] [--reference=reference_file] [--data=/path/to/data/dir] [--useDLACore=<int>] [-v or --verbose]
```
Verify that the sample ran successfully. If the sample runs successfully, you should see output similar to the following:
```
&&&& RUNNING TensorRT.sample_int8_api # ./sample_int8_api
[I] Please follow README.md to generate missing input files.
[I] Validating input parameters. Using following input files for inference.
[I]     Model File: ../../../../../../../../../data/samples/int8_api/resnet50.onnx
[I]     Image File: ../../../../../../../../../data/samples/int8_api/airliner.ppm
[I]     Reference File: ../../../../../../../../../data/samples/int8_api/reference_labels.txt
[I]     Dynamic Range File: ../../../../../../../../../data/samples/int8_api/resnet50_per_tensor_dynamic_range.txt
[I] Building and running a INT8 GPU inference engine for ../../../../../../../../../data/samples/int8_api/resnet50.onnx
[I] Setting Per Layer Computation Precision
[I] Setting Per Tensor Dynamic Range
[W] [TRT] Calibrator is not being used. Users must provide dynamic range for all tensors.
[W] [TRT] Warning: no implementation of (Unnamed Layer* 173) [Fully Connected] obeys the requested constraints, using a higher precision type
[W] [TRT] Warning: no implementation of (Unnamed Layer* 174) [Softmax] obeys the requested constraints, using a higher precision type
[I] sampleINT8API result: Detected:
[I]  [1]  space shuttle
[I]  [2]  airliner
[I]  [3]  warplane
[I]  [4]  projectile
[I]  [5]  wing
&&&& PASSED TensorRT.sample_int8_api # ./sample_int8_api
```
This output shows that the sample ran successfully; PASSED.
To see the full list of available options and their descriptions, use the -h or --help command line option.
In order to use this sample with other model files with a custom configuration, perform the following steps:
Create a file called reference_labels.txt.
Note: Ensure each line corresponds to a single ImageNet label. You can download the ImageNet 1000-class human-readable labels from here. The reference label file contains only a single label name per line; for example, 0: 'tench, Tinca tinca' is represented as tench.
Create a file called <network_name>_per_tensor_dynamic_ranges.txt. Before you can create the dynamic range file, you need to generate the network tensor names for which you will provide dynamic ranges.
This sample provides an option to write the names of the network tensors to a file, for example network_tensors.txt. This file can then be used to generate the <network_name>_per_tensor_dynamic_ranges.txt file in step 4-2 below. To generate the network tensors file, perform the following steps:
i. Write the network tensors to a file:
```
./sample_int8_api [--model=model_file] [--write_tensors] [--network_tensors_file=network_tensors.txt] [-v or --verbose]
```
ii. Run INT8 inference with user-provided dynamic ranges:
```
./sample_int8_api [--model=model_file] [--ranges=per_tensor_dynamic_range_file] [--image=image_file] [--reference=reference_file] [--data=/path/to/data/dir] [--useDLACore=<int>] [-v or --verbose]
```
sampleINT8API needs the following files to build the network and run inference:
<network>.onnx
The model file which contains the network and trained weights.
reference_labels.txt
The label reference file, i.e. the ground truth ImageNet 1000 class mappings.
per_tensor_dynamic_range.txt
The custom per-tensor dynamic range file; alternatively, you can simply override the ranges by iterating through the network tensors in code.
image_to_infer.ppm
The PPM image to run inference with.
Note: By default, the sample expects these files to be in either the data/samples/int8_api/
or data/int8_api/
directories. The list of default directories can be changed by adding one or more paths with --data=/new/path
as a command line argument.
To create the <network_name>_per_tensor_dynamic_ranges.txt
file, ensure each line corresponds to the tensor name and floating point dynamic range, for example <tensor_name> : <float dynamic range>
.
Tensor names generated in the network_tensors.txt file (step 4-1) can be used here to represent <tensor_name>. The dynamic range can either be obtained from training (by measuring the min and max values of activation tensors in each epoch) or by using custom post-processing techniques (similar to TensorRT calibration). You can also choose to use a dummy per-tensor dynamic range to run the sample.
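As an illustration, here is a minimal sketch (not the sample's exact code) of parsing such a file into the name-to-range map used with setDynamicRange; the tensor name in the comment is purely hypothetical.

```
#include <fstream>
#include <string>
#include <unordered_map>

// Reads lines of the form "<tensor_name> : <float dynamic range>", e.g.
//   gpu_0/data_0 : 8.6    (hypothetical tensor name and value)
std::unordered_map<std::string, float> readPerTensorDynamicRanges(const std::string& fileName)
{
    std::unordered_map<std::string, float> ranges;
    std::ifstream file(fileName);
    std::string line;
    while (std::getline(file, line))
    {
        const auto sep = line.find(':');
        if (sep == std::string::npos)
            continue;                                   // skip malformed lines
        std::string name = line.substr(0, sep);
        name.erase(name.find_last_not_of(" \t") + 1);   // trim trailing whitespace
        ranges[name] = std::stof(line.substr(sep + 1)); // absolute max for this tensor
    }
    return ranges;
}
```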
Note: INT8 inference accuracy may reduce when dummy/random dynamic ranges are provided.
The following resources provide a deeper understanding of how to perform inference in INT8:
INT8API:
Generate per tensor dynamic range:
Models:
Blogs:
Videos:
Documentation:
For terms and conditions for use, reproduction, and distribution, see the TensorRT Software License Agreement documentation.
March 2019: This README.md file was recreated, updated and reviewed.
There are no known issues in this sample.