The `batchedNMSPlugin` implements a non-maximum suppression (NMS) step over boxes for object detection networks.
Non-maximum suppression is a near-universal step in object detection inference. This plugin is applied after the bounding box predictions and object classification scores have been computed, to produce the final bounding boxes for detected objects.
With this plugin, you can perform the non-maximum suppression step during TensorRT inference. During inference, the neural network generates a fixed number of bounding boxes, each with box coordinates, an identified class, and a confidence level. Not all of these boxes should be drawn on the original image; only the most representative ones should be kept.
Non-maximum suppression eliminates boxes that have low confidence or that do not contain an object, and keeps the most representative ones. For example, an object in an image may be covered by many boxes with different confidence levels. The goal of the non-maximum suppression step is to find the most confident box for the object and remove all the less confident ones.
This plugin accelerates the non-maximum suppression step during TensorRT inference on the GPU.
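To illustrate the idea, independent of the plugin's actual CUDA implementation, here is a minimal greedy NMS sketch in Python; the function names and example boxes are illustrative only:

```python
def iou(a, b):
    """Intersection-over-union of two boxes in [x_min, y_min, x_max, y_max] form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the most confident box, suppress boxes that overlap it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# Two heavily overlapping boxes around one object, plus one distinct box.
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2] -- the 0.8 box overlaps the 0.9 box and is suppressed
```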
The `batchedNMSPlugin` takes two inputs: a boxes input and a scores input.
**Boxes input:** The boxes input has shape `[batch_size, number_boxes, number_classes, number_box_parameters]`. A box location usually consists of four parameters, such as `[x_min, y_min, x_max, y_max]`. For example, if your model outputs `8732` bounding boxes per image and there are `100` candidate classes, the per-image shape of the boxes input is `[8732, 100, 4]`.
**Scores input:** The scores input has shape `[batch_size, number_boxes, number_classes]`. Each box has an array of probabilities, one per candidate class.
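For example, the per-image input shapes described above can be checked with NumPy (the array contents here are zero-filled placeholders):

```python
import numpy as np

number_boxes, number_classes = 8732, 100

# Per-image boxes input: one [x_min, y_min, x_max, y_max] entry per box per class.
boxes = np.zeros((number_boxes, number_classes, 4), dtype=np.float32)

# Per-image scores input: one probability per box per class.
scores = np.zeros((number_boxes, number_classes), dtype=np.float32)

print(boxes.shape)   # (8732, 100, 4)
print(scores.shape)  # (8732, 100)
```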
From the boxes and scores inputs, the plugin generates the following four outputs:

- `num_detections`: The `num_detections` output has shape `[batch_size, 1]`. The last dimension of size 1 is an INT32 scalar indicating the number of valid detections per batch item. It can be less than `keepTopK`; only the top `num_detections[i]` entries in `nmsed_boxes[i]`, `nmsed_scores[i]`, and `nmsed_classes[i]` are valid.
- `nmsed_boxes`: A `[batch_size, keepTopK, 4]` float32 tensor containing the coordinates of the non-max-suppressed boxes.
- `nmsed_scores`: A `[batch_size, keepTopK]` float32 tensor containing the scores for the boxes.
- `nmsed_classes`: A `[batch_size, keepTopK]` float32 tensor containing the classes for the boxes.

The `batchedNMSPlugin` has the plugin creator class `BatchedNMSPluginCreator` and the plugin class `BatchedNMSPlugin`.
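A minimal sketch of how a client might consume these outputs, assuming the tensors have already been copied to host NumPy arrays; the batch size of 2, `keepTopK = 5`, and detection counts are arbitrary placeholders:

```python
import numpy as np

batch_size, keep_top_k = 2, 5

# Placeholder outputs in the shapes produced by the plugin.
num_detections = np.array([[3], [1]], dtype=np.int32)             # [batch_size, 1]
nmsed_boxes = np.zeros((batch_size, keep_top_k, 4), np.float32)   # [batch_size, keepTopK, 4]
nmsed_scores = np.zeros((batch_size, keep_top_k), np.float32)     # [batch_size, keepTopK]
nmsed_classes = np.zeros((batch_size, keep_top_k), np.float32)    # [batch_size, keepTopK]

for i in range(batch_size):
    n = int(num_detections[i, 0])          # only the first n entries are valid
    valid_boxes = nmsed_boxes[i, :n]
    valid_scores = nmsed_scores[i, :n]
    valid_classes = nmsed_classes[i, :n]
    print(i, valid_boxes.shape, valid_scores.shape, valid_classes.shape)
```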
The `batchedNMSPlugin` is created using `BatchedNMSPluginCreator` with `NMSParameters` typed parameters. The `NMSParameters` data structure is listed below and is defined in the `NvInferPlugin.h` header file.
| Type | Parameter | Description |
|---|---|---|
| `bool` | `shareLocation` | If set to `true`, the boxes input is shared across all classes. If set to `false`, the boxes input should account for per-class box data. |
| `int` | `backgroundLabelId` | The label ID for the background class. If there is no background class, set it to `-1`. |
| `int` | `numClasses` | The number of classes in the network. |
| `int` | `topK` | The number of bounding boxes to be fed into the NMS step. |
| `int` | `keepTopK` | The number of total bounding boxes to be kept per image after the NMS step. Should be less than or equal to the `topK` value. |
| `float` | `scoreThreshold` | The scalar threshold for scores (low-scoring boxes are removed). |
| `float` | `iouThreshold` | The scalar threshold for IOU (new boxes that have high IOU overlap with previously selected boxes are removed). |
| `bool` | `isNormalized` | Set to `false` if the box coordinates are not normalized, meaning they are not in the range `[0, 1]`. Defaults to `true`. |
| `bool` | `clipBoxes` | Forcibly restrict bounding boxes to the normalized range `[0, 1]`. Only applicable if `isNormalized` is also `true`. Defaults to `true`. |
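A small Python sketch of how `scoreThreshold`, `topK`, and `keepTopK` interact, simplified to one image and one class; the function name and example values are illustrative, not the plugin's exact behavior:

```python
def select_candidates(scores, score_threshold, top_k, keep_top_k):
    """Return up to keep_top_k box indices: threshold scores, then take the best top_k."""
    # Discard low-scoring boxes (scoreThreshold).
    kept = [i for i, s in enumerate(scores) if s >= score_threshold]
    # Feed at most top_k highest-scoring boxes into the NMS step (topK).
    kept.sort(key=lambda i: scores[i], reverse=True)
    candidates = kept[:top_k]
    # After NMS (omitted here), at most keep_top_k boxes are kept per image (keepTopK).
    return candidates[:keep_top_k]

scores = [0.9, 0.05, 0.6, 0.3, 0.8]
print(select_candidates(scores, score_threshold=0.1, top_k=3, keep_top_k=2))  # [0, 4]
```

This also shows why `num_detections` can be less than `keepTopK`: thresholding and suppression may leave fewer surviving boxes than the fixed output size.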
The NMS algorithm used in this plugin first sorts the bounding box indices by score for each class, then sorts the bounding boxes by the updated scores, and finally collects the desired number of bounding boxes with the highest scores.
It is mainly accelerated using the `nmsInference` kernel defined in the `batchedNMSInference.cu` file.
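Before looking at the individual kernels, here is a simplified single-image, single-class Python sketch of this sort, suppress, and gather flow, using the same convention of marking discarded boxes with index `-1` and score `0`; this is pure illustration, not the CUDA code:

```python
def iou(a, b):
    """Intersection-over-union of two [x_min, y_min, x_max, y_max] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union > 0 else 0.0

def batched_nms_sketch(boxes, scores, score_threshold, iou_threshold, keep_top_k):
    n = len(boxes)
    # 1. Sort indices by score and discard low-scoring boxes by setting
    #    index -1 / score 0 (role of sortScoresPerClass).
    idx = sorted(range(n), key=lambda i: scores[i], reverse=True)
    scores = [s if s >= score_threshold else 0.0 for s in scores]
    idx = [i if scores[i] > 0.0 else -1 for i in idx]
    # 2. Suppress any box that overlaps a higher-scoring surviving box
    #    by more than iou_threshold (role of allClassNMS).
    for a_pos, a in enumerate(idx):
        if a == -1:
            continue
        for b_pos in range(a_pos + 1, n):
            b = idx[b_pos]
            if b != -1 and iou(boxes[a], boxes[b]) > iou_threshold:
                scores[b] = 0.0
                idx[b_pos] = -1
    # 3. Survivors remain in descending-score order (role of sortScoresPerImage).
    survivors = [i for i in idx if i != -1]
    # 4. Gather the top keep_top_k results (role of gatherNMSOutputs).
    return survivors[:keep_top_k]

print(batched_nms_sketch([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]],
                         [0.9, 0.8, 0.7], 0.1, 0.5, 10))  # [0, 2]
```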
Specifically, the NMS algorithm:

1. Sorts the bounding box indices by score for each class. Bounding boxes with scores lower than `scoreThreshold` are discarded by setting their indices to `-1` and their scores to `0`. This uses the `sortScoresPerClass` kernel defined in the `sortScoresPerClass.cu` file.
2. Suppresses, for each class, any bounding box whose IOU overlap with a previously selected, higher-scoring box exceeds `iouThreshold`, by setting its index to `-1` and its score to `0`. In this way, all the less confident overlapping boxes are suppressed for each class. This uses the `allClassNMS` kernel defined in the `allClassNMS.cu` file.
3. Sorts the bounding boxes per image by their updated scores; discarded and suppressed boxes have score `0` and sink to the end of the sorted array. This uses the `sortScoresPerImage` kernel defined in the `sortScoresPerImage.cu` file.
4. Collects the desired number, `keepTopK`, of bounding box indices with the highest scores from the top of the sorted array, together with their bounding box coordinates and object classification information. This uses the `gatherNMSOutputs` kernel defined in the `gatherNMSOutputs.cu` file.

The following resources provide a deeper understanding of the `batchedNMSPlugin` plugin:
**Networks**

**Documentation**
For terms and conditions for use, reproduction, and distribution, see the TensorRT Software License Agreement documentation.
**May 2019**: This is the first release of this `README.md` file.
When running `cub::DeviceSegmentedRadixSort::SortPairsDescending` with `cuda-memcheck --tool racecheck`, it will not work correctly.