Classes

    class HostDeviceMem

Public Member Functions

    def __init__(self, network_loader=None, max_workspace_size=None, max_batch_size=None, fp16=None, tf32=None, load_engine=None, save_engine=None, layerwise=False, plugins=[], name=None)
    def activate_impl(self)
    def get_input_metadata(self)
    def deactivate_impl(self)
    def infer_impl(self, feed_dict)
    def last_inference_time(self)
    def __enter__(self)
    def __exit__(self, exc_type, exc_value, traceback)
    def activate(self)
    def infer(self, feed_dict)
    def deactivate(self)

Public Attributes

    network_loader
    max_workspace_size
    fp16
    tf32
    load_engine
    engine_path
    layerwise
    max_batch_size
    engine
    context
    stream
    inference_time
    name
    is_active

Static Public Attributes

    RUNNER_COUNTS = defaultdict(int)

A runner that can perform inference on a single TensorRT engine.
def polygraphy.backend.trt_legacy.TrtLegacyRunner.__init__(self, network_loader=None, max_workspace_size=None, max_batch_size=None, fp16=None, tf32=None, load_engine=None, save_engine=None, layerwise=False, plugins=[], name=None)

Creates a runner that manages a single TensorRT engine.

    network_loader (BaseModelLoader): A loader that returns a TRT builder, network, parser, and input shapes.
    max_workspace_size (int): The maximum workspace size.
    max_batch_size (int): The maximum batch size.
    fp16 (bool): Whether to run in fp16 mode.
    layerwise (bool): Whether to retrieve the outputs of every layer in the network.
    name (str): The human-readable name prefix to use for this runner. A runner count and timestamp will be appended to this prefix.
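As a rough illustration of how a unique runner name could be derived from the name prefix, the per-class runner count (see `RUNNER_COUNTS` below), and a timestamp, here is a minimal sketch. The `make_runner_name` helper and the exact name format are hypothetical; Polygraphy's actual format may differ.

```python
from collections import defaultdict
import time

# Per-prefix runner counter, mirroring the RUNNER_COUNTS static attribute.
RUNNER_COUNTS = defaultdict(int)

def make_runner_name(prefix):
    """Hypothetical helper: append a runner count and timestamp to a prefix."""
    RUNNER_COUNTS[prefix] += 1
    count = RUNNER_COUNTS[prefix]
    timestamp = time.strftime("%H:%M:%S")
    return "{:}-N{:}-{:}".format(prefix, count, timestamp)

name = make_runner_name("trt-legacy-runner")
```

Because the count is keyed by prefix, two runners created with the same prefix still receive distinct names.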
def polygraphy.backend.trt_legacy.TrtLegacyRunner.activate_impl(self)

Vars:
    engine (trt.ICudaEngine): The engine tracked by this runner. The TrtLegacyRunner OWNS the engine it manages, and is therefore responsible for its destruction. Do not free the engine outside of the runner, or it will result in a double free.
    context (trt.IExecutionContext): The context used for inference.
    input_buffers (Dict[str, TrtLegacyRunner.HostDeviceMem]): A mapping of binding names to HostDeviceMem objects for input buffers.
    output_buffers (Dict[str, TrtLegacyRunner.HostDeviceMem]): A mapping of binding names to HostDeviceMem objects for output buffers.
    bindings (List[int]): A list of device pointers for engine bindings.
    stream (cuda.Stream): The CUDA stream that this runner will use for inference.
Reimplemented from polygraphy.backend.base.runner.BaseRunner.
def polygraphy.backend.trt_legacy.TrtLegacyRunner.get_input_metadata(self)

Returns information about the inputs of the model. Shapes here may include dynamic dimensions, represented by ``None``. Must be called only after activate() and before deactivate().

Returns:
    TensorMetadata: Input names, shapes, and data types.
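Since dynamic dimensions are represented by ``None``, callers typically scan the returned shapes before choosing concrete input shapes. A minimal sketch of that check, using a plain dict of name-to-shape pairs as a stand-in for the real TensorMetadata object (the input name and shape below are hypothetical):

```python
# Stand-in for the metadata returned by get_input_metadata():
# a mapping of input names to shapes, where None marks a dynamic dimension.
input_metadata = {"input0": (None, 3, 224, 224)}  # hypothetical model input

def has_dynamic_dims(shape):
    """Return True if any dimension in the shape is dynamic (None)."""
    return any(dim is None for dim in shape)

dynamic_inputs = [name for name, shape in input_metadata.items()
                  if has_dynamic_dims(shape)]
```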
Reimplemented from polygraphy.backend.base.runner.BaseRunner.
def polygraphy.backend.trt_legacy.TrtLegacyRunner.deactivate_impl(self)
Implementation for runner deactivation. Derived classes should override this function rather than ``deactivate()``.
Reimplemented from polygraphy.backend.base.runner.BaseRunner.
def polygraphy.backend.trt_legacy.TrtLegacyRunner.infer_impl(self, feed_dict)

def polygraphy.backend.trt_legacy.TrtLegacyRunner.last_inference_time(self)  (inherited)

Returns the total inference time required during the last call to ``infer()``.

Returns:
    float: The time in seconds, or None if runtime was not measured by the runner.
def polygraphy.backend.trt_legacy.TrtLegacyRunner.__enter__(self)  (inherited)

def polygraphy.backend.trt_legacy.TrtLegacyRunner.__exit__(self, exc_type, exc_value, traceback)  (inherited)

def polygraphy.backend.trt_legacy.TrtLegacyRunner.activate(self)  (inherited)

Activate the runner for inference. This may involve allocating GPU buffers, for example.
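The ``__enter__``/``__exit__`` pair means a runner can be used as a context manager: entering the ``with`` block activates the runner, and exiting deactivates it. A minimal sketch of that lifecycle, using a toy class that only tracks ``is_active`` (the real runner also allocates and frees GPU buffers):

```python
class SketchRunner:
    """Toy stand-in for a runner's activate/deactivate lifecycle."""

    def __init__(self):
        self.is_active = False

    def activate(self):
        # The real runner would allocate GPU buffers here.
        self.is_active = True

    def deactivate(self):
        # The real runner would free its buffers here.
        self.is_active = False

    def __enter__(self):
        self.activate()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.deactivate()

with SketchRunner() as runner:
    active_inside = runner.is_active   # True while inside the block
active_after = runner.is_active        # False once the block exits
```

Using ``with`` guarantees deactivation even if inference raises an exception inside the block.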
def polygraphy.backend.trt_legacy.TrtLegacyRunner.infer_impl(self, feed_dict)  (inherited)

Implementation for runner inference. Derived classes should override this function rather than ``infer()``.

def polygraphy.backend.trt_legacy.TrtLegacyRunner.infer(self, feed_dict)  (inherited)

Runs inference using the provided feed_dict.

Args:
    feed_dict (OrderedDict[str, numpy.ndarray]): A mapping of input tensor names to corresponding input NumPy arrays.

Returns:
    OrderedDict[str, numpy.ndarray]: A mapping of output tensor names to their corresponding NumPy arrays. IMPORTANT: Runners may reuse these output buffers. Thus, if you need to save outputs from multiple inferences, you should make a copy with ``copy.copy(outputs)``.
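The buffer-reuse caveat above can be demonstrated with a toy runner that writes every result into the same output dictionary; this stand-in class and its ``infer`` method are illustrative only, not the real API. A deep copy is used here so the copied result cannot share any mutable state with the reused buffer:

```python
import copy

class BufferReusingRunner:
    """Toy runner that reuses its output buffer across infer() calls."""

    def __init__(self):
        self._outputs = {"out": [0]}  # single reused output buffer

    def infer(self, value):
        self._outputs["out"][0] = value  # overwrites the previous result
        return self._outputs             # same object every call

runner = BufferReusingRunner()
first = runner.infer(1)
saved = copy.deepcopy(first)  # copy before running inference again
second = runner.infer(2)
# ``first`` now reflects the second result, because it aliases the reused
# buffer; only ``saved`` still holds the first result.
```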
def polygraphy.backend.trt_legacy.TrtLegacyRunner.deactivate(self)  (inherited)

Deactivate the runner.
polygraphy.backend.trt_legacy.TrtLegacyRunner.network_loader
polygraphy.backend.trt_legacy.TrtLegacyRunner.max_workspace_size
polygraphy.backend.trt_legacy.TrtLegacyRunner.fp16
polygraphy.backend.trt_legacy.TrtLegacyRunner.tf32
polygraphy.backend.trt_legacy.TrtLegacyRunner.load_engine
polygraphy.backend.trt_legacy.TrtLegacyRunner.engine_path
polygraphy.backend.trt_legacy.TrtLegacyRunner.layerwise
polygraphy.backend.trt_legacy.TrtLegacyRunner.max_batch_size
polygraphy.backend.trt_legacy.TrtLegacyRunner.engine
polygraphy.backend.trt_legacy.TrtLegacyRunner.context
polygraphy.backend.trt_legacy.TrtLegacyRunner.stream
polygraphy.backend.trt_legacy.TrtLegacyRunner.inference_time
polygraphy.backend.trt_legacy.TrtLegacyRunner.RUNNER_COUNTS = defaultdict(int)  (static, inherited)

polygraphy.backend.trt_legacy.TrtLegacyRunner.name  (inherited)

polygraphy.backend.trt_legacy.TrtLegacyRunner.is_active  (inherited)