The plugin performs the following two tasks:

1. Embeds an input sequence consisting of token ids and segment ids. This consists of a token embedding lookup, a segment embedding lookup, the addition of positional embeddings, and finally layer normalization.
2. Preprocesses the input mask, which marks the valid input tokens in sequences that were padded to the target sequence length.
The `embLayerNormPlugin` takes three inputs: `token_id`, `segment_id`, and `input_mask`.
`token_id`
An input sequence containing token ids. `token_id` is an int32 tensor with shape `[S, B]`, where `S` is the sequence length and `B` is the batch size. Tokens typically identify words or word pieces that were obtained by preprocessing the input text.
`segment_id`
An input sequence containing segment ids. `segment_id` is an int32 tensor with shape `[S, B]`, where `S` is the sequence length and `B` is the batch size. The segment id is used to distinguish between parts of the input sequence that serve different purposes, e.g. in a SQuAD task, the input sequence might consist of a segment representing the knowledge base (i.e. a paragraph of text) and a segment representing the question.
`input_mask`
`input_mask` is an int32 tensor with shape `[S, B]`, where `S` is the sequence length and `B` is the batch size. The input mask denotes the valid elements in a sequence that was padded to the sequence length `S`.
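For illustration, the sketch below lays out hypothetical inputs in this `[S, B]` format; the specific token and segment ids are made up and only demonstrate the padding and mask convention:

```python
import numpy as np

S, B = 8, 2  # hypothetical sequence length and batch size

# Two sequences with 5 and 3 valid tokens, padded to length S.
token_id   = np.zeros((S, B), dtype=np.int32)
segment_id = np.zeros((S, B), dtype=np.int32)
input_mask = np.zeros((S, B), dtype=np.int32)

token_id[:5, 0] = [101, 2054, 2003, 1996, 102]  # made-up ids, e.g. [CLS] ... [SEP]
token_id[:3, 1] = [101, 7592, 102]
segment_id[:5, 0] = [0, 0, 1, 1, 1]             # e.g. question vs. paragraph segment
input_mask[:5, 0] = 1                           # mark valid (non-padded) positions
input_mask[:3, 1] = 1
```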
The `embLayerNormPlugin` generates the following two outputs:
`embedded_input`
`embedded_input` is a floating point tensor with shape `[S, B, E]`, where `S` is the sequence length, `B` is the batch size, and `E` is the hidden size. The final output embedding is the sum of the embeddings for the token, the segment, and the position in the sequence.
`maskIdx`
`maskIdx` is an int32 tensor with shape `[B,]`, where `B` is the batch size. The `maskIdx` is a more compact representation of the input mask, consisting of the number of valid elements per sequence, assuming that the original mask was contiguous.
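To make the output semantics concrete, here is a minimal NumPy sketch of the math described above (sum of token, segment, and position embeddings followed by layer normalization, plus the contiguous-mask reduction). It is a shape and semantics reference only, not the plugin's CUDA implementation, and the argument names are placeholders:

```python
import numpy as np

def emb_layer_norm_reference(token_id, segment_id, input_mask,
                             word_emb, type_emb, pos_emb, gamma, beta, eps=1e-12):
    """token_id/segment_id/input_mask: int32 [S, B]; word_emb/type_emb: [vocab, E];
    pos_emb: [S_max, E]; gamma/beta: [E,].
    Returns embedded_input [S, B, E] and mask_idx [B,]."""
    S, B = token_id.shape
    # Sum of token, segment, and position embeddings -> [S, B, E]
    emb = word_emb[token_id] + type_emb[segment_id] + pos_emb[:S, None, :]
    # Layer normalization over the hidden dimension E
    mu = emb.mean(axis=-1, keepdims=True)
    var = emb.var(axis=-1, keepdims=True)
    embedded_input = gamma * (emb - mu) / np.sqrt(var + eps) + beta
    # maskIdx: number of valid elements per batch item, assuming a contiguous mask
    mask_idx = input_mask.sum(axis=0).astype(np.int32)  # shape [B,]
    return embedded_input, mask_idx
```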
The `embLayerNormPlugin` has plugin creator class `EmbLayerNormPluginDynamicCreator` and plugin class `CustomEmbLayerNormPluginDynamic`.
The parameters are defined below and consist of the following attributes:
| Type | Parameter | Version | Description |
|---|---|---|---|
| `int` | `output_fp16` | 1, 2 | Integer encoding the DataType (0: FP32, 1: FP16) |
| `int` | `full_mask` | 1 | Whether to output the full mask that works with the specialized multi-head-attention plugin kernels (deprecated; use `mha_type_id` instead) |
| `int` | `mha_type_id` | 1 | Integer encoding the multi-head-attention plugin DataType (0: FP32, 1: FP16, 2: INT8) |
| `Weights` | `bert_embeddings_layernorm_beta` | 1, 2 | Beta parameter for layer norm. Shape: `[E,]` where `E` is the hidden size |
| `Weights` | `bert_embeddings_layernorm_gamma` | 1, 2 | Gamma parameter for layer norm. Shape: `[E,]` where `E` is the hidden size |
| `Weights` | `bert_embeddings_word_embeddings` | 1, 2 | Token embedding matrix. Shape: `[word_vocab_size, E]` where `E` is the hidden size |
| `Weights` | `bert_embeddings_token_type_embeddings` | 1, 2 | Token type embedding matrix. Shape: `[type_vocab_size, E]` where `E` is the hidden size |
| `Weights` | `bert_embeddings_position_embeddings` | 1, 2 | Positional embedding matrix. Shape: `[S, E]` where `S` is the maximum sequence length and `E` is the hidden size |
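As a sketch of how these fields might be supplied, the snippet below instantiates version 1 of the plugin through the TensorRT Python plugin registry. The hidden size, vocabulary sizes, and zero- or one-filled weights are placeholders, and building the surrounding network (and obtaining the three input tensors) is omitted:

```python
import numpy as np
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(TRT_LOGGER, "")

creator = trt.get_plugin_registry().get_plugin_creator(
    "CustomEmbLayerNormPluginDynamic", "1", "")

E, word_vocab, type_vocab, S_max = 768, 30522, 2, 512  # placeholder sizes

fields = trt.PluginFieldCollection([
    trt.PluginField("output_fp16", np.array([1], np.int32), trt.PluginFieldType.INT32),
    trt.PluginField("mha_type_id", np.array([1], np.int32), trt.PluginFieldType.INT32),
    trt.PluginField("bert_embeddings_layernorm_beta",
                    np.zeros(E, np.float32), trt.PluginFieldType.FLOAT32),
    trt.PluginField("bert_embeddings_layernorm_gamma",
                    np.ones(E, np.float32), trt.PluginFieldType.FLOAT32),
    trt.PluginField("bert_embeddings_word_embeddings",
                    np.zeros((word_vocab, E), np.float32), trt.PluginFieldType.FLOAT32),
    trt.PluginField("bert_embeddings_token_type_embeddings",
                    np.zeros((type_vocab, E), np.float32), trt.PluginFieldType.FLOAT32),
    trt.PluginField("bert_embeddings_position_embeddings",
                    np.zeros((S_max, E), np.float32), trt.PluginFieldType.FLOAT32),
])
plugin = creator.create_plugin("embeddings", fields)

# With an INetworkDefinition `network` and int32 ITensors token_id, segment_id,
# input_mask of shape [S, B], the plugin layer would be added as:
# emb_layer = network.add_plugin_v2([token_id, segment_id, input_mask], plugin)
```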
The following resources provide a deeper understanding of the `embLayerNormPlugin` plugin:
Networks:
- BERT (https://arxiv.org/abs/1810.04805)
For terms and conditions for use, reproduction, and distribution, see the TensorRT Software License Agreement documentation.
October 2020
Add V2 plugin that supports variable sequence length.
November 2019
This is the first release of this README.md file.
This plugin only supports GPUs with compute capability >= 7.0. For more information, see the CUDA GPU Compute Capability Support Matrix.
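As a quick way to verify this requirement, the compute capability of a local GPU can be queried, for example with PyCUDA (not a dependency of this plugin; it is just one convenient option):

```python
import pycuda.driver as cuda

cuda.init()
major, minor = cuda.Device(0).compute_capability()
print(f"Compute capability: {major}.{minor}")
assert (major, minor) >= (7, 0), "embLayerNormPlugin requires compute capability >= 7.0"
```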