TensorRT  7.2.1.6
NVIDIA TensorRT
helpers.tokenization.WordpieceTokenizer Class Reference

Public Member Functions

def __init__ (self, vocab, unk_token="[UNK]", max_input_chars_per_word=200)
 
def tokenize (self, text)
 

Public Attributes

 vocab
 
 unk_token
 
 max_input_chars_per_word
 

Detailed Description

Runs WordPiece tokenization.

Constructor & Destructor Documentation

◆ __init__()

def helpers.tokenization.WordpieceTokenizer.__init__ (   self,
  vocab,
  unk_token = "[UNK]",
  max_input_chars_per_word = 200 
)

Member Function Documentation

◆ tokenize()

def helpers.tokenization.WordpieceTokenizer.tokenize (   self,
  text 
)
Tokenizes a piece of text into its word pieces.

This uses a greedy longest-match-first algorithm to perform tokenization
using the given vocabulary.

For example:
  input = "unaffable"
  output = ["un", "##aff", "##able"]

Args:
  text: A single token or whitespace-separated tokens. This should have
already been passed through `BasicTokenizer`.

Returns:
  A list of wordpiece tokens.
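The greedy longest-match-first algorithm described above can be sketched as follows. This is a minimal illustration, not the TensorRT helper itself: the `wordpiece_tokenize` function name and the toy `vocab` set are assumptions for the example, and the real class reads its vocabulary from a file and works with the attributes documented below.

```python
def wordpiece_tokenize(text, vocab, unk_token="[UNK]", max_input_chars_per_word=200):
    """Sketch of greedy longest-match-first WordPiece tokenization."""
    output_tokens = []
    for token in text.strip().split():
        chars = list(token)
        # Overly long words are mapped to the unknown token.
        if len(chars) > max_input_chars_per_word:
            output_tokens.append(unk_token)
            continue
        is_bad = False
        start = 0
        sub_tokens = []
        while start < len(chars):
            # Try the longest substring first, shrinking from the right
            # until a vocabulary match is found.
            end = len(chars)
            cur_substr = None
            while start < end:
                substr = "".join(chars[start:end])
                if start > 0:
                    substr = "##" + substr  # continuation-piece prefix
                if substr in vocab:
                    cur_substr = substr
                    break
                end -= 1
            if cur_substr is None:
                # No piece of the word matched; give up on the whole word.
                is_bad = True
                break
            sub_tokens.append(cur_substr)
            start = end
        output_tokens.extend([unk_token] if is_bad else sub_tokens)
    return output_tokens

# Toy vocabulary (an assumption for this sketch):
vocab = {"un", "##aff", "##able"}
print(wordpiece_tokenize("unaffable", vocab))  # → ['un', '##aff', '##able']
```

Because matching is greedy from the left with the longest piece tried first, a word either decomposes entirely into known pieces or collapses to `unk_token`; the algorithm never backtracks across piece boundaries.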

Member Data Documentation

◆ vocab

helpers.tokenization.WordpieceTokenizer.vocab

◆ unk_token

helpers.tokenization.WordpieceTokenizer.unk_token

◆ max_input_chars_per_word

helpers.tokenization.WordpieceTokenizer.max_input_chars_per_word

The documentation for this class was generated from the following file: