Public Member Functions

    def __init__ (self, vocab, unk_token="[UNK]", max_input_chars_per_word=200)
    def tokenize (self, text)

Public Attributes

    vocab
    unk_token
    max_input_chars_per_word
Runs WordPiece tokenization.
def helpers.tokenization.WordpieceTokenizer.__init__ (self, vocab, unk_token="[UNK]", max_input_chars_per_word=200)
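As a minimal usage sketch (assuming vocab is a dict-like mapping from wordpiece strings to ids, since tokenization only needs membership tests; the toy vocabulary below is hypothetical, real vocabularies are loaded from a BERT vocab file):

    from helpers.tokenization import WordpieceTokenizer

    # Hypothetical toy vocabulary for illustration only.
    vocab = {"[UNK]": 0, "un": 1, "##aff": 2, "##able": 3}
    tokenizer = WordpieceTokenizer(vocab=vocab, unk_token="[UNK]")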
def helpers.tokenization.WordpieceTokenizer.tokenize (self, text)
Tokenizes a piece of text into its word pieces. This uses a greedy longest-match-first algorithm to perform tokenization using the given vocabulary.

For example:
    input = "unaffable"
    output = ["un", "##aff", "##able"]

Args:
    text: A single token or whitespace-separated tokens. This should have already been passed through `BasicTokenizer`.

Returns:
    A list of wordpiece tokens.
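The greedy longest-match-first procedure can be summarized with a standalone sketch (a simplification of the class above, not its exact implementation; the helper name wordpiece_tokenize and the toy vocabulary are hypothetical):

    def wordpiece_tokenize(text, vocab, unk_token="[UNK]", max_input_chars_per_word=200):
        """Sketch of greedy longest-match-first WordPiece tokenization."""
        output_tokens = []
        for token in text.strip().split():
            chars = list(token)
            # Overlong tokens are mapped to the unknown token wholesale.
            if len(chars) > max_input_chars_per_word:
                output_tokens.append(unk_token)
                continue
            start = 0
            sub_tokens = []
            is_bad = False
            while start < len(chars):
                # Try the longest remaining substring first, shrinking from the right.
                end = len(chars)
                cur_substr = None
                while start < end:
                    substr = "".join(chars[start:end])
                    if start > 0:
                        substr = "##" + substr  # continuation pieces carry the ## prefix
                    if substr in vocab:
                        cur_substr = substr
                        break
                    end -= 1
                if cur_substr is None:
                    is_bad = True  # no vocabulary entry covers this position
                    break
                sub_tokens.append(cur_substr)
                start = end
            if is_bad:
                output_tokens.append(unk_token)
            else:
                output_tokens.extend(sub_tokens)
        return output_tokens

    # Hypothetical toy vocabulary reproducing the example above.
    vocab = {"[UNK]": 0, "un": 1, "##aff": 2, "##able": 3}
    print(wordpiece_tokenize("unaffable", vocab))  # ['un', '##aff', '##able']

Because the inner loop shrinks the candidate substring from the right, each position always consumes the longest vocabulary match available, which is what makes the algorithm greedy.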
helpers.tokenization.WordpieceTokenizer.vocab
helpers.tokenization.WordpieceTokenizer.unk_token
helpers.tokenization.WordpieceTokenizer.max_input_chars_per_word