Public Member Functions

    def __init__ (self, vocab, unk_token="[UNK]", max_input_chars_per_word=200)
    def tokenize (self, text)

Public Attributes

    vocab
    unk_token
    max_input_chars_per_word
Runs WordPiece tokenization.
def helpers.tokenization.WordpieceTokenizer.__init__ (self, vocab, unk_token="[UNK]", max_input_chars_per_word=200)
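As a minimal usage sketch (assuming vocab is a dict-like mapping from wordpiece strings to ids, since tokenization only needs membership tests; the toy vocabulary below is hypothetical, real vocabularies are loaded from a BERT vocab file):

    from helpers.tokenization import WordpieceTokenizer

    # Hypothetical toy vocabulary for illustration only.
    vocab = {"[UNK]": 0, "un": 1, "##aff": 2, "##able": 3}
    tokenizer = WordpieceTokenizer(vocab=vocab, unk_token="[UNK]")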
def helpers.tokenization.WordpieceTokenizer.tokenize (self, text)
Tokenizes a piece of text into its word pieces. This uses a greedy longest-match-first algorithm to perform tokenization using the given vocabulary.

For example:
    input = "unaffable"
    output = ["un", "##aff", "##able"]

Args:
    text: A single token or whitespace-separated tokens. This should have already been passed through `BasicTokenizer`.

Returns:
    A list of wordpiece tokens.
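The greedy longest-match-first procedure can be summarized with a standalone sketch (a simplification of the class above, not its exact implementation; the helper name wordpiece_tokenize and the toy vocabulary are hypothetical):

    def wordpiece_tokenize(text, vocab, unk_token="[UNK]", max_input_chars_per_word=200):
        """Sketch of greedy longest-match-first WordPiece tokenization."""
        output_tokens = []
        for token in text.strip().split():
            chars = list(token)
            # Overlong tokens are mapped to the unknown token wholesale.
            if len(chars) > max_input_chars_per_word:
                output_tokens.append(unk_token)
                continue
            start = 0
            sub_tokens = []
            is_bad = False
            while start < len(chars):
                # Try the longest remaining substring first, shrinking from the right.
                end = len(chars)
                cur_substr = None
                while start < end:
                    substr = "".join(chars[start:end])
                    if start > 0:
                        substr = "##" + substr  # continuation pieces carry the ## prefix
                    if substr in vocab:
                        cur_substr = substr
                        break
                    end -= 1
                if cur_substr is None:
                    is_bad = True  # no vocabulary entry covers this position
                    break
                sub_tokens.append(cur_substr)
                start = end
            if is_bad:
                output_tokens.append(unk_token)
            else:
                output_tokens.extend(sub_tokens)
        return output_tokens

    # Hypothetical toy vocabulary reproducing the example above.
    vocab = {"[UNK]": 0, "un": 1, "##aff": 2, "##able": 3}
    print(wordpiece_tokenize("unaffable", vocab))  # ['un', '##aff', '##able']

Because the inner loop shrinks the candidate substring from the right, each position always consumes the longest vocabulary match available, which is what makes the algorithm greedy.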
helpers.tokenization.WordpieceTokenizer.vocab
helpers.tokenization.WordpieceTokenizer.unk_token
helpers.tokenization.WordpieceTokenizer.max_input_chars_per_word