Classes
    class BasicTokenizer
    class BertTokenizer
    class FullTokenizer
    class WordpieceTokenizer
Functions
    def validate_case_matches_checkpoint(do_lower_case, init_checkpoint)
    def convert_to_unicode(text)
    def printable_text(text)
    def load_vocab(vocab_file)
    def convert_by_vocab(vocab, items)
    def convert_tokens_to_ids(vocab, tokens)
    def convert_ids_to_tokens(inv_vocab, ids)
    def whitespace_tokenize(text)
    def _is_whitespace(char)
    def _is_control(char)
    def _is_punctuation(char)
def helpers.tokenization.validate_case_matches_checkpoint(do_lower_case, init_checkpoint)
Checks whether the casing config is consistent with the checkpoint name.
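A minimal sketch of the casing check, assuming the checkpoint path follows the Google BERT naming convention of containing "uncased" or "cased"; the actual function may be more thorough, but the idea is to raise a ValueError when the two settings disagree:

    def validate_case_matches_checkpoint(do_lower_case, init_checkpoint):
        # Sketch only: infer casing from the checkpoint name and fail loudly
        # on a mismatch. "uncased" contains "cased", so test it first.
        if not init_checkpoint:
            return
        name = init_checkpoint.lower()
        if "uncased" in name:
            checkpoint_is_cased = False
        elif "cased" in name:
            checkpoint_is_cased = True
        else:
            return  # cannot tell from the name; nothing to validate
        if do_lower_case and checkpoint_is_cased:
            raise ValueError(
                "do_lower_case=True but checkpoint %r looks cased." % init_checkpoint)
        if not do_lower_case and not checkpoint_is_cased:
            raise ValueError(
                "do_lower_case=False but checkpoint %r looks uncased." % init_checkpoint)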
def helpers.tokenization.convert_to_unicode(text)
Converts `text` to Unicode (if it's not already), assuming utf-8 input.
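A Python 3 sketch consistent with that contract (str is already Unicode; bytes are assumed to be UTF-8):

    def convert_to_unicode(text):
        # str passes through unchanged; bytes are decoded as UTF-8.
        if isinstance(text, str):
            return text
        if isinstance(text, bytes):
            return text.decode("utf-8", "ignore")
        raise ValueError("Unsupported string type: %s" % type(text))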
def helpers.tokenization.printable_text(text)
Returns text encoded in a way suitable for print or `tf.logging`.
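In Python 3 this reduces to the same str/bytes normalization as convert_to_unicode, so a sketch (under that assumption) could simply delegate:

    def printable_text(text):
        # Sketch: reuse the same normalization so the result is always a str
        # that print() and tf.logging can handle.
        return convert_to_unicode(text)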
def helpers.tokenization.load_vocab(vocab_file)
Loads a vocabulary file into a dictionary.
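A minimal sketch, assuming the usual BERT vocab format of one token per line, where the zero-based line number becomes the token id:

    import collections

    def load_vocab(vocab_file):
        vocab = collections.OrderedDict()
        with open(vocab_file, "r", encoding="utf-8") as reader:
            for index, line in enumerate(reader):
                # Strip the trailing newline; the line number is the id.
                vocab[line.strip()] = index
        return vocab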
def helpers.tokenization.convert_by_vocab(vocab, items)
Converts a sequence of [tokens|ids] using the vocab.
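The conversion is a plain dictionary lookup over the sequence, which is why one helper serves both directions; a sketch:

    def convert_by_vocab(vocab, items):
        # Works for token->id and id->token alike: `vocab` is just a dict,
        # either the forward vocab or its inverse.
        return [vocab[item] for item in items]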
def helpers.tokenization.convert_tokens_to_ids(vocab, tokens)
def helpers.tokenization.convert_ids_to_tokens(inv_vocab, ids)
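Both converters appear to be thin wrappers around convert_by_vocab, differing only in which lookup table they receive. A hypothetical round trip ("vocab.txt" is a placeholder path, and the example tokens are assumed to be in the vocab):

    vocab = load_vocab("vocab.txt")                # token -> id
    inv_vocab = {v: k for k, v in vocab.items()}   # id -> token

    ids = convert_tokens_to_ids(vocab, ["[CLS]", "hello", "[SEP]"])
    tokens = convert_ids_to_tokens(inv_vocab, ids)
    assert tokens == ["[CLS]", "hello", "[SEP]"]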
def helpers.tokenization.whitespace_tokenize(text)
Runs basic whitespace cleaning and splitting on a piece of text.
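This amounts to stripping the ends and splitting on runs of whitespace; a sketch:

    def whitespace_tokenize(text):
        text = text.strip()
        if not text:
            return []  # input was empty or all whitespace
        return text.split()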
def helpers.tokenization._is_whitespace(char)  [private]
Checks whether `char` is a whitespace character.

def helpers.tokenization._is_control(char)  [private]
Checks whether `char` is a control character.

def helpers.tokenization._is_punctuation(char)  [private]
Checks whether `char` is a punctuation character.
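A combined sketch of the three classifiers, assuming they follow the usual Unicode-category approach of the BERT reference code: tab, newline, and carriage return are treated as whitespace rather than control characters, and all non-alphanumeric ASCII symbols count as punctuation even when Unicode classes them otherwise:

    import unicodedata

    def _is_whitespace(char):
        # \t, \n, and \r are technically control characters but are treated
        # as whitespace here so they split tokens.
        if char in (" ", "\t", "\n", "\r"):
            return True
        return unicodedata.category(char) == "Zs"

    def _is_control(char):
        # \t, \n, and \r are handled as whitespace above, not as control.
        if char in ("\t", "\n", "\r"):
            return False
        return unicodedata.category(char) in ("Cc", "Cf")

    def _is_punctuation(char):
        cp = ord(char)
        # Treat all non-alphanumeric ASCII symbols (e.g. "$", "^", "`") as
        # punctuation even though Unicode puts them in other categories.
        if (33 <= cp <= 47) or (58 <= cp <= 64) or \
           (91 <= cp <= 96) or (123 <= cp <= 126):
            return True
        return unicodedata.category(char).startswith("P")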