Public Member Functions
    def __init__ (self, do_lower_case=True)
    def tokenize (self, text)

Public Attributes
    do_lower_case

Private Member Functions
    def _run_strip_accents (self, text)
    def _run_split_on_punc (self, text)
    def _tokenize_chinese_chars (self, text)
    def _is_chinese_char (self, cp)
    def _clean_text (self, text)
Runs basic tokenization (punctuation splitting, lower casing, etc.).
def helpers.tokenization.BasicTokenizer.__init__ (self, do_lower_case=True)

    Constructs a BasicTokenizer.

    Args:
        do_lower_case: Whether to lower case the input.
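A minimal construction sketch, assuming the class is imported from the helpers.tokenization module this page documents:

    from helpers.tokenization import BasicTokenizer

    # Default behaviour: input is lower-cased before tokenization.
    uncased_tokenizer = BasicTokenizer()

    # Preserve the original casing of the input.
    cased_tokenizer = BasicTokenizer(do_lower_case=False)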
def helpers.tokenization.BasicTokenizer.tokenize (self, text)

    Tokenizes a piece of text.
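Continuing the construction sketch above, a hedged usage example; the exact token stream depends on the implementation, but given the behaviour described on this page (lower casing, punctuation splitting, whitespace around CJK characters) one would expect output along these lines:

    tokens = uncased_tokenizer.tokenize("Hello, World! 你好")
    # Expected: lower-cased tokens, punctuation split off, and each CJK
    # character as its own token, e.g.
    # ['hello', ',', 'world', '!', '你', '好']
    print(tokens)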
def helpers.tokenization.BasicTokenizer._run_strip_accents (self, text) [private]

    Strips accents from a piece of text.
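A sketch of one common way to implement accent stripping, using Unicode NFD normalization and discarding combining marks; the actual implementation may differ:

    import unicodedata

    def _run_strip_accents(text):
        # Decompose characters so accents become separate combining marks.
        text = unicodedata.normalize("NFD", text)
        # Drop the combining marks (Unicode category "Mn"), keep base characters.
        return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

    # e.g. _run_strip_accents("café") -> "cafe"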
def helpers.tokenization.BasicTokenizer._run_split_on_punc (self, text) [private]

    Splits punctuation on a piece of text.
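A sketch of how punctuation splitting could work, written as a standalone function: walk the characters, emit each punctuation character as its own token, and start a new token after it. The _is_punctuation predicate is an assumed helper, approximated here with Unicode categories:

    import unicodedata

    def _is_punctuation(ch):
        # Assumed helper: any Unicode punctuation category ("P*").
        return unicodedata.category(ch).startswith("P")

    def _run_split_on_punc(text):
        output = []            # list of character lists, one per output token
        start_new_word = True
        for ch in text:
            if _is_punctuation(ch):
                output.append([ch])      # punctuation becomes its own token
                start_new_word = True
            else:
                if start_new_word:
                    output.append([])
                start_new_word = False
                output[-1].append(ch)
        return ["".join(chars) for chars in output]

    # e.g. _run_split_on_punc("hello,world!") -> ['hello', ',', 'world', '!']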
def helpers.tokenization.BasicTokenizer._tokenize_chinese_chars (self, text) [private]

    Adds whitespace around any CJK character.
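A sketch of the CJK padding step as a standalone function; it assumes a codepoint test like the _is_chinese_char sketch given under the next entry:

    def _tokenize_chinese_chars(text):
        # Surround every CJK character with spaces so that later
        # whitespace splitting yields one token per ideograph.
        output = []
        for ch in text:
            if _is_chinese_char(ord(ch)):   # see the _is_chinese_char sketch below
                output.extend([" ", ch, " "])
            else:
                output.append(ch)
        return "".join(output)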
def helpers.tokenization.BasicTokenizer._is_chinese_char (self, cp) [private]

    Checks whether CP is the codepoint of a CJK character.
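A sketch of the codepoint test; the ranges below are the Unicode CJK ideograph blocks typically used for this kind of check, though the exact set used here is an assumption:

    def _is_chinese_char(cp):
        # CJK Unified Ideographs plus extension and compatibility blocks.
        return (
            (0x4E00 <= cp <= 0x9FFF)        # CJK Unified Ideographs
            or (0x3400 <= cp <= 0x4DBF)     # Extension A
            or (0x20000 <= cp <= 0x2A6DF)   # Extension B
            or (0x2A700 <= cp <= 0x2B73F)   # Extension C
            or (0x2B740 <= cp <= 0x2B81F)   # Extension D
            or (0x2B820 <= cp <= 0x2CEAF)   # Extension E
            or (0xF900 <= cp <= 0xFAFF)     # CJK Compatibility Ideographs
            or (0x2F800 <= cp <= 0x2FA1F)   # Compatibility Ideographs Supplement
        )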
def helpers.tokenization.BasicTokenizer._clean_text (self, text) [private]

    Performs invalid character removal and whitespace cleanup on text.
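A sketch of the cleanup pass, with assumed helper predicates for control and whitespace characters built on unicodedata:

    import unicodedata

    def _is_control(ch):
        # Assumed helper: Unicode "C*" categories; tab/newline/CR are kept
        # and treated as whitespace instead.
        if ch in ("\t", "\n", "\r"):
            return False
        return unicodedata.category(ch).startswith("C")

    def _is_whitespace(ch):
        # Assumed helper: ASCII whitespace plus Unicode space separators ("Zs").
        if ch in (" ", "\t", "\n", "\r"):
            return True
        return unicodedata.category(ch) == "Zs"

    def _clean_text(text):
        output = []
        for ch in text:
            cp = ord(ch)
            # Drop NUL, the replacement character, and other control characters.
            if cp == 0 or cp == 0xFFFD or _is_control(ch):
                continue
            # Normalize every whitespace variant to a single space.
            output.append(" " if _is_whitespace(ch) else ch)
        return "".join(output)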
helpers.tokenization.BasicTokenizer.do_lower_case