NVIDIA TensorRT 7.2.1.6
helpers.tokenization.BasicTokenizer Class Reference

Public Member Functions

def __init__ (self, do_lower_case=True)
 
def tokenize (self, text)
 

Public Attributes

 do_lower_case
 

Private Member Functions

def _run_strip_accents (self, text)
 
def _run_split_on_punc (self, text)
 
def _tokenize_chinese_chars (self, text)
 
def _is_chinese_char (self, cp)
 
def _clean_text (self, text)
 

Detailed Description

Runs basic tokenization (punctuation splitting, lower casing, etc.).

Constructor & Destructor Documentation

◆ __init__()

def helpers.tokenization.BasicTokenizer.__init__(self, do_lower_case=True)
Constructs a BasicTokenizer.

Args:
  do_lower_case: Whether to lower case the input.
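
A minimal usage sketch. The import path below assumes the demo's helpers/tokenization.py module is on the Python path, and the expected output is illustrative:

from helpers.tokenization import BasicTokenizer

tokenizer = BasicTokenizer(do_lower_case=True)
tokens = tokenizer.tokenize(u"Héllo, World!")
# With lower casing, accent stripping and punctuation splitting enabled,
# this yields something like: ['hello', ',', 'world', '!']
print(tokens)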

Member Function Documentation

◆ tokenize()

def helpers.tokenization.BasicTokenizer.tokenize(self, text)
Tokenizes a piece of text.
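The method chains the private helpers documented below. A sketch of the typical BERT-style control flow (an assumption about this implementation, not its verbatim source):

def tokenize(self, text):
    text = self._clean_text(text)              # drop invalid characters, normalize whitespace
    text = self._tokenize_chinese_chars(text)  # pad CJK characters with spaces
    split_tokens = []
    for token in text.strip().split():
        if self.do_lower_case:
            token = token.lower()
            token = self._run_strip_accents(token)
        split_tokens.extend(self._run_split_on_punc(token))
    # Re-split on whitespace so every punctuation mark becomes its own token.
    return " ".join(split_tokens).strip().split()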

◆ _run_strip_accents()

def helpers.tokenization.BasicTokenizer._run_strip_accents(self, text)
private
Strips accents from a piece of text.
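Accent stripping is typically done by NFD-normalizing the text and dropping combining marks; the sketch below shows that approach (an assumption about this implementation):

import unicodedata

def _run_strip_accents(self, text):
    # Decompose characters (NFD) so accents become separate combining marks,
    # then drop anything in the "Mn" (nonspacing mark) category.
    text = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")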

◆ _run_split_on_punc()

def helpers.tokenization.BasicTokenizer._run_split_on_punc(self, text)
private
Splits a piece of text on punctuation.
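A sketch of the usual convention (an assumption about this implementation): every punctuation character becomes its own token and also terminates the token before it. The _is_punctuation helper below is illustrative, not necessarily the module's own:

import unicodedata

def _is_punctuation(ch):
    # ASCII symbol ranges plus any Unicode "P*" category count as punctuation here.
    cp = ord(ch)
    if 33 <= cp <= 47 or 58 <= cp <= 64 or 91 <= cp <= 96 or 123 <= cp <= 126:
        return True
    return unicodedata.category(ch).startswith("P")

def _run_split_on_punc(self, text):
    output = []
    start_new_word = True
    for ch in text:
        if _is_punctuation(ch):
            output.append([ch])      # punctuation is always a token on its own
            start_new_word = True
        else:
            if start_new_word:
                output.append([])
            start_new_word = False
            output[-1].append(ch)
    return ["".join(chars) for chars in output]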

◆ _tokenize_chinese_chars()

def helpers.tokenization.BasicTokenizer._tokenize_chinese_chars(self, text)
private
Adds whitespace around any CJK character.
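A sketch of the padding step, assuming _is_chinese_char behaves as documented below:

def _tokenize_chinese_chars(self, text):
    # Surround every CJK codepoint with spaces so each character is later
    # split into its own token by whitespace tokenization.
    output = []
    for ch in text:
        if self._is_chinese_char(ord(ch)):
            output.extend([" ", ch, " "])
        else:
            output.append(ch)
    return "".join(output)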

◆ _is_chinese_char()

def helpers.tokenization.BasicTokenizer._is_chinese_char(self, cp)
private
Checks whether CP is the codepoint of a CJK character.
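The check is normally a range test against the CJK Unified Ideograph blocks; the ranges below follow the BERT reference tokenizer and are assumed, not confirmed, for this class:

def _is_chinese_char(self, cp):
    # CJK Unified Ideographs, their extensions, and compatibility ideographs.
    return ((0x4E00 <= cp <= 0x9FFF) or (0x3400 <= cp <= 0x4DBF) or
            (0x20000 <= cp <= 0x2A6DF) or (0x2A700 <= cp <= 0x2B73F) or
            (0x2B740 <= cp <= 0x2B81F) or (0x2B820 <= cp <= 0x2CEAF) or
            (0xF900 <= cp <= 0xFAFF) or (0x2F800 <= cp <= 0x2FA1F))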

◆ _clean_text()

def helpers.tokenization.BasicTokenizer._clean_text(self, text)
private
Performs invalid character removal and whitespace cleanup on text.
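A sketch of the cleanup step, assuming the usual BERT behaviour: map any whitespace to a single space, and drop NUL, the Unicode replacement character, and control characters:

import unicodedata

def _clean_text(self, text):
    output = []
    for ch in text:
        cp = ord(ch)
        if ch in (" ", "\t", "\n", "\r") or unicodedata.category(ch) == "Zs":
            output.append(" ")   # normalize any whitespace to a plain space
        elif cp == 0 or cp == 0xFFFD or unicodedata.category(ch).startswith("C"):
            continue             # skip invalid and control characters
        else:
            output.append(ch)
    return "".join(output)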

Member Data Documentation

◆ do_lower_case

helpers.tokenization.BasicTokenizer.do_lower_case
Whether to lower case the input; set from the constructor argument of the same name.
