A universal tensor quantization function
Take an input tensor and output a quantized tensor. The granularity of the scale can be inferred from the
shape of amax.
output_dtype indicates whether the quantized value will be stored as an integer or a float. The reason to store
it as a float is that the PyTorch function that consumes the quantized value may not accept integer input, e.g. Conv2d.
With narrow_range=True (the default), it uses 2^num_bits - 1 values instead of 2^num_bits, e.g. for num_bits=8 it uses [-127, 127] instead of [-128, 127].
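For instance, keeping quantized values in float32 lets them feed directly into torch.nn.functional.conv2d; a hypothetical illustration:

    import torch
    import torch.nn.functional as F

    q = torch.randint(-127, 128, (1, 3, 8, 8)).float()  # quantized values stored as float32
    w = torch.randint(-127, 128, (4, 3, 3, 3)).float()
    y = F.conv2d(q, w)  # works with float tensors
    # F.conv2d(q.int(), w.int()) would generally fail, since integer
    # convolutions are not supported by the standard backends.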
static pytorch_quantization.tensor_quant.TensorQuantFunction.forward(ctx, inputs, amax, num_bits=8, unsigned=False, narrow_range=True)
Following the TensorFlow convention, the max value (amax) is passed in and used to derive the scale, instead of
passing the scale directly, though passing the scale directly might be more natural to use.
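A rough sketch of the resulting math (an illustration of the documented behavior, not the library's actual implementation):

    import torch

    def tensor_quant_sketch(inputs, amax, num_bits=8, unsigned=False, narrow_range=True):
        # Illustrative sketch only; mirrors the documented behavior.
        if unsigned:
            max_bound = 2.0 ** num_bits - 1.0        # e.g. 255 for num_bits=8
            min_bound = 0.0
        else:
            max_bound = 2.0 ** (num_bits - 1) - 1.0  # e.g. 127 for num_bits=8
            # narrow_range keeps the range symmetric: [-127, 127] vs [-128, 127]
            min_bound = -max_bound if narrow_range else -max_bound - 1.0
        # Passing amax fixes the scale implicitly, e.g. amax=2.0 gives scale=63.5
        scale = max_bound / amax                      # amax broadcasts against inputs
        outputs = torch.clamp((inputs * scale).round(), min_bound, max_bound)
        return outputs, scale                         # outputs / scale dequantizes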
Args:
    ctx: A Context object to store tensors for backward.
    inputs: A Tensor of type float32.
    amax: A Tensor of type float32. Inputs will be quantized within range [-amax, amax].
        amax will be broadcast to the shape of inputs (see the per-channel example after this list).
    num_bits: An integer used to calculate the scaling factor, scale = (2^(num_bits-1) - 1) / amax.
        Effectively, it indicates how many integer bits are used to represent the value. Default 8.
    output_dtype: A type of Tensor. torch.int32 or torch.float32.
    unsigned: A boolean. Use unsigned integer range, e.g. [0, 255] for num_bits=8. Default False.
    narrow_range: A boolean. Use symmetric integer range for signed quantization,
        e.g. [-127, 127] instead of [-128, 127] for num_bits=8. Default True.
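A minimal per-channel sketch, assuming the pytorch_quantization package is installed (TensorQuantFunction is an autograd Function, so it is invoked through .apply, which supplies ctx implicitly):

    import torch
    from pytorch_quantization import tensor_quant

    weight = torch.randn(16, 3, 3, 3)  # e.g. a Conv2d weight: (out_ch, in_ch, kH, kW)
    # One amax per output channel; keepdim leaves singleton trailing dims
    # so amax (shape (16, 1, 1, 1)) broadcasts against the weight tensor.
    amax = weight.abs().amax(dim=(1, 2, 3), keepdim=True)
    outputs, scale = tensor_quant.TensorQuantFunction.apply(weight, amax)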
Returns:
    outputs: A Tensor of type output_dtype.
    scale: A Tensor of type float32. outputs / scale will dequantize the outputs tensor
        (see the round-trip example below).
Raises:
    ValueError:
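A per-tensor round trip, again assuming pytorch_quantization is installed; the dequantization step follows the Returns note above:

    import torch
    from pytorch_quantization import tensor_quant

    x = torch.randn(4, 8)
    amax = x.abs().max()  # scalar amax -> one scale for the whole tensor
    outputs, scale = tensor_quant.TensorQuantFunction.apply(x, amax)
    x_dq = outputs / scale                 # dequantize, per the Returns note
    max_err = (x - x_dq).abs().max()       # roughly bounded by amax / 127 / 2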