Quantization
YouTube search... ...Google search
- How to Quantize Neural Networks with TensorFlow | Pete Warden
- 8-Bit Quantization and TensorFlow Lite: Speeding up mobile inference with low precision | Manas Sahni
- TensorFlow Lite
Quantization is the process of constraining an input from a continuous or otherwise large set of values (such as the real numbers) to a discrete set (such as the integers). It is an umbrella term covering many different techniques for storing numbers, and performing calculations on them, in formats more compact than 32-bit floating point.

http://cdn-images-1.medium.com/max/800/0*lKwwM6_WSyBRkPCe.png
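As a concrete illustration, here is a minimal NumPy sketch of 8-bit affine (asymmetric) quantization, the scheme commonly used for neural network weights. All names are illustrative, not any library's API:

```python
import numpy as np

def quantize_uint8(x):
    # Map the float range [min, max] onto the integer range [0, 255].
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = int(round(-lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover approximate floats; the difference is the quantization error.
    return (q.astype(np.float32) - zero_point) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_uint8(weights)
print(np.abs(weights - dequantize(q, scale, zp)).max())  # small reconstruction error
```

Each tensor is stored as 8-bit integers plus one float scale and one integer zero point, which is where the roughly 4x size reduction over 32-bit floats comes from.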
Quantization-aware model training
ensures that the forward pass matches precision for both training and inference. There are two aspects to this, illustrated in the sketch after the list:
- Operator fusion at inference time is accurately modeled at training time.
- Quantization effects at inference are modeled at training time.
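The core idea is simulated ("fake") quantization in the forward pass: weights go through a quantize-dequantize round trip before use, so the network trains against the same rounding error it will see at inference. A minimal self-contained sketch, not the TensorFlow implementation:

```python
import numpy as np

def fake_quantize(w, num_bits=8):
    # Quantize-dequantize round trip: the forward pass sees the rounding
    # error of num_bits precision, but w itself stays float for updates.
    levels = 2 ** num_bits - 1
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.round((w - lo) / scale)   # integer grid in [0, levels]
    return q * scale + lo            # back to float, now quantized

def forward(x, w):
    # Training-time forward pass that matches inference-time precision.
    return x @ fake_quantize(w)
```

In practice, frameworks pass gradients through the rounding step unchanged (a straight-through estimator), so the underlying float weights can still be updated by the optimizer.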
For efficient inference, TensorFlow folds batch normalization into the preceding convolutional and fully connected layers before quantization, so that each folded pair becomes a single layer.
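Folding works because batch norm at inference time is a fixed affine transform, so it can be absorbed into the layer's weights and bias. A minimal sketch for a fully connected layer using the standard folding formulas (all names illustrative):

```python
import numpy as np

def fold_batch_norm(w, b, gamma, beta, mean, var, eps=1e-3):
    # Inference-time batch norm is affine:
    #   y = gamma * ((x @ w + b) - mean) / sqrt(var + eps) + beta
    # so it collapses into the layer: y = x @ w_folded + b_folded.
    inv_std = gamma / np.sqrt(var + eps)
    w_folded = w * inv_std                  # scales each output channel
    b_folded = (b - mean) * inv_std + beta
    return w_folded, b_folded

# Check that the folded layer matches layer + batch norm.
x = np.random.randn(2, 3)
w, b = np.random.randn(3, 4), np.random.randn(4)
gamma, beta = np.random.randn(4), np.random.randn(4)
mean, var = np.random.randn(4), np.abs(np.random.randn(4))
wf, bf = fold_batch_norm(w, b, gamma, beta, mean, var)
bn_out = gamma * ((x @ w + b) - mean) / np.sqrt(var + 1e-3) + beta
assert np.allclose(x @ wf + bf, bn_out)
```

Quantizing the folded weights then models exactly the fused operator that runs at inference time.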
Post-training quantization
is a general technique that reduces model size while also providing up to 3x lower latency, with little degradation in model accuracy. Post-training quantization converts weights from floating point to 8 bits of precision.
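With TensorFlow Lite this is a converter option rather than a training change. A minimal sketch, assuming a TensorFlow 2.x install; the saved-model path is hypothetical:

```python
import tensorflow as tf

# Convert a trained model, letting the converter quantize weights to 8 bits.
converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")  # hypothetical path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```

Because no retraining is involved, this can be applied to any already-trained model, at the cost of somewhat lower accuracy than quantization-aware training for sensitive models.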