Quantization - Revision history

BPeat at 02:30, 3 March 2019

2019-03-03T02:30:43Z

BPeat at 02:29, 3 March 2019

2019-03-03T02:29:54Z

BPeat at 02:29, 3 March 2019

2019-03-03T02:29:08Z

BPeat at 02:27, 3 March 2019

2019-03-03T02:27:22Z

BPeat at 02:23, 3 March 2019

2019-03-03T02:23:11Z

BPeat at 02:20, 3 March 2019

2019-03-03T02:20:00Z

BPeat at 01:58, 3 March 2019

2019-03-03T01:58:34Z

BPeat at 01:56, 3 March 2019

2019-03-03T01:56:30Z

BPeat at 01:53, 3 March 2019

2019-03-03T01:53:54Z

BPeat at 01:42, 3 March 2019

2019-03-03T01:42:59Z

@@ Line 10: / Line 10: @@
 * [http://petewarden.com/2016/05/03/how-to-quantize-neural-networks-with-tensorflow/ How to Quantize Neural Networks with TensorFlow | Pete Warden]
 * [http://heartbeat.fritz.ai/8-bit-quantization-and-tensorflow-lite-speeding-up-mobile-inference-with-low-precision-a882dfcafbbd 8-Bit Quantization and TensorFlow Lite: Speeding up mobile inference with low precision | Manas Sahni]
-* [http://songhan.github.io/SqueezeNet-Deep-Compression/ SqueezeNet: AlexNet-level accuracy with 50x fewer parameters | GitHub]
 the process of constraining an input from a continuous or otherwise large set of values (such as the real numbers) to a discrete set (such as the integers). An umbrella term that covers a lot of different techniques to store numbers and perform calculations on them in more compact formats than 32-bit floating point.
@@ Line 43: / Line 43: @@
 * Combining with ‘Deep Compression’ even more compression ratio could be achieved by using mixed precision. (new results release in ICLR’16: GoogLeNet could be compressed by 10x; SqueezeNet could be compressed by 10x.)  [http://arxiv.org/pdf/1510.00149.pdf DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING | Song Han, Huizi Mao, & William J. Dally]

@@ Line 7: / Line 7: @@
 [http://www.youtube.com/results?search_query=Quantization+aware+model+training YouTube search...]
 [http://www.google.com/search?q=Quantization+aware+model+training ...Google search]
 * [http://petewarden.com/2016/05/03/how-to-quantize-neural-networks-with-tensorflow/ How to Quantize Neural Networks with TensorFlow | Pete Warden]
@@ Line 41: / Line 40: @@
 http://petewarden.files.wordpress.com/2016/05/quantization2.png
-Pruning can remove lots of weights before doing quantization without hurting accuracy. Pruning can remove 67% for CONV layers, 90% for FC layers, verified across LeNet, AlexNet, VGGNet (shown in below paper), GoogLeNet, SqueezeNet, NeuralTalk (done recently after the paper)  [http://arxiv.org/pdf/1506.02626v3.pdf Learning both Weights and Connections for Efficient Neural Networks]
+* Pruning can remove lots of weights before doing quantization without hurting accuracy. Pruning can remove 67% for CONV layers, 90% for FC layers, verified across LeNet, AlexNet, VGGNet (shown in below paper), GoogLeNet, SqueezeNet, NeuralTalk (done recently after the paper)  [http://arxiv.org/pdf/1506.02626v3.pdf Learning both Weights and Connections for Efficient Neural Networks]
-Combining with ‘Deep Compression’ even more compression ratio could be achieved by using mixed precision. (new results release in ICLR’16: GoogLeNet could be compressed by 10x; SqueezeNet could be compressed by 10x.)  [http://arxiv.org/pdf/1510.00149.pdf DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING | Song Han, Huizi Mao, & William J. Dally]
+* Combining with ‘Deep Compression’ even more compression ratio could be achieved by using mixed precision. (new results release in ICLR’16: GoogLeNet could be compressed by 10x; SqueezeNet could be compressed by 10x.)  [http://arxiv.org/pdf/1510.00149.pdf DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING | Song Han, Huizi Mao, & William J. Dally]
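
The pruning step described in this hunk can be illustrated with a small magnitude-based sketch. It is a simplification under stated assumptions, not the procedure from the linked paper: the helper name `magnitude_prune`, the fixed per-layer sparsity targets (taken from the 67% and 90% figures quoted above), and the random stand-in weights are all hypothetical, and the retraining that normally follows pruning is omitted.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    Hypothetical helper for illustration; returns the pruned weights and the
    boolean mask of weights that were kept.
    """
    k = int(round(sparsity * weights.size))  # number of weights to remove
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    magnitudes = np.abs(weights).ravel()
    # k-th smallest magnitude: everything at or below it is pruned
    threshold = np.partition(magnitudes, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
conv_weights = rng.normal(size=(3, 3, 16, 32))  # stand-in for a CONV layer
fc_weights = rng.normal(size=(100, 100))        # stand-in for an FC layer

pruned_conv, conv_mask = magnitude_prune(conv_weights, 0.67)
pruned_fc, fc_mask = magnitude_prune(fc_weights, 0.90)

print("CONV weights kept:", int(conv_mask.sum()), "of", conv_mask.size)
print("FC weights kept:", int(fc_mask.sum()), "of", fc_mask.size)
```

The surviving weights are unchanged, so a quantization pass can be applied to them afterwards.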

@@ Line 41: / Line 41: @@
 http://petewarden.files.wordpress.com/2016/05/quantization2.png
-Pruning can remove lots of weights before doing quantization without hurting accuracy. Pruning can remove 67% for CONV layers, 90% for FC layers, verified across LeNet, AlexNet, VGGNet (shown in below paper), GoogLeNet, SqueezeNet, NeuralTalk (done recently after the paper)
+Pruning can remove lots of weights before doing quantization without hurting accuracy. Pruning can remove 67% for CONV layers, 90% for FC layers, verified across LeNet, AlexNet, VGGNet (shown in below paper), GoogLeNet, SqueezeNet, NeuralTalk (done recently after the paper)  [http://arxiv.org/pdf/1506.02626v3.pdf Learning both Weights and Connections for Efficient Neural Networks]
-Combining with ‘Deep Compression’ even more compression ratio could be achieved by using mixed precision. (new results release in ICLR’16: GoogLeNet could be compressed by 10x; SqueezeNet could be compressed by 10x.)
+Combining with ‘Deep Compression’ even more compression ratio could be achieved by using mixed precision. (new results release in ICLR’16: GoogLeNet could be compressed by 10x; SqueezeNet could be compressed by 10x.)  [http://arxiv.org/pdf/1510.00149.pdf DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING | Song Han, Huizi Mao, & William J. Dally]

@@ Line 41: / Line 41: @@
 http://petewarden.files.wordpress.com/2016/05/quantization2.png
-Pruning can remove lots of weights before doing quantization without hurting accuracy. Pruning can remove 67% for CONV layers, 90% for FC layers, verified across LeNet, AlexNet, VGGNet (shown in below paper), GoogLeNet, SqueezeNet, NeuralTalk (done recently after the paper)[http://arxiv.org/pdf/1506.02626v3.pdf Learning both Weights and Connections for Efficient
+Pruning can remove lots of weights before doing quantization without hurting accuracy. Pruning can remove 67% for CONV layers, 90% for FC layers, verified across LeNet, AlexNet, VGGNet (shown in below paper), GoogLeNet, SqueezeNet, NeuralTalk (done recently after the paper)
 Neural Networks]
-Combining with ‘Deep Compression’ even more compression ratio could be achieved by using mixed precision: [http://arxiv.org/pdf/1510.00149.pdf DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING | Song Han, Huizi Mao, & William J. Dally]
+Combining with ‘Deep Compression’ even more compression ratio could be achieved by using mixed precision. (new results release in ICLR’16: GoogLeNet could be compressed by 10x; SqueezeNet could be compressed by 10x.)
- (new results release in ICLR’16: GoogLeNet could be compressed by 10x; SqueezeNet could be compressed by 10x.)
+* [http://arxiv.org/pdf/1510.00149.pdf DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING | Song Han, Huizi Mao, & William J. Dally]

@@ Line 41: / Line 41: @@
 http://petewarden.files.wordpress.com/2016/05/quantization2.png
-Pruning can remove lots of weights before doing quantization without hurting accuracy. Pruning can remove 67% for CONV layers, 90% for FC layers, verified across LeNet, AlexNet, VGGNet (shown in below paper), GoogLeNet, SqueezeNet, NeuralTalk (done recently after the paper) : [http://arxiv.org/pdf/1506.02626v3.pdf Learning both Weights and Connections for Efficient
+Pruning can remove lots of weights before doing quantization without hurting accuracy. Pruning can remove 67% for CONV layers, 90% for FC layers, verified across LeNet, AlexNet, VGGNet (shown in below paper), GoogLeNet, SqueezeNet, NeuralTalk (done recently after the paper)[http://arxiv.org/pdf/1506.02626v3.pdf Learning both Weights and Connections for Efficient
 Neural Networks]
-Combining with ‘Deep Compression’ even more compression ratio could be achieved by using mixed precision:
+Combining with ‘Deep Compression’ even more compression ratio could be achieved by using mixed precision: [http://arxiv.org/pdf/1510.00149.pdf DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING | Song Han, Huizi Mao, & William J. Dally]
-[http://arxiv.org/pdf/1510.00149.pdf DEEP COMPRESSION: COMPRESSING DEEP NEURAL
+  (new results release in ICLR’16: GoogLeNet could be compressed by 10x; SqueezeNet could be compressed by 10x.)

@@ Line 34: / Line 34: @@
 is a general technique to reduce the model size while also providing up to 3x lower latency with little degradation in model accuracy. Post-training quantization quantizes weights to 8-bits of precision from floating-point.
 <youtube>eZdOkDtYMoo</youtube>
 http://petewarden.files.wordpress.com/2016/05/quantization2.png
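
As a rough illustration of quantizing floating-point weights to 8 bits, the sketch below maps a weight tensor onto 256 evenly spaced levels between its minimum and maximum. The min/max (affine) scheme, the helper names `quantize_uint8` and `dequantize`, and the random stand-in weights are assumptions made for the example; this is not TensorFlow's implementation.

```python
import numpy as np

def quantize_uint8(weights):
    """Map float weights onto 256 evenly spaced levels (affine, min/max based)."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    w_min = min(float(weights.min()), 0.0)
    w_max = max(float(weights.max()), 0.0)
    scale = (w_max - w_min) / 255.0
    if scale == 0.0:  # all-zero tensor: any positive scale works
        scale = 1.0
    zero_point = int(round(-w_min / scale))  # integer code that represents 0.0
    codes = np.round(weights / scale) + zero_point
    q = np.clip(codes, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float32 values from the 8-bit codes."""
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.5, size=(64, 64)).astype(np.float32)

q, scale, zero_point = quantize_uint8(weights)
restored = dequantize(q, scale, zero_point)
max_error = float(np.abs(weights - restored).max())

print("bytes as float32:", weights.nbytes, "bytes as uint8:", q.nbytes)
print("step size:", scale, "largest round-trip error:", max_error)
```

The 8-bit copy takes a quarter of the storage of the float32 original, and the round-trip error of each weight is on the order of half a quantization step.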

@@ Line 8: / Line 8: @@
 [http://www.google.com/search?q=Quantization+aware+model+training ...Google search]
 * [http://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/quantize#quantization-aware-training Quantization-aware training]
-Quantization-aware model training ensures that the forward pass matches precision for both training and inference. There are two aspects to this:
+ensures that the forward pass matches precision for both training and inference. There are two aspects to this:
 * Operator fusion at inference time is accurately modeled at training time.
@@ Line 16: / Line 18: @@
 For efficient inference, [[TensorFlow]] combines batch normalization with the preceding convolutional and fully-connected layers prior to quantization by folding batch norm layers.
 <youtube>eZdOkDtYMoo</youtube>
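
The batch norm folding mentioned in this hunk can be sketched for a fully-connected layer as follows. This is a minimal NumPy illustration of the arithmetic, not TensorFlow's graph rewrite; the helper name `fold_batch_norm`, the `eps` value, and the random stand-in parameters are assumptions for the example.

```python
import numpy as np

def fold_batch_norm(weight, bias, gamma, beta, mean, var, eps=1e-3):
    """Fold batch-norm parameters into the preceding layer's weight and bias.

    weight has shape (out_features, in_features); all other arguments have
    shape (out_features,). Hypothetical helper for illustration.
    """
    factor = gamma / np.sqrt(var + eps)          # per-output-channel scaling
    folded_weight = weight * factor[:, None]
    folded_bias = (bias - mean) * factor + beta
    return folded_weight, folded_bias

rng = np.random.default_rng(0)
weight = rng.normal(size=(4, 8))
bias = rng.normal(size=4)
gamma = rng.uniform(0.5, 1.5, size=4)
beta = rng.normal(size=4)
mean = rng.normal(size=4)
var = rng.uniform(0.5, 2.0, size=4)
x = rng.normal(size=(3, 8))
eps = 1e-3

# Reference: fully-connected layer followed by a separate batch-norm step.
layer_out = x @ weight.T + bias
unfused = gamma * (layer_out - mean) / np.sqrt(var + eps) + beta

# Folded: a single fully-connected layer with adjusted parameters.
folded_weight, folded_bias = fold_batch_norm(weight, bias, gamma, beta, mean, var, eps)
fused = x @ folded_weight.T + folded_bias

print("outputs match:", bool(np.allclose(unfused, fused)))
```

After folding, only the adjusted weight and bias remain to be quantized, so inference does not need a separate batch-norm operation.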