(Deep) Convolutional Neural Network (DCNN/CNN)
 
[https://www.youtube.com/results?search_query=Convolutional+CNN+ai YouTube]
[https://www.quora.com/search?q=Convolutional%20CNN%20AI ... Quora]
[https://www.google.com/search?q=Convolutional+CNN+ai ...Google search]
[https://news.google.com/search?q=Convolutional+CNN+ai ...Google News]
[https://www.bing.com/news/search?q=Convolutional+CNN+ai&qft=interval%3d%228%22 ...Bing News]
  
* [[State Space Model (SSM)]] ... [[Mamba]] ... [[Sequence to Sequence (Seq2Seq)]] ... [[Recurrent Neural Network (RNN)]] ... [[(Deep) Convolutional Neural Network (DCNN/CNN)|Convolutional Neural Network (CNN)]]
* [[Large Language Model (LLM)]] ... [[Large Language Model (LLM)#Multimodal|Multimodal]] ... [[Foundation Models (FM)]] ... [[Generative Pre-trained Transformer (GPT)|Generative Pre-trained]] ... [[Transformer]] ... [[GPT-4]] ... [[GPT-5]] ... [[Attention]] ... [[Generative Adversarial Network (GAN)|GAN]] ... [[Bidirectional Encoder Representations from Transformers (BERT)|BERT]]
* [[Natural Language Processing (NLP)]] ... [[Natural Language Generation (NLG)|Generation (NLG)]] ... [[Natural Language Classification (NLC)|Classification (NLC)]] ... [[Natural Language Processing (NLP)#Natural Language Understanding (NLU)|Understanding (NLU)]] ... [[Language Translation|Translation]] ... [[Summarization]] ... [[Sentiment Analysis|Sentiment]] ... [[Natural Language Tools & Services|Tools]]
* [[What is Artificial Intelligence (AI)? | Artificial Intelligence (AI)]] ... [[Generative AI]] ... [[Machine Learning (ML)]] ... [[Deep Learning]] ... [[Neural Network]] ... [[Reinforcement Learning (RL)|Reinforcement]] ... [[Learning Techniques]]
* [[Representation Learning]]
* [[Video/Image]] ... [[Vision]] ... [[Enhancement]] ... [[Fake]] ... [[Reconstruction]] ... [[Colorize]] ... [[Occlusions]] ... [[Predict image]] ... [[Image/Video Transfer Learning]]
* [[Image Retrieval / Object Detection]]
* [[Style Transfer]]
* [https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/ An Intuitive Explanation of Convolutional Neural Networks | ujjwalkarn]
* [https://www.asimovinstitute.org/author/fjodorvanveen/ Neural Network Zoo | Fjodor Van Veen]
* [https://deeplearning4j.org/convolutionalnetwork.html Guide]
* [https://www.youtube.com/watch?v=Z91YCMvxdo0&list=PLBAGcD3siRDjBU8sKRk0zX9pMz9qeVxud CNN1. What is Computer Vision? | deeplearning.ai]
* [https://www.youtube.com/watch?v=vT1JzLTH4G4&list=PL3FW7Lu3i5JvHM8ljYj-zLfQRF3EO8sYv Convolutional Neural Networks for Visual Recognition | Stanford]
* [https://hackernoon.com/learning-ai-if-you-suck-at-math-p5-deep-learning-and-convolutional-neural-nets-in-plain-english-cda79679bbe3 Learning AI if You Suck at Math — P5 — Deep Learning and Convolutional Neural Nets in Plain English! | Daniel Jeffries]
* [[Graph Convolutional Network (GCN), Graph Neural Networks (Graph Nets), Geometric Deep Learning]]
* [https://www.quantamagazine.org/foundations-built-for-a-general-theory-of-neural-networks-20190131/ Foundations Built for a General Theory of Neural Networks | Kevin Hartnett]
* [https://elifesciences.org/articles/38173 CaImAn an open source tool for scalable calcium imaging data analysis | A. Giovannucci, J. Friedrich, P. Gunn, J. Kalfon, B. Brown, S. Koay, J. Taxidis, F. Najafi, J. Gauthier, P. Zhou, B. Khakh, D. Tank, D. Chklovskii, and E. Pnevmatikakis - eLIFE]
* [https://www.kdnuggets.com/2019/07/index.html Convolutional Neural Networks: A Python Tutorial Using TensorFlow and Keras | Luciano Strika - MercadoLibre - KDnuggets]
* [https://pathmind.com/wiki/convolutional-network A Beginner's Guide to Convolutional Neural Networks (CNNs) | Chris Nicholson - A.I. Wiki pathmind]
* [https://yann.lecun.com/exdb/lenet/index.html LeNet-5, convolutional neural networks |] [https://yann.lecun.com/index.html Yann LeCun]
* [https://dzone.com/articles/basic-convolutional-neural-network-architectures?edition=596293 Basic Convolutional Neural Network Architectures | Anomi Ragendran - DZone]
  
Convolution - [https://mathworld.wolfram.com/Convolution.html is an integral that expresses the amount of overlap of one function g as it is shifted over another function f. It therefore "blends" one function with another. | Wolfram] [https://en.wikipedia.org/wiki/Convolution a mathematical operation on two functions (f and g) to produce a third function that expresses how the shape of one is modified by the other. The term convolution refers to both the result function and to the process of computing it. Convolution is similar to cross-correlation. | Wikipedia]
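A minimal sketch of that "blending" in code, assuming only Python with NumPy; the signal values are made-up illustrations. Discrete convolution slides one sequence over the other and sums the overlap at each shift, which is exactly what the animations below visualize.

<pre>
import numpy as np

# f: a "spiky" signal, g: a box function (illustrative made-up values)
f = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.5, 0.0])
g = np.array([1.0, 1.0, 1.0])  # a box of width 3

# Discrete convolution: slide g across f and sum the overlap at every shift
blended = np.convolve(f, g, mode="same")
print(blended)  # each spike in f has been smeared ("blended") into a box shape
</pre>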
  
https://upload.wikimedia.org/wikipedia/commons/6/6a/Convolution_of_box_signal_with_itself2.gif

https://upload.wikimedia.org/wikipedia/commons/b/b9/Convolution_of_spiky_function_with_box2.gif
  
Convolutional Neural Networks (ConvNets or CNNs) classify images (e.g. name what they see), cluster them by similarity (photo search), and perform object recognition within scenes. They are algorithms that can identify faces, individuals, street signs, tumors, platypuses and many other aspects of visual data. Convolutional networks perform optical character recognition (OCR) to digitize text and make natural-language processing possible on analog and hand-written documents, where the images are symbols to be transcribed. CNNs can also be applied to sound when it is represented visually as a spectrogram. More recently, convolutional networks have been applied directly to text [[analytics]] as well as graph data with graph convolutional networks. They are primarily used for image processing but can also be used for other types of input, such as audio.

A typical use case for CNNs is where you feed the network images and the network classifies the data, e.g. it outputs “cat” if you give it a cat picture and “dog” when you give it a dog picture. CNNs tend to start with an input “scanner” which is not intended to parse all the training data at once. For example, to input an image of 200 x 200 pixels, you wouldn’t want a layer with 40,000 nodes. Rather, you create a scanning input layer of say 20 x 20 which you feed the first 20 x 20 pixels of the image (usually starting in the upper left corner). Once you have passed that input (and possibly used it for training) you feed it the next 20 x 20 pixels: you move the scanner one pixel to the right. Note that you wouldn’t move the input 20 pixels (or whatever the scanner width is) over; you’re not dissecting the image into blocks of 20 x 20, but rather you’re crawling over it.

This input data is then fed through convolutional layers instead of normal layers, where not all nodes are connected to all nodes. Each node only concerns itself with close neighbouring cells (how close depends on the implementation, but usually not more than a few). These convolutional layers also tend to shrink as they become deeper, mostly by easily divisible factors of the input (so 20 would probably go to a layer of 10 followed by a layer of 5). Powers of two are very commonly used here, as they can be divided cleanly and completely by definition: 32, 16, 8, 4, 2, 1. Besides these convolutional layers, CNNs often feature pooling layers ([[Pooling / Sub-sampling: Max, Mean]]). Pooling is a way to filter out details: a common pooling technique is max pooling, where we take say a 2 x 2 block of pixels and pass on only the one with the largest value.

To apply CNNs to audio, you basically feed in the input audio waves and inch over the length of the clip, segment by segment. Real-world implementations of CNNs often glue a feed-forward neural network (FFNN) to the end to further process the data, which allows for highly non-linear abstractions. These deeper networks are called DCNNs, but the names and abbreviations are often used interchangeably. LeCun, Yann, et al. “Gradient-based learning applied to document recognition.” Proceedings of the IEEE 86.11 (1998): 2278-2324.
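As a rough sketch of the layout just described (small local filters, feature maps that shrink with depth, 2 x 2 max pooling, and a feed-forward head glued on the end), here is a minimal TensorFlow/Keras model; the filter counts, the 200 x 200 grayscale input, and the two-class "cat vs. dog" output are illustrative assumptions, not a recommended architecture.

<pre>
import tensorflow as tf
from tensorflow.keras import layers

# Minimal CNN sketch: each convolutional unit looks only at a 3 x 3 neighbourhood,
# 2 x 2 max pooling shrinks the feature maps, and a small dense (FFNN) head classifies.
model = tf.keras.Sequential([
    layers.Conv2D(32, kernel_size=3, activation="relu",
                  input_shape=(200, 200, 1)),   # e.g. a 200 x 200 grayscale image
    layers.MaxPooling2D(pool_size=2),           # keep only the strongest response in each 2 x 2 block
    layers.Conv2D(16, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),        # the feed-forward network "glued to the end"
    layers.Dense(2, activation="softmax"),      # e.g. "cat" vs. "dog"
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
</pre>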
  
https://www.asimovinstitute.org/wp-content/uploads/2016/09/cnn.png
  
  
 
<youtube>JB8T_zN7ZC0</youtube>
<youtube>Ok44otx90D4</youtube>
<youtube>JiN9p5vWHDY</youtube>
<youtube>2-Ol7ZB0MmU</youtube>
<youtube>FTr3n7uBIuE</youtube>
  
https://cdn-images-1.medium.com/max/800/1*zLJnMjuGdamCpN4yJMTR5g.png
  
 
== CNN Architectures ==
  
 
== Compound Scaling ==
* [https://arxiv.org/abs/1905.11946v1 EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks | Mingxing Tan and Quoc V. Le]
  
 
The researchers from the Google Research Brain Team demonstrated that there is an optimal ratio of network depth, width, and input resolution for maximizing efficiency and accuracy; scaling all three together with a single coefficient is called compound scaling. The result is that EfficientNet surpasses the accuracy of other CNNs on ImageNet by up to 6% while being up to ten times more efficient in terms of speed and size.
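A small Python sketch of the idea, assuming the coefficients reported in the EfficientNet paper (roughly alpha = 1.2 for depth, beta = 1.1 for width, gamma = 1.15 for resolution, constrained so that alpha * beta^2 * gamma^2 is about 2); a single compound coefficient phi then grows all three dimensions together. The baseline depth/width/resolution values below are illustrative placeholders, not EfficientNet-B0's actual configuration.

<pre>
# Compound scaling sketch: one coefficient (phi) scales depth, width and resolution
# together instead of tuning each dimension independently.
# alpha, beta, gamma follow the EfficientNet paper (alpha * beta^2 * gamma^2 ~= 2).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi, base_depth=18, base_width=32, base_resolution=224):
    """Scale a baseline network by the compound coefficient phi.
    The baseline numbers here are illustrative placeholders."""
    depth = round(base_depth * ALPHA ** phi)              # number of layers
    width = round(base_width * BETA ** phi)               # channels per layer
    resolution = round(base_resolution * GAMMA ** phi)    # input image size
    return depth, width, resolution

for phi in range(5):
    print(phi, compound_scale(phi))
</pre>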
  
https://www.topbots.com/wp-content/uploads/2019/11/1_EfficientNet_model_800px_web.jpg
  
 
== DensePose ==
[https://www.youtube.com/results?search_query=densePose+Convolutional+Neural+Network YouTube search...]
  
<youtube>dxOHmvTaCN4</youtube>
<youtube>RYrK7UuJBIs</youtube>
<youtube>Dhkd_bAwwMc</youtube>
<youtube>EMjPqgLX14A</youtube>
