Difference between revisions of "Neural Network"
Revision as of 22:11, 31 March 2023
- Deep Learning
- AI Solver
- Capabilities
- Deep Learning’s Uncertainty Principle
- Andrew Ng's Deep Learning
- Automated Learning
- Deep Learning’s Uncertainty Principle | Carlos E. Perez - Intuition Machine ... Deep Learning Patterns, Methodology and Strategy
Researchers' study | The MIT research trio of Tomaso Poggio, Andrzej Banburski, and Qianli Liao (Center for Brains, Minds, and Machines) compared deep and shallow networks in which both used identical sets of procedures such as pooling, convolution, linear combinations, a fixed nonlinear function of one variable, and dot products. Why do deep networks have greater approximation power, and tend to achieve better results than shallow networks, given that both are universal approximators?
The scientists observed that with convolutional deep neural networks with hierarchical locality, this exponential cost vanishes and becomes more linear again. They then demonstrated that the curse of dimensionality can be avoided by deep networks of the convolutional type for certain types of compositional functions. The implication is that for problems with hierarchical locality, such as image classification, deep networks are exponentially more powerful than shallow networks.
“In approximation theory, both shallow and deep networks are known to approximate any continuous functions at an exponential cost,” the researchers wrote. “However, we proved that for certain types of compositional functions, deep networks of the convolutional type (even without weight sharing) can avoid the curse of dimensionality.”
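The compositional-function idea can be illustrated with a toy sketch (a hypothetical construction for intuition, not the authors' own code): a function of eight variables built as a binary tree of two-variable constituent functions. A deep network with hierarchical locality only needs to approximate each two-variable node, at a cost of roughly O(ε⁻²) per node, whereas a generic shallow approximator treating all eight variables at once faces a cost exponential in the input dimension.

```python
import numpy as np

def h(a, b):
    # A simple two-variable constituent function (an arbitrary, illustrative choice).
    return np.tanh(a + b)

def compositional(x):
    """Evaluate a binary-tree compositional function of the inputs.

    Each node combines only two values, mirroring the hierarchical
    locality of convolutional networks. Assumes len(x) is a power of two.
    """
    layer = list(x)
    while len(layer) > 1:
        layer = [h(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

x = np.linspace(-1.0, 1.0, 8)
print(compositional(x))
```

The tree above has only 7 two-variable nodes for 8 inputs; a deep network matched to this structure approximates each node separately, which is the kind of compositional function for which the quoted result says the curse of dimensionality can be avoided.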
The team then set out to explain why deep networks, which tend to be over-parameterized, perform well on out-of-sample data. The researchers demonstrated that for classification problems, given a standard deep network, trained with gradient descent algorithms, it is the direction in the parameter space that matters, rather than the norms or the size of the weights.
The implication is that the dynamics of gradient descent on deep networks are equivalent to those with explicit constraints on both the norm and size of the parameters: gradient descent converges to the max-margin solution. The team noted a similarity to linear models, in which gradient descent converges to the minimum-norm pseudoinverse solution.
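The minimum-norm behavior for linear models is easy to check numerically (a minimal sketch of the standard least-squares fact, not the paper's experiments): on an underdetermined linear system, gradient descent initialized at zero converges to the same minimum-norm solution that the pseudoinverse computes, without any explicit norm constraint.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 8))   # underdetermined: more unknowns than equations
b = rng.standard_normal(3)

# Gradient descent on ||Ax - b||^2, initialized at zero.
x = np.zeros(8)
lr = 1.0 / np.linalg.norm(A, 2) ** 2   # step size below 2 / sigma_max^2 for stability
for _ in range(20000):
    x -= lr * A.T @ (A @ x - b)

# The pseudoinverse gives the minimum-norm least-squares solution.
x_pinv = np.linalg.pinv(A) @ b
print(np.allclose(x, x_pinv, atol=1e-5))
```

Because the zero initialization keeps the iterates in the row space of `A`, gradient descent lands on the minimum-norm solution among the infinitely many that fit the data, which is the implicit-regularization effect described above in its simplest linear form.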
In effect, the team posits that the act of training deep networks serves to provide implicit regularization and norm control. The scientists attribute the ability of deep networks to generalize, without explicit capacity controls such as a regularization term or a constraint on the norm of the weights, to a mathematical result: the unit vector computed from the solution of gradient descent remains the same whether or not the constraint is enforced during gradient descent. In other words, deep networks select minimum-norm solutions, and hence the gradient flow of deep networks with an exponential-type loss locally minimizes the expected error.

A New AI Study May Explain Why Deep Learning Works: MIT researchers’ new theory illuminates machine learning’s black box | Cami Rosso - Psychology Today ... PNAS (Proceedings of the National Academy of Sciences of the United States of America) | T. Poggio, A. Banburski, and Q. Liao
Neural Networks - Hugo Larochelle
Opening the Black Box
- Opening the Black Box of Deep Neural Networks via Information | Ravid Schwartz-Ziv and Naftali Tishby - The Hebrew University of Jerusalem
- New Theory Cracks Open the Black Box of Deep Learning | Natalie Wolchover - QuantaMagazine
History
The neural net scientist James Anderson and the science journalist Edward Rosenfeld have noted that the background to neural networks goes back into the 1940s and some early attempts to, as they describe, “understand the human nervous systems and to build artificial systems that act the way we do, at least a little bit”. And so, in the 1940s, the mysteries of the human nervous system also became the mysteries of computational thinking and artificial intelligence.
Summarising this long story, the computer science writer Larry Hardesty has pointed out that deep learning in the form of neural networks “have been going in and out of fashion for more than 70 years”. More specifically, he adds, these “neural networks were first proposed in 1944 by Warren McCulloch and Walter Pitts, two University of Chicago researchers who moved to MIT in 1952 as founding members of what’s sometimes called the first cognitive science department”.
[Image: The inventors of the neural network, Walter Pitts and Warren McCulloch, pictured in 1949. Source: Semantic Scholar]

Elsewhere, 1943 is sometimes given as the first year for the technology. Either way, accounts suggest that for roughly 70 years neural networks have moved in and out of vogue, often neglected but sometimes taking hold and moving into more mainstream applications and debates. The uncertainty persisted. Those early developers frequently describe the importance of their research as being overlooked until it found its purpose, often years and sometimes decades later.
Moving from the 1960s into the late 1970s we can find further stories of the unknown properties of these systems. Even then, after three decades, the neural network was still to find a sense of purpose. David Rumelhart, who had a background in psychology and was a co-author of a set of books published in 1986 that would later drive attention back again towards neural networks, found himself collaborating on the development of neural networks with his colleague Jay McClelland.
As well as being colleagues, they had also recently encountered each other at a conference in Minnesota, where Rumelhart’s talk on “story understanding” had provoked some discussion among the delegates.
Following that conference McClelland returned with a thought about how to develop a neural network that might combine models to be more interactive. What matters here is Rumelhart’s recollection of the “hours and hours and hours of tinkering on the computer”.
We sat down and did all this in the computer and built these computer models, and we just didn’t understand them. We didn’t understand why they worked or why they didn’t work or what was critical about them.
Like Taylor, Rumelhart found himself tinkering with the system. They too created a functioning neural network and, crucially, they also weren’t sure how or why it worked in the way that it did, seemingly learning from data and finding associations.