Long Short-Term Memory (LSTM) - Revision history

BPeat at 03:20, 2 March 2024

2024-03-02T03:20:14Z

BPeat at 03:19, 2 March 2024

2024-03-02T03:19:33Z

BPeat at 16:55, 6 August 2023

2023-08-06T16:55:25Z

BPeat at 17:36, 3 May 2023

2023-05-03T17:36:38Z

BPeat at 15:18, 19 March 2023

2023-03-19T15:18:23Z

BPeat at 03:44, 26 February 2023

2023-02-26T03:44:07Z

BPeat at 05:39, 12 February 2023

2023-02-12T05:39:10Z

BPeat at 05:33, 12 February 2023

2023-02-12T05:33:17Z

BPeat at 05:10, 12 February 2023

2023-02-12T05:10:10Z

BPeat at 05:09, 12 February 2023

2023-02-12T05:09:55Z

← Older revision		Revision as of 03:20, 2 March 2024
@@ Line 31: / Line 31: @@
 A LSTM (Long Short-term [[Memory]]) Neural Network is just another kind of Artificial Neural Network, which falls in the category of [[Recurrent Neural Network (RNN)]]. What makes LSTM Neural Networks different from regular Neural Networks is, they have LSTM cells as neurons in some of their layers. Much like Convolutional Layers help a [[(Deep) Convolutional Neural Network (DCNN/CNN)]] learn about image features, LSTM cells help the Network learn about temporal data, something which other Machine Learning models traditionally struggled with. ... Each LSTM cell in our Neural Network will only look at a single column of its inputs, and also at the previous column’s LSTM cell’s output. Normally, we feed our LSTM Neural Network a whole matrix as its input, where each column corresponds to something that “comes before” the next column. This way, each LSTM cell will have two different input vectors: the previous LSTM cell’s output (which gives it some information about the previous input column) and its own input column. [http://www.datastuff.tech/machine-learning/lstm-how-to-train-neural-networks-to-write-like-lovecraft/ LSTM: How To Train Neural Networks To Write Like Lovecraft | Strikingloo]

-To combat the vanishing / exploding gradient problem by introducing gates and an explicitly defined [[memory]] cell. These are inspired mostly by circuitry, not so much biology. Each neuron has a [[memory]] cell and three gates: input, output and forget. The function of these gates is to safeguard the information by stopping or allowing the flow of it. The input gate determines how much of the information from the previous layer gets stored in the cell. The output layer takes the job on the other end and determines how much of the next layer gets to know about the state of this cell. The forget gate seems like an odd inclusion at first but sometimes it’s good to forget: if it’s learning a book and a new chapter begins, it may be necessary for the network to forget some characters from the previous chapter. LSTMs have been shown to be able to learn complex sequences, such as writing like Shakespeare or composing primitive music. Note that each of these gates has a [[Activation Functions#Weights|weight]] to a cell in the previous neuron, so they typically require more resources to run. Hochreiter, Sepp, and Jürgen Schmidhuber. “Long short-term memory.” Neural computation 9.8 (1997): 1735-1780.
+To combat the vanishing / exploding gradient problem by introducing gates and an explicitly defined [[memory]] cell. These are inspired mostly by circuitry, not so much biology. Each neuron has a [[memory]] cell and three gates: input, output and forget. The function of these gates is to safeguard the information by stopping or allowing the flow of it. The input gate determines how much of the information from the previous layer gets stored in the cell. The output layer takes the job on the other end and determines how much of the next layer gets to know about the state of this cell. The forget gate seems like an odd inclusion at first but sometimes it’s good to forget: if it’s learning a book and a new chapter begins, it may be necessary for the network to forget some characters from the previous chapter. LSTMs have been shown to be able to learn complex sequences, such as writing like Shakespeare or composing primitive music. Note that each of these gates has a [[Activation Functions#Weights|weight]] to a cell in the previous neuron, so they typically require more resources to run. Hochreiter, Sepp, and Jürgen Schmidhuber. “Long short-term [[memory]].” Neural computation 9.8 (1997): 1735-1780.
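
The column-by-column processing and the three gates described in the revision text above can be sketched in code. This is a minimal, illustrative sketch in plain Python, assuming the commonly used LSTM formulation (sigmoid input, forget and output gates plus a tanh candidate value); it is not taken from the quoted sources and may differ in detail from the cited 1997 paper. The names `init_params`, `lstm_step` and `run_lstm`, the random weight initialisation, and the sizes used are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    """Squash a pre-activation into (0, 1) so it can act as a gate."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_params(input_size, hidden_size, seed=0):
    """Hypothetical initialisation: small random weights, zero biases."""
    rng = random.Random(seed)
    width = input_size + hidden_size  # each gate sees [x_t, h_prev]

    def matrix():
        return [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                for _ in range(hidden_size)]

    names = ("input", "forget", "output", "candidate")
    return {n: {"W": matrix(), "b": [0.0] * hidden_size} for n in names}


def lstm_step(params, x_t, h_prev, c_prev):
    """One time step: combine the current input column with the previous output."""
    z = list(x_t) + list(h_prev)

    def layer(name, activation):
        pre = matvec(params[name]["W"], z)
        return [activation(p + b) for p, b in zip(pre, params[name]["b"])]

    i = layer("input", sigmoid)         # how much new information to store
    f = layer("forget", sigmoid)        # how much of the old cell state to keep
    o = layer("output", sigmoid)        # how much of the cell state to expose
    g = layer("candidate", math.tanh)   # the new information itself

    # Memory cell update: keep part of the old state, add part of the new value.
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # The output is a gated view of the (squashed) cell state.
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t


def run_lstm(params, columns, hidden_size):
    """Feed the input one column at a time, carrying h and c forward."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    outputs = []
    for x_t in columns:
        h, c = lstm_step(params, x_t, h, c)
        outputs.append(h)
    return outputs, c


if __name__ == "__main__":
    params = init_params(input_size=3, hidden_size=2)
    columns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    outputs, final_cell = run_lstm(params, columns, hidden_size=2)
    for t, h in enumerate(outputs):
        print(t, h)
```

Each gate has its own weight matrix over both the current input column and the previous output, which is why an LSTM layer carries roughly four times the parameters of a plain recurrent layer of the same size.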

← Older revision		Revision as of 03:19, 2 March 2024
@@ Line 29: / Line 29: @@
-A LSTM (Long Short-term Memory) Neural Network is just another kind of Artificial Neural Network, which falls in the category of [[Recurrent Neural Network (RNN)]]. What makes LSTM Neural Networks different from regular Neural Networks is, they have LSTM cells as neurons in some of their layers. Much like Convolutional Layers help a [[(Deep) Convolutional Neural Network (DCNN/CNN)]] learn about image features, LSTM cells help the Network learn about temporal data, something which other Machine Learning models traditionally struggled with. ... Each LSTM cell in our Neural Network will only look at a single column of its inputs, and also at the previous column’s LSTM cell’s output. Normally, we feed our LSTM Neural Network a whole matrix as its input, where each column corresponds to something that “comes before” the next column. This way, each LSTM cell will have two different input vectors: the previous LSTM cell’s output (which gives it some information about the previous input column) and its own input column. [http://www.datastuff.tech/machine-learning/lstm-how-to-train-neural-networks-to-write-like-lovecraft/ LSTM: How To Train Neural Networks To Write Like Lovecraft | Strikingloo]
+A LSTM (Long Short-term [[Memory]]) Neural Network is just another kind of Artificial Neural Network, which falls in the category of [[Recurrent Neural Network (RNN)]]. What makes LSTM Neural Networks different from regular Neural Networks is, they have LSTM cells as neurons in some of their layers. Much like Convolutional Layers help a [[(Deep) Convolutional Neural Network (DCNN/CNN)]] learn about image features, LSTM cells help the Network learn about temporal data, something which other Machine Learning models traditionally struggled with. ... Each LSTM cell in our Neural Network will only look at a single column of its inputs, and also at the previous column’s LSTM cell’s output. Normally, we feed our LSTM Neural Network a whole matrix as its input, where each column corresponds to something that “comes before” the next column. This way, each LSTM cell will have two different input vectors: the previous LSTM cell’s output (which gives it some information about the previous input column) and its own input column. [http://www.datastuff.tech/machine-learning/lstm-how-to-train-neural-networks-to-write-like-lovecraft/ LSTM: How To Train Neural Networks To Write Like Lovecraft | Strikingloo]

-To combat the vanishing / exploding gradient problem by introducing gates and an explicitly defined memory cell. These are inspired mostly by circuitry, not so much biology. Each neuron has a memory cell and three gates: input, output and forget. The function of these gates is to safeguard the information by stopping or allowing the flow of it. The input gate determines how much of the information from the previous layer gets stored in the cell. The output layer takes the job on the other end and determines how much of the next layer gets to know about the state of this cell. The forget gate seems like an odd inclusion at first but sometimes it’s good to forget: if it’s learning a book and a new chapter begins, it may be necessary for the network to forget some characters from the previous chapter. LSTMs have been shown to be able to learn complex sequences, such as writing like Shakespeare or composing primitive music. Note that each of these gates has a [[Activation Functions#Weights|weight]] to a cell in the previous neuron, so they typically require more resources to run. Hochreiter, Sepp, and Jürgen Schmidhuber. “Long short-term memory.” Neural computation 9.8 (1997): 1735-1780.
+To combat the vanishing / exploding gradient problem by introducing gates and an explicitly defined [[memory]] cell. These are inspired mostly by circuitry, not so much biology. Each neuron has a [[memory]] cell and three gates: input, output and forget. The function of these gates is to safeguard the information by stopping or allowing the flow of it. The input gate determines how much of the information from the previous layer gets stored in the cell. The output layer takes the job on the other end and determines how much of the next layer gets to know about the state of this cell. The forget gate seems like an odd inclusion at first but sometimes it’s good to forget: if it’s learning a book and a new chapter begins, it may be necessary for the network to forget some characters from the previous chapter. LSTMs have been shown to be able to learn complex sequences, such as writing like Shakespeare or composing primitive music. Note that each of these gates has a [[Activation Functions#Weights|weight]] to a cell in the previous neuron, so they typically require more resources to run. Hochreiter, Sepp, and Jürgen Schmidhuber. “Long short-term memory.” Neural computation 9.8 (1997): 1735-1780.

← Older revision		Revision as of 16:55, 6 August 2023
@@ Line 31: / Line 31: @@
 A LSTM (Long Short-term Memory) Neural Network is just another kind of Artificial Neural Network, which falls in the category of [[Recurrent Neural Network (RNN)]]. What makes LSTM Neural Networks different from regular Neural Networks is, they have LSTM cells as neurons in some of their layers. Much like Convolutional Layers help a [[(Deep) Convolutional Neural Network (DCNN/CNN)]] learn about image features, LSTM cells help the Network learn about temporal data, something which other Machine Learning models traditionally struggled with. ... Each LSTM cell in our Neural Network will only look at a single column of its inputs, and also at the previous column’s LSTM cell’s output. Normally, we feed our LSTM Neural Network a whole matrix as its input, where each column corresponds to something that “comes before” the next column. This way, each LSTM cell will have two different input vectors: the previous LSTM cell’s output (which gives it some information about the previous input column) and its own input column. [http://www.datastuff.tech/machine-learning/lstm-how-to-train-neural-networks-to-write-like-lovecraft/ LSTM: How To Train Neural Networks To Write Like Lovecraft | Strikingloo]

-To combat the vanishing / exploding gradient problem by introducing gates and an explicitly defined memory cell. These are inspired mostly by circuitry, not so much biology. Each neuron has a memory cell and three gates: input, output and forget. The function of these gates is to safeguard the information by stopping or allowing the flow of it. The input gate determines how much of the information from the previous layer gets stored in the cell. The output layer takes the job on the other end and determines how much of the next layer gets to know about the state of this cell. The forget gate seems like an odd inclusion at first but sometimes it’s good to forget: if it’s learning a book and a new chapter begins, it may be necessary for the network to forget some characters from the previous chapter. LSTMs have been shown to be able to learn complex sequences, such as writing like Shakespeare or composing primitive music. Note that each of these gates has a weight to a cell in the previous neuron, so they typically require more resources to run. Hochreiter, Sepp, and Jürgen Schmidhuber. “Long short-term memory.” Neural computation 9.8 (1997): 1735-1780.
+To combat the vanishing / exploding gradient problem by introducing gates and an explicitly defined memory cell. These are inspired mostly by circuitry, not so much biology. Each neuron has a memory cell and three gates: input, output and forget. The function of these gates is to safeguard the information by stopping or allowing the flow of it. The input gate determines how much of the information from the previous layer gets stored in the cell. The output layer takes the job on the other end and determines how much of the next layer gets to know about the state of this cell. The forget gate seems like an odd inclusion at first but sometimes it’s good to forget: if it’s learning a book and a new chapter begins, it may be necessary for the network to forget some characters from the previous chapter. LSTMs have been shown to be able to learn complex sequences, such as writing like Shakespeare or composing primitive music. Note that each of these gates has a [[Activation Functions#Weights|weight]] to a cell in the previous neuron, so they typically require more resources to run. Hochreiter, Sepp, and Jürgen Schmidhuber. “Long short-term memory.” Neural computation 9.8 (1997): 1735-1780.

@@ Line 16: / Line 16: @@
 ** [[Average-Stochastic Gradient Descent (SGD) Weight-Dropped LSTM (AWD-LSTM)]]
 ** [[Hopfield Network (HN)]]
-* [[Attention]] Mechanism  ...[[Transformer]] Model   ...[[Generative Pre-trained Transformer (GPT)]]
+* [[Attention]] Mechanism  ...[[Transformer]] ...[[Generative Pre-trained Transformer (GPT)]] ... [[Generative Adversarial Network (GAN)|GAN]] ... [[Bidirectional Encoder Representations from Transformers (BERT)|BERT]]
 * [https://www.technologyreview.com/2023/02/08/1068068/chatgpt-is-everywhere-heres-where-it-came-from/ ChatGPT is everywhere. Here’s where it came from | Will Douglas Heaven - MIT Technology Review]
 ** [[ChatGPT]] | [[OpenAI]]

@@ Line 16: / Line 16: @@
 ** [[Average-Stochastic Gradient Descent (SGD) Weight-Dropped LSTM (AWD-LSTM)]]
 ** [[Hopfield Network (HN)]]
 * [https://www.technologyreview.com/2023/02/08/1068068/chatgpt-is-everywhere-heres-where-it-came-from/ ChatGPT is everywhere. Here’s where it came from | Will Douglas Heaven - MIT Technology Review]
 ** [[ChatGPT]] | [[OpenAI]]

@@ Line 17: / Line 17: @@
 ** [[Hopfield Network (HN)]]
 * [https://www.technologyreview.com/2023/02/08/1068068/chatgpt-is-everywhere-heres-where-it-came-from/ ChatGPT is everywhere. Here’s where it came from | Will Douglas Heaven - MIT Technology Review]
-** [[Sequence to Sequence (Seq2Seq)]]
+** [[ChatGPT]] | [[OpenAI]]
 * [http://karpathy.github.io/2015/05/21/rnn-effectiveness/ Andrej Karpathy blog]
 * [http://colah.github.io/posts/2015-08-Understanding-LSTMs/ Understanding LSTM Networks | Christopher Olah]

@@ Line 20: / Line 20: @@
 ** [[Recurrent Neural Network (RNN)]]
 ** [[Long Short-Term Memory (LSTM)]]
 ** [[Bidirectional Encoder Representations from Transformers (BERT)]]  ... a better model, but less investment than the larger [[OpenAI]] organization
 ** [[ChatGPT]] | [[OpenAI]]:
-*** [[Attention]] Mechanism/[[Transformer]] Model
+*** [[Transformer]] / [[Attention]] Mechanism
 *** [[Generative Pre-trained Transformer (GPT)]]
 *** [[Reinforcement Learning (RL) from Human Feedback (RLHF)]]
 *** [[Supervised]] Learning
-*** [[Proximal Policy Optimization (PPO)]]
+*** [[Proximal Policy Optimization (PPO)]]]
 * [http://karpathy.github.io/2015/05/21/rnn-effectiveness/ Andrej Karpathy blog]
 * [http://colah.github.io/posts/2015-08-Understanding-LSTMs/ Understanding LSTM Networks | Christopher Olah]

@@ Line 11: / Line 11: @@
 * [http://deeplearning4j.org/lstm.html A Beginner's Guide to LSTMs]
 * [[Recurrent Neural Network (RNN)]] Variants:
 ** [[Gated Recurrent Unit (GRU)]]
 ** [[Bidirectional Long Short-Term Memory (BI-LSTM)]]