Deep Q Network (DQN)

Revision as of 23:08, 11 February 2019

* [http://medium.com/deep-math-machine-learning-ai/ch-12-1-model-free-reinforcement-learning-algorithms-monte-carlo-sarsa-q-learning-65267cb8d1b4 Model Free Reinforcement learning algorithms (Monte Carlo, SARSA, Q-learning) | Madhu Sanjeevi (Mady) - Medium]
* [[Gaming]]
* [http://en.wikipedia.org/wiki/Q-learning Q Learning | Wikipedia]

When feedback is provided, it may arrive long after the fateful decision was made. In practice, the feedback is likely to reflect a large number of prior decisions, taken amid a shifting, uncertain environment. Unlike supervised learning, there are no correct input/output pairs, so suboptimal actions are not explicitly corrected. Instead, a wrong action simply decreases the corresponding value in the Q-table, making it less likely that the same action will be chosen if the same state is encountered again. [http://www.quora.com/How-does-Q-learning-work-1 Quora | Jaron Collis]
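A minimal sketch of the Q-table update described above, assuming the standard one-step Q-learning rule with learning rate `alpha` and discount `gamma` plus epsilon-greedy exploration. The 4-state corridor environment and the names `step`, `choose`, `train`, `ACTIONS`, and `GOAL`, as well as all parameter values, are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict

ACTIONS = ["left", "right"]
GOAL = 3  # hypothetical 4-state corridor: states 0..3, episode ends on reaching state 3


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9, terminal=False):
    """One tabular Q-learning update, applied in place:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = 0.0 if terminal else max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def choose(q, state, epsilon, rng):
    """Epsilon-greedy: explore at random with probability epsilon, else take the best action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def step(state, action):
    """Hypothetical environment: the only positive reward (+1) comes on reaching GOAL,
    and moving left from state 0 bumps a wall for -1; every other move gives 0."""
    if action == "right":
        nxt = state + 1
        return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL
    if state == 0:
        return 0, -1.0, False
    return state - 1, 0.0, False


def train(episodes=200, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # unseen (state, action) pairs start at 0
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 50:  # cap episode length
            action = choose(q, state, epsilon, rng)
            nxt, reward, done = step(state, action)
            q_update(q, state, action, reward, nxt, terminal=done)
            state, steps = nxt, steps + 1
    return q
```

For example, starting from an empty table, `q_update(defaultdict(float), 0, "left", -1.0, 0)` returns `-0.5`: the wall bump pulls Q(0, "left") below the untried "right", with no "correct answer" ever supplied. After `train()`, Q(0, "right") exceeds Q(0, "left") even though the +1 reward only arrives on the third move, which is how delayed feedback propagates back through the table.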

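DeepMind used recent advances in training deep neural networks to develop a novel end-to-end reinforcement learning agent, termed a deep Q-network (DQN), that learns successful policies directly from high-dimensional sensory inputs. Human-level control through Deep Reinforcement Learning | DeepMind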
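A minimal sketch of the training loop DQN adds on top of tabular Q-learning: an experience-replay buffer sampled in minibatches, and a separate target network, periodically copied from the online network, that supplies the targets r + gamma * max_a' Q_target(s', a') for a squared TD-error loss. As a loud assumption, a linear model over one-hot state features stands in for the paper's deep convolutional network (with one-hot features this is equivalent to a table, so it illustrates only the training mechanics). The corridor environment, the class and function names, and the hyperparameters (batch size 32, target sync every 20 steps, learning rate 0.1) are all hypothetical.

```python
import random
from collections import deque

N_STATES, N_ACTIONS = 4, 2  # hypothetical corridor: action 0 = left, 1 = right; goal = state 3


class LinearQ:
    """Q(s, a) = w[a] . x(s); a linear stand-in (assumption) for DQN's deep network."""

    def __init__(self, n_features, n_actions):
        self.w = [[0.0] * n_features for _ in range(n_actions)]

    def values(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]

    def copy_from(self, other):
        self.w = [row[:] for row in other.w]


class ReplayBuffer:
    """Fixed-size store of (x, a, r, x_next, done) transitions, sampled uniformly."""

    def __init__(self, capacity, rng):
        self.data = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = rng

    def add(self, transition):
        self.data.append(transition)

    def sample(self, k):
        return self.rng.sample(list(self.data), min(k, len(self.data)))


def dqn_step(online, target, batch, gamma=0.9, lr=0.1):
    """One gradient step on the mean of 0.5 * (y - Q_online(x, a))**2 over the batch,
    where y comes from the frozen target network."""
    grads = [[0.0] * len(row) for row in online.w]
    for x, a, r, x_next, done in batch:
        y = r if done else r + gamma * max(target.values(x_next))
        error = y - online.values(x)[a]
        for i, xi in enumerate(x):
            grads[a][i] += error * xi / len(batch)  # negative gradient w.r.t. w[a][i]
    for a, row in enumerate(grads):
        online.w[a] = [wi + lr * g for wi, g in zip(online.w[a], row)]


def features(s):
    return [1.0 if i == s else 0.0 for i in range(N_STATES)]


def env_step(s, a):
    """+1 on reaching the last state (terminal), -1 for moving left from state 0, else 0."""
    if a == 1:
        s2 = s + 1
        return s2, (1.0 if s2 == N_STATES - 1 else 0.0), s2 == N_STATES - 1
    return max(s - 1, 0), (-1.0 if s == 0 else 0.0), False


def train(episodes=300, sync_every=20, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    online, target = LinearQ(N_STATES, N_ACTIONS), LinearQ(N_STATES, N_ACTIONS)
    buffer = ReplayBuffer(1000, rng)
    t = 0
    for _ in range(episodes):
        s, done, steps = 0, False, 0
        while not done and steps < 50:
            q = online.values(features(s))
            a = rng.randrange(N_ACTIONS) if rng.random() < epsilon else q.index(max(q))
            s2, r, done = env_step(s, a)
            buffer.add((features(s), a, r, features(s2), done))
            dqn_step(online, target, buffer.sample(32))
            t += 1
            if t % sync_every == 0:
                target.copy_from(online)  # refresh the frozen target network
            s, steps = s2, steps + 1
    return online
```

Sampling old transitions from the buffer breaks the correlation between consecutive updates, and holding the target network fixed between syncs keeps the regression target from shifting on every step; both are the stabilizing tricks used in the DQN paper, reproduced here only in miniature.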