Difference between revisions of "Markov Decision Process (MDP)"

(19 intermediate revisions by the same user not shown)
Line 1: Line 1:
[http://www.youtube.com/results?search_query=deep+reinforcement+q+learning+artificial+intelligence+ Youtube search...]
+
{{#seo:
 +
|title=PRIMO.ai
 +
|titlemode=append
 +
|keywords=ChatGPT, artificial, intelligence, machine, learning, GPT-4, GPT-5, NLP, NLG, NLC, NLU, models, data, singularity, moonshot, Sentience, AGI, Emergence, Moonshot, Explainable, TensorFlow, Google, Nvidia, Microsoft, Azure, Amazon, AWS, Hugging Face, OpenAI, Tensorflow, OpenAI, Google, Nvidia, Microsoft, Azure, Amazon, AWS, Meta, LLM, metaverse, assistants, agents, digital twin, IoT, Transhumanism, Immersive Reality, Generative AI, Conversational AI, Perplexity, Bing, You, Bard, Ernie, prompt Engineering LangChain, Video/Image, Vision, End-to-End Speech, Synthesize Speech, Speech Recognition, Stanford, MIT |description=Helpful resources for your journey with artificial intelligence; videos, articles, techniques, courses, profiles, and tools 
  
* [[Deep Reinforcement Learning (DRL)]]
+
<!-- Google tag (gtag.js) -->
* [[Markov Model (Chain, Discrete Time, Continuous Tme, Hidden)]]
+
<script async src="https://www.googletagmanager.com/gtag/js?id=G-4GCWLBVJ7T"></script>
 +
<script>
 +
  window.dataLayer = window.dataLayer || [];
 +
  function gtag(){dataLayer.push(arguments);}
 +
  gtag('js', new Date());
  
https://upload.wikimedia.org/wikipedia/commons/thumb/a/ad/Markov_Decision_Process.svg/600px-Markov_Decision_Process.svg.png
+
  gtag('config', 'G-4GCWLBVJ7T');
 +
</script>
 +
}}
 +
[http://www.youtube.com/results?search_query=Markov+Decision+Process+MDP Youtube search...]
 +
[http://www.google.com/search?q=Markov+Decision+Process+MDP+machine+learning+ML+artificial+intelligence ...Google search]
 +
 
 +
* [[Markov Model (Chain, Discrete Time, Continuous Time, Hidden)]]
 +
 
 +
* [[Reinforcement Learning (RL)]]
 +
** [[Monte Carlo]] (MC) Method - Model Free Reinforcement Learning
 +
** [[Markov Decision Process (MDP)]]
 +
** [[State-Action-Reward-State-Action (SARSA)]]
 +
** [[Q Learning]]
 +
*** [[Deep Q Network (DQN)]]
 +
** [[Deep Reinforcement Learning (DRL)]] DeepRL
 +
** [[Distributed Deep Reinforcement Learning (DDRL)]]
 +
** [[Symbiotic Intelligence]] ... [[Bio-inspired Computing]] ... [[Neuroscience]] ... [[Connecting Brains]] ... [[Nanobots#Brain Interface using AI and Nanobots|Nanobots]] ... [[Molecular Artificial Intelligence (AI)|Molecular]] ... [[Neuromorphic Computing|Neuromorphic]] ... [[Evolutionary Computation / Genetic Algorithms| Evolutionary/Genetic]]
 +
** [[Actor Critic]]
 +
*** [[Asynchronous Advantage Actor Critic (A3C)]]
 +
*** [[Advanced Actor Critic (A2C)]]
 +
*** [[Lifelong Latent Actor-Critic (LILAC)]]
 +
** [[Hierarchical Reinforcement Learning (HRL)]]
 +
 
 +
 
 +
http://miro.medium.com/max/1200/1*mUyxMUpzQWX4GNTd7TT4nA.gif
 +
 
 +
http://upload.wikimedia.org/wikipedia/commons/thumb/a/ad/Markov_Decision_Process.svg/600px-Markov_Decision_Process.svg.png
  
 
Solutions:
 
Solutions:
 
* [http://www.google.com/search?q=Dynamic+Programming+reinforcement+learning&oq=Dynamic+Programming+reinforcement+learning Dynamic Programming]
 
* [http://www.google.com/search?q=Dynamic+Programming+reinforcement+learning&oq=Dynamic+Programming+reinforcement+learning Dynamic Programming]
* [http://www.google.com/search?ei=CpMKW-TXNMbWzgLdhJqIAQ&q=monte+carlo+reinforcement+learning&oq=monte+carlo+reinforcement+learning Monte Carlo]  
+
* [[Monte Carlo]]
 
* [http://www.google.com/search?ei=NJMKW97aLof_zgKM8KSgBA&q=Temporal+Difference+reinforcement+learning Temporal Difference Learning]
 
* [http://www.google.com/search?ei=NJMKW97aLof_zgKM8KSgBA&q=Temporal+Difference+reinforcement+learning Temporal Difference Learning]
  
 
A Markov Decision Process (MDP) is used where outcomes are partly random and partly under the control of a decision maker. An MDP is a discrete-time stochastic control process. At each time step, the process is in some state s, and the decision maker may choose any action a that is available in state s. The process responds at the next time step by randomly moving into a new state s' and giving the decision maker a corresponding reward R_a(s,s'). The probability that the process moves into its new state s' is influenced by the chosen action. A discount factor (discount rate) γ, with 0 ≤ γ < 1, makes the infinite sum of rewards finite when rewards are bounded, which helps certain algorithms converge.
 
A Markov Decision Process (MDP) is used where outcomes are partly random and partly under the control of a decision maker. An MDP is a discrete-time stochastic control process. At each time step, the process is in some state s, and the decision maker may choose any action a that is available in state s. The process responds at the next time step by randomly moving into a new state s' and giving the decision maker a corresponding reward R_a(s,s'). The probability that the process moves into its new state s' is influenced by the chosen action. A discount factor (discount rate) γ, with 0 ≤ γ < 1, makes the infinite sum of rewards finite when rewards are bounded, which helps certain algorithms converge.
  
<youtube>23FW_vsuETg</youtube>
+
 
 +
<youtube>my207WNoeyA</youtube>
 
<youtube>jpmZp3eX-wI</youtube>
 
<youtube>jpmZp3eX-wI</youtube>
 
<youtube>EqUfuT3CC8s</youtube>
 
<youtube>EqUfuT3CC8s</youtube>
Line 21: Line 55:
 
<youtube>Csiiv6WGzKM</youtube>
 
<youtube>Csiiv6WGzKM</youtube>
 
<youtube>tO6hTI8CXaM</youtube>
 
<youtube>tO6hTI8CXaM</youtube>
 +
<youtube>i0o-ui1N35U</youtube>
 +
<youtube>9g32v7bK3Co</youtube>
 +
<youtube>PYQAI6Td2wo</youtube>
 +
 +
 +
== (Richard) Bellman Equation ==
 +
* [https://towardsdatascience.com/introduction-to-reinforcement-learning-markov-decision-process-44c533ebf8da Reinforcement Learning : Markov-Decision Process (Part 1) | Ayush Singh - Towards Data Science]
 +
* [http://towardsdatascience.com/reinforcement-learning-markov-decision-process-part-2-96837c936ec3 Reinforcement Learning: Bellman Equation and Optimality (Part 2) | Ayush Singh - Towards Data Science]
 +
 +
http://miro.medium.com/max/690/1*5PGCR0jwd15kLhRCA09R1w.gif
 +
 +
<youtube>14BfO5lMiuk</youtube>
 +
<youtube>aNuOLwojyfg</youtube>

Latest revision as of 20:27, 13 July 2023




[Figure: Markov Decision Process diagram]

Solutions:

* Dynamic Programming
* Monte Carlo
* Temporal Difference Learning

A Markov Decision Process (MDP) is used where outcomes are partly random and partly under the control of a decision maker. An MDP is a discrete-time stochastic control process. At each time step, the process is in some state s, and the decision maker may choose any action a that is available in state s. The process responds at the next time step by randomly moving into a new state s' and giving the decision maker a corresponding reward R_a(s,s'). The probability that the process moves into its new state s' is influenced by the chosen action. A discount factor (discount rate) γ, with 0 ≤ γ < 1, makes the infinite sum of rewards finite when rewards are bounded, which helps certain algorithms converge.
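
The description above can be made concrete with a small sketch. The two-state MDP below, its state and action names, its probabilities, and its rewards are all made up for illustration and do not come from this page. The code samples transitions according to the chosen action and computes a discounted return, which stays below r_max / (1 - γ) when every reward is at most r_max.

```python
import random

# Hypothetical two-state MDP, invented purely for illustration.
# TRANSITIONS[state][action] -> list of (probability, next_state, reward),
# i.e. the transition probability to s' and the reward R_a(s, s').
TRANSITIONS = {
    "s0": {
        "stay": [(0.9, "s0", 0.0), (0.1, "s1", 1.0)],
        "go": [(0.2, "s0", 0.0), (0.8, "s1", 1.0)],
    },
    "s1": {
        "stay": [(1.0, "s1", 0.5)],
        "go": [(0.7, "s0", 0.0), (0.3, "s1", 0.5)],
    },
}


def step(state, action, rng):
    """Randomly move to a new state s' given the current state and action."""
    outcomes = TRANSITIONS[state][action]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a sequence of rewards."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def rollout(start, policy, gamma, horizon, seed=0):
    """Follow `policy` (a function state -> action) for `horizon` steps."""
    rng = random.Random(seed)
    state, rewards = start, []
    for _ in range(horizon):
        action = policy(state)
        state, reward = step(state, action, rng)
        rewards.append(reward)
    return discounted_return(rewards, gamma)


if __name__ == "__main__":
    gamma = 0.9
    r_max = 1.0  # largest reward in the toy MDP above
    g = rollout("s0", lambda s: "go", gamma, horizon=200)
    # With bounded rewards, the discounted sum cannot exceed r_max / (1 - gamma).
    print(g, "<=", r_max / (1 - gamma))
```

A nested dictionary is only one possible encoding; larger problems typically store the transition probabilities and rewards as arrays.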



(Richard) Bellman Equation
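
For reference, the Bellman equations can be written in the notation used in the description above. This is the standard textbook form, not a formula taken from this page; P_a(s,s') (the probability of moving from s to s' under action a), γ (the discount factor), π (a policy mapping states to actions), and V (the value function) are symbols assumed here. The first line gives the value of a fixed policy, the second the optimal value.

```latex
V^{\pi}(s) = \sum_{s'} P_{\pi(s)}(s,s') \left[ R_{\pi(s)}(s,s') + \gamma \, V^{\pi}(s') \right]

V^{*}(s) = \max_{a} \sum_{s'} P_{a}(s,s') \left[ R_{a}(s,s') + \gamma \, V^{*}(s') \right]
```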

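
One common way to use the Bellman optimality equation is value iteration, a Dynamic Programming method that repeatedly applies the backup V(s) ← max_a Σ_s' P_a(s,s') [R_a(s,s') + γ V(s')] until the values stop changing. The sketch below is a minimal illustration under stated assumptions: the toy MDP, the function names, and the tolerance are hypothetical and not taken from this page.

```python
# Hypothetical two-state MDP, invented purely for illustration.
# TRANSITIONS[state][action] -> list of (probability, next_state, reward).
TRANSITIONS = {
    "s0": {
        "stay": [(0.9, "s0", 0.0), (0.1, "s1", 1.0)],
        "go": [(0.2, "s0", 0.0), (0.8, "s1", 1.0)],
    },
    "s1": {
        "stay": [(1.0, "s1", 0.5)],
        "go": [(0.7, "s0", 0.0), (0.3, "s1", 0.5)],
    },
}


def q_value(transitions, values, state, action, gamma):
    """Expected reward plus discounted value of the next state."""
    return sum(
        p * (r + gamma * values[s2]) for p, s2, r in transitions[state][action]
    )


def value_iteration(transitions, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Apply the Bellman optimality backup until the largest change is below tol."""
    values = {s: 0.0 for s in transitions}
    for _ in range(max_iters):
        delta = 0.0
        for s, actions in transitions.items():
            best = max(q_value(transitions, values, s, a, gamma) for a in actions)
            delta = max(delta, abs(best - values[s]))
            values[s] = best  # in-place update; later states see the new value
        if delta < tol:
            break
    # Greedy policy: in each state pick the action with the highest Q-value.
    policy = {
        s: max(actions, key=lambda a: q_value(transitions, values, s, a, gamma))
        for s, actions in transitions.items()
    }
    return values, policy


if __name__ == "__main__":
    values, policy = value_iteration(TRANSITIONS)
    print(values)
    print(policy)
```

Value iteration needs the full transition model. Monte Carlo and Temporal Difference methods instead estimate values from sampled experience.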