# Q Learning

Youtube search... ...Google search

- Q Learning | Wikipedia
- Model Free Reinforcement learning algorithms (Monte Carlo, SARSA, Q-learning) | Madhu Sanjeevi (Mady) - Medium
- Artificial Intelligence (AI) ... Machine Learning (ML) ... Deep Learning ... Neural Network ... Reinforcement ... Learning Techniques
- Monte Carlo (MC) Method - Model Free Reinforcement Learning
- Markov Decision Process (MDP)
- State-Action-Reward-State-Action (SARSA)
- Q Learning
- Deep Reinforcement Learning (DRL) DeepRL
- Distributed Deep Reinforcement Learning (DDRL)
- Evolutionary Computation / Genetic Algorithms
- Actor Critic
- Hierarchical Reinforcement Learning (HRL)

- Gaming

When feedback is provided, it might be long time after the fateful decision has been made. In reality, the feedback is likely to be the result of a large number of prior decisions, taken amid a shifting, uncertain environment. Unlike supervised learning, there are no correct input/output pairs, so suboptimal actions are not explicitly corrected, wrong actions just decrease the corresponding value in the Q-table, meaning there’s less chance choosing the same action should the same state be encountered again. Quora | Jaron Collis

- Learning Rate: The learning rate or step size determines to what extent newly acquired information overrides old information. A factor of 0 makes the agent learn nothing (exclusively exploiting prior knowledge), while a factor of 1 makes the agent consider only the most recent information (ignoring prior knowledge to explore possibilities).

- Discount factor: The discount factor {\displaystyle \gamma } \gamma determines the importance of future rewards. A factor of 0 will make the agent "myopic" (or short-sighted) by only considering current rewards, i.e. {\displaystyle r_{t}} r_{t} (in the update rule above), while a factor approaching 1 will make it strive for a long-term high reward. If the discount factor meets or exceeds 1, the action values may diverge.

- Initial conditions (Q0): Since Q-learning is an iterative algorithm, it implicitly assumes an initial condition before the first update occurs. High initial values, also known as "optimistic initial conditions",[7] can encourage exploration: no matter what action is selected, the update rule will cause it to have lower values than the other alternative, thus increasing their choice probability. The first reward {\displaystyle r} r can be used to reset the initial conditions.

# What is Q Learning

What is Q-learning? Q-learning is a machine learning approach that enables a model to iteratively learn and improve over time by taking the correct action. Q-learning is a type of reinforcement learning. With reinforcement learning, a machine learning model is trained to mimic the way animals or children learn. Good actions are rewarded or reinforced, while bad actions are discouraged and penalized. With the state-action-reward-state-action form of reinforcement learning, the training regimen follows a model to take the right actions. Q-learning provides a model-free approach to reinforcement learning. There is no model of the environment to guide the reinforcement learning process. The agent -- which is the AI component that acts in the environment -- iteratively learns and makes predictions about the environment on its own. Q-learning also takes an off-policy approach to reinforcement learning. A Q-learning approach aims to determine the optimal action based on its current state. The Q-learning approach can accomplish this by either developing its own set of rules or deviating from the prescribed policy. Because Q-learning may deviate from the given policy, a defined policy is not needed. Off-policy approach in Q-learning is achieved using Q-values -- also known as action values. The Q-values are the expected future values for action and are stored in the Q-table. Chris Watkins first discussed the foundations of Q-learning in a 1989 thesis for Cambridge University and further elaborated in a 1992 publication titled Q-learning.

How does Q-learning work?

Q-learning models operate in an iterative process that involves multiple components working together to help train a model. The iterative process involves the agent learning by exploring the environment and updating the model as the exploration continues. The multiple components of Q-learning include the following:

- Agents. The agent is the entity that acts and operates within an environment.
- States. The state is a variable that identifies the current position in an environment of an agent.
- Actions. The action is the agent's operation when it is in a specific state.
- Rewards. A foundational concept within reinforcement learning is the concept of providing either a positive or a negative response for the agent's actions.
- Episodes. An episode is when an agent can no longer take a new action and ends up terminating.
- Q-values. The Q-value is the metric used to measure an action at a particular state.

Here are the two methods to determine the Q-value:

- Temporal difference. The temporal difference formula calculates the Q-value by incorporating the value of the current state and action by comparing the differences with the previous state and action.
- Bellman's equation. Mathematician Richard Bellman invented this equation in 1957 as a recursive formula for optimal decision-making. In the q-learning context, Bellman's equation is used to help calculate the value of a given state and assess its relative position. The state with the highest value is considered the optimal state.

Q-learning models work through trial-and-error experiences to learn the optimal behavior for a task. The Q-learning process involves modeling optimal behavior by learning an optimal action value function or q-function. This function represents the optimal long-term value of action a in state s and subsequently follows optimal behavior in every subsequent state.

# Q Learning for Gaming

- Gaming ... Game-Based Learning (GBL) ... Security ... Generative AI ... Games - Metaverse ... Quantum ... Game Theory ... Design
- Teaching a Neural Network to play a game using Q-learning
- How to teach AI to play Games: Deep Reinforcement Learning
- Q Learning Applied To a Two Player Game | Stack Overflow

Q-learning can be applied to gaming by teaching a neural network to play a game using Q-learning¹. The Q-learning algorithm uses a Q-table to look up the optimal next action based on the current state of the game. However, as the complexity of the game grows, so does the Q-table. One alternative approach is to replace the table lookup with a neural network that takes a state and an action as input and outputs the Q-value, which represents the possible reward for taking that action at that state. In Q-learning, the agent uses a Q-table to take the best possible action based on the expected reward for each state in the environment. A Q-table is a data structure of sets of actions and states, and the Q-learning algorithm is used to update the values in the table1. The Q-table score for each state-action pair represents the maximum expected future reward that the agent will receive if it takes that action at that state.