Reinforcement Learning

Reinforcement learning teaches a machine, or agent, through trial and error: the agent takes input from its environment and is rewarded or penalized based on feedback (i.e., the quality of the output it generates). The learning system, called an agent, perceives the environment, performs actions, and is rewarded or penalized depending on how well it performs. The agent's main goal is to accumulate as much reward as possible. To maximize that reward, the agent learns the best strategy, known as a policy.

ℹ️ Reinforcement learning doesn’t use a fixed training dataset; the agent learns from its own interactions with the environment, which is why it is used to solve interactive problems.
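
The perceive, act, and receive-feedback loop described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: `CorridorEnv`, its reward values (+1.0 at the goal, -0.1 per other step), and the two policies are hypothetical names and numbers invented for this example, not part of any library.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: a corridor of cells 0..4 with the goal at cell 4."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Start every episode at the left end of the corridor.
        self.position = 0
        return self.position

    def step(self, action):
        # action: 0 = move left, 1 = move right (walls clamp the position).
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        # Assumed reward scheme: reward at the goal, small penalty for every other step.
        reward = 1.0 if done else -0.1
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the total reward the agent accumulated."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                   # the agent acts on what it perceives
        state, reward, done = env.step(action)   # the environment responds with feedback
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(state):
    # Pure trial and error: ignore the state and pick any action.
    return random.choice([0, 1])


def always_right(state):
    # The best strategy for this particular corridor.
    return 1


if __name__ == "__main__":
    random.seed(0)
    env = CorridorEnv()
    print("random policy reward:", run_episode(env, random_policy))
    print("always-right policy reward:", run_episode(env, always_right))
```

There is no dataset anywhere in this sketch: the only training signal is the reward returned by `step`. A learning algorithm's job is to move from something like `random_policy` toward something like `always_right` using that signal alone.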

Types of algorithms:

  • Value-Based Algorithms: In value-based algorithms, the agent learns the value of each state or state-action pair by estimating the cumulative reward it can expect from it. Given these values, the agent selects the action with the highest estimated value. Examples of value-based methods are Q-learning, SARSA, and Deep Q-Networks (DQN).
  • Policy-Based Algorithms: Policy-based RL algorithms learn the action-selection rule directly, optimizing the policy itself rather than deriving it from value estimates. The policy maps each state to an action (or to a probability distribution over actions) and is adjusted to increase the expected reward. An example of a policy-based method is the stochastic policy gradient, as in the REINFORCE algorithm.
  • Model-Based Algorithms: Model-based reinforcement learning algorithms learn a model of the environment, that is, an approximation of the rules that govern how states and actions lead to next states and rewards. They then use this model to plan the best course of action in the environment. An example of a model-based method is Dyna-Q.
  • Model-Free Algorithms: In model-free algorithms, the agent learns by trial and error without first building a model of the environment. Instead, these methods update the policy or the expected values directly from interactions with the environment. Examples of model-free methods are SARSA and the Deep Deterministic Policy Gradient (DDPG) algorithm.
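
To make the value-based, model-free case concrete, here is a minimal sketch of tabular Q-learning, one of the value-based methods listed above. Everything specific in it is an assumption made for illustration: the five-cell corridor task, the reward values, and the hyperparameters `alpha`, `gamma`, and `epsilon` are hypothetical choices, not values taken from the text.

```python
import random

N_STATES = 5       # cells 0..4; cell 4 is the goal (hypothetical toy task)
ACTIONS = [0, 1]   # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment dynamics. The agent only calls this; it never inspects the rules."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.1   # assumed reward scheme
    return next_state, reward, done


def train_q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a table of state-action values purely from interaction."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(100):   # cap the episode length
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus the discounted value of the best next action.
            # (SARSA would instead use the value of the action actually taken next.)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best action in each non-goal state according to the learned values."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = train_q_learning()
    print("greedy action per state:", greedy_policy(q_table))
```

The agent never builds a model of `step`; it only samples transitions from it, which is what makes the sketch model-free. A model-based method such as Dyna-Q would additionally record the observed transitions and replay them for extra planning updates.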


Applications:

  • Robotics
  • Game AI
  • Resource management