What is Reinforcement Learning? How does it work without human intervention?

Mar 01, 2018

The human capability of learning new skills has always intrigued the scientific world. The new age of Artificial Intelligence has brought attention back to the same question. Today scientists know how humans behave and react, and the same behavior has been programmed into AI-powered machines.

Predictive behavior has effectively been replicated already, but learning a skill, which is not predictive, has posed challenges. Extensive study of the human brain and behavior has led to the conclusion that we learn from interaction with our environment: intelligence and learning grow only when we interact. An infant learns to talk by interacting with the environment and its elements. Science is now working to make machines capable of working through problems on their own rather than relying on human intervention.

What is Reinforcement Learning?

Reinforcement Learning is an area of Artificial Intelligence that focuses on learning and making autonomous decisions based on feedback from the environment. It sets up a loop in which the machine (the agent) makes decisions based on experience gathered from previous interactions rather than from a fixed set of training data.

Reinforcement Learning has a clear objective: to make machines capable of making decisions on their own and learning the way humans do. RL algorithms are being developed for a wide range of use cases.

Reinforcement learning goes beyond traditional Artificial Intelligence: it aims to let machines make optimal decisions autonomously. Through extensive trial and error and cycles of reward and punishment, agents learn a skill and become capable of making decisions in the future. A reward is given for a correct action and a punishment for an incorrect one. Reinforcement learning (RL) is showing great potential in sectors such as healthcare, e-commerce, transportation, and finance. By bringing AI even closer to human behavior, RL imitates the way humans acquire skills and so extends what machines can do.

Get ready to meet a new generation of robots, digital assistants, chatbots, and more.

Reinforcement Learning Technique

Reinforcement Learning focuses on maximizing the reward an agent receives while performing a task. There is no human intervention or pre-set training data to tell the machine which decision to make. Instead, it takes actions by interacting with the environment and picks the decision path that brings the best reward signals. A wrong decision leads to a punishment signal, which teaches the machine to avoid that decision when the same environmental conditions arise again.
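The reward-and-punishment loop described above can be sketched in a few lines of Python. This is a minimal, hypothetical example rather than a production algorithm: it assumes a toy environment with three actions in which only action 2 is rewarded (+1) and the others are punished (-1), and it uses the standard epsilon-greedy strategy, which mostly exploits the best-known action and occasionally explores a random one. The names `environment`, `train`, and `REWARDS` are illustrative.

```python
import random

# Hypothetical toy environment: action 2 is "correct" and earns +1,
# the other actions are punished with -1.
REWARDS = {0: -1.0, 1: -1.0, 2: 1.0}


def environment(action: int) -> float:
    """Return the reward (positive) or punishment (negative) for an action."""
    return REWARDS[action]


def train(steps: int = 500, epsilon: float = 0.1, seed: int = 0) -> list[float]:
    """Learn an estimated reward per action purely from environment feedback."""
    rng = random.Random(seed)
    n_actions = len(REWARDS)
    values = [0.0] * n_actions  # estimated reward for each action
    counts = [0] * n_actions    # how often each action has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try a random action
        else:
            # exploit: pick the action with the best estimate so far
            action = max(range(n_actions), key=lambda a: values[a])
        reward = environment(action)
        counts[action] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        values[action] += (reward - values[action]) / counts[action]
    return values


values = train()
best_action = max(range(len(values)), key=lambda a: values[a])
print(values, "best action:", best_action)
```

After training, the punished actions carry negative estimates and the agent settles on action 2, without ever being told which action was correct.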

Let us first understand how a human learns a new skill.

  • 1. The first step in learning a new skill is to notice what is happening in the surrounding environment. For example, when a child learns to talk, he notices how the tongue and lips move to make sounds.

  • 2. When he tries to make sounds, he faces challenges, and as he overcomes them he is rewarded with the sound he meant to utter. Over a period of trial and error, the child remembers what works to produce the sound he intends to make.

  • 3. He keeps improving the skill by interacting with the environment and noting its response to the sounds he makes.


What looks like a simple task is a real problem for a child, who has to understand the environment, keep track of its response, and keep learning from the reward system. Reinforcement Learning works the same way. A machine powered by a reinforcement learning algorithm learns a skill just like a child: it understands the environment and builds its responses by interacting with it.

How is Reinforcement Learning different from other machine learning techniques?

Reinforcement Learning takes machine learning a step further. Until now, the focus had been on making machines behave like humans; with reinforcement learning, the human ability to learn is being built into machines. Looking at how Reinforcement Learning algorithms differ from earlier algorithms makes the differences clear.

Source: Towards Data Science

Supervised Algorithm and Reinforcement Learning Algorithm

Reinforcement algorithms are designed so that the machine does not need external supervision or human intervention to make a decision and act. A supervised algorithm, by contrast, relies on a supervisor: a set of example inputs from the environment, each paired with the correct output. When a new situation matches the examples, the supervisor's answers guide the machine to act in a particular way and complete the task correctly. The problem arises in tasks where the agent could combine so many subtasks to reach the objective that the set of input conditions becomes too large to label. This creates the need for an algorithm that can work out for itself which inputs to focus on and which action to take. Reinforcement learning solves this supervisor problem, which is a bottleneck for supervised algorithms: it needs only a reward signal, and it interacts with the environment in a very human-like manner to arrive at correct decisions.
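The difference in feedback can be made concrete with a small, made-up task. In the sketch below (all names and the even/odd rule are hypothetical), the supervised "learner" is simply handed the correct action for every situation, while the reinforcement learner only ever sees a +1 or -1 reward for the action it tried and must work out the mapping by trial and error.

```python
import random

ACTIONS = ("left", "right")
SITUATIONS = range(10)


def correct_action(situation: int) -> str:
    """Hidden rule of the toy task: even situations need "right", odd ones "left"."""
    return "right" if situation % 2 == 0 else "left"


# Supervised learning: a supervisor labels every situation with the correct action.
labeled_examples = [(s, correct_action(s)) for s in SITUATIONS]
supervised_policy = dict(labeled_examples)


# Reinforcement learning: the environment only scores the action that was tried.
def reward(situation: int, action: str) -> float:
    return 1.0 if action == correct_action(situation) else -1.0


def learn_by_trial_and_error(trials_per_situation: int = 20, seed: int = 0) -> dict:
    rng = random.Random(seed)
    totals = {(s, a): 0.0 for s in SITUATIONS for a in ACTIONS}
    for s in SITUATIONS:
        for _ in range(trials_per_situation):
            a = rng.choice(ACTIONS)         # try an action at random
            totals[(s, a)] += reward(s, a)  # accumulate the reward or punishment
    # For each situation, keep the action with the best accumulated reward.
    return {s: max(ACTIONS, key=lambda a: totals[(s, a)]) for s in SITUATIONS}


rl_policy = learn_by_trial_and_error()
print(rl_policy == supervised_policy)  # prints True
```

Both end up with the same mapping, but only the supervised version needed someone to label every situation in advance.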

Unsupervised Algorithm and Reinforcement Learning Algorithm

Unsupervised algorithms focus on finding patterns in the activity happening in the environment. Unlike reinforcement learning, there is no one-to-one mapping from an input to a desired output. Unsupervised algorithms produce outputs, such as groups of similar items, based on patterns across many inputs rather than on a single input. Reinforcement learning uses a trial-and-error procedure to find the right output for an input and maps the two together for future decision making.
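For contrast, here is a minimal sketch of an unsupervised algorithm. It uses k-means clustering, swapped in as one familiar example of the pattern-finding described above; the data points and starting centroids are made up. The algorithm groups similar points together with no labels and no reward signal.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Group 1-D points around k centroids with no labels or rewards (k-means)."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assignment step: put each point in the cluster of its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (unchanged if empty).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


points = [1, 2, 3, 10, 11, 12]
centroids, clusters = kmeans_1d(points, centroids=[0, 5])
print(centroids, clusters)  # prints [2.0, 11.0] [[1, 2, 3], [10, 11, 12]]
```

The output is a grouping discovered from the data alone; nothing tells the algorithm whether a grouping is "correct", which is exactly the feedback a reinforcement learner depends on.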

Semi-supervised Algorithm and Reinforcement Learning Algorithm

Semi-supervised algorithms mix the two approaches: they learn patterns from unlabeled data together with a smaller set of labeled inputs offered by a supervisor. Reinforcement Learning differs in that it maps inputs to outputs through a feedback mechanism that brings its outputs steadily closer to the correct, expected results.

Source: IBM

What is R-CNN?

R-CNN (Regions with CNN features) is an Artificial Intelligence technique for object detection. It studies an image, identifies the objects in it, and classifies each object into a category, marking its location with a bounding box.

The input to the R-CNN algorithm is an image, and the outputs are bounding boxes together with the labels that identify the objects inside them.

R-CNN starts by generating region proposals. These are candidate bounding boxes that may contain objects, found with the selective search technique, which groups parts of the image by texture, size, color, and similar cues. Each proposed box is then passed through a classification step that confirms whether it contains an object and which one.

Bounding boxes are the core of R-CNN, and better bounding boxes lead to better identification of objects. Once an object has been identified, its box is tightened to fit the object's actual dimensions; a linear regression model predicts the corrections to the box coordinates. In short, R-CNN generates a set of region proposals, uses a pre-trained AlexNet convolutional network to extract features from each one, classifies the objects from those features (with class-specific linear SVMs in the original method), and tightens the box coordinates with linear regression.
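The box-tightening step can be illustrated with the coordinate transform used for bounding-box regression in the original R-CNN method: a box is described by its center, width, and height, and the regressor predicts offsets `(dx, dy, dw, dh)` relative to the proposal. The sketch below only shows how such offsets are computed and applied; the trained linear regressor that predicts them from CNN features is not included, and the `Box` class and coordinates are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned box given by its center (x, y), width w and height h."""
    x: float
    y: float
    w: float
    h: float


def regression_targets(proposal: Box, ground_truth: Box) -> tuple[float, float, float, float]:
    """Offsets a box regressor would be trained to predict for one proposal."""
    tx = (ground_truth.x - proposal.x) / proposal.w  # center shift, relative to width
    ty = (ground_truth.y - proposal.y) / proposal.h  # center shift, relative to height
    tw = math.log(ground_truth.w / proposal.w)       # width change in log space
    th = math.log(ground_truth.h / proposal.h)       # height change in log space
    return tx, ty, tw, th


def apply_offsets(proposal: Box, offsets: tuple[float, float, float, float]) -> Box:
    """Tighten a proposal by applying predicted offsets (dx, dy, dw, dh)."""
    dx, dy, dw, dh = offsets
    return Box(
        x=proposal.x + proposal.w * dx,
        y=proposal.y + proposal.h * dy,
        w=proposal.w * math.exp(dw),
        h=proposal.h * math.exp(dh),
    )


proposal = Box(x=50.0, y=50.0, w=100.0, h=80.0)  # loose box from region proposals
target = Box(x=55.0, y=48.0, w=80.0, h=60.0)     # hypothetical true object box
offsets = regression_targets(proposal, target)
refined = apply_offsets(proposal, offsets)       # perfect offsets recover the target
print(refined)
```

Width and height are corrected in log space, so an offset of 0 leaves the proposal unchanged and the refined box can never end up with a negative size.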


What is a time series?

A time series is historical data about some quantity, indexed in time order so that it can be analyzed to solve and segment problems. Future values usually follow patterns seen in the past, so this historical data is used to predict future events. Time series methods are used to forecast sales, inventory, stock prices, and more. Smoothing algorithms, such as exponential smoothing, are one common way to build time series models.
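As one example of a smoothing method, simple exponential smoothing forecasts the next value as a weighted blend of the latest observation and the previous smoothed level. The sketch below assumes a made-up monthly sales series and a smoothing factor `alpha` of 0.5; in practice `alpha` would be tuned on historical data.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: each smoothed value blends the newest
    observation with the previous smoothed value."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]  # start from the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def forecast_next(series, alpha):
    """One-step-ahead forecast: the last smoothed level."""
    return exponential_smoothing(series, alpha)[-1]


monthly_sales = [100, 120, 110, 130]  # made-up historical data
print(forecast_next(monthly_sales, alpha=0.5))  # prints 120.0
```

A larger `alpha` reacts faster to recent changes, while a smaller one gives a steadier forecast that follows the long-run pattern.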


What is Bias?

In a fully connected neural network, each neuron in one layer is connected to every neuron in the next layer, and each neuron performs a mathematical calculation on its inputs. A bias unit is an additional neuron added to each layer except the output layer. It always stores the value 1. Bias units are not connected to any neuron in the preceding layer, so they have no incoming inputs; they only feed, through their own weights, into the neurons of the next layer.

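Bias neurons are important because, even though they receive no input, their constant output lets each neuron shift its activation. Without a bias, a neuron whose inputs are all zero always produces the same output regardless of its weights, so bias units make a significant contribution to what an ANN can learn.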
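A single neuron makes the role of the bias unit easy to see. This sketch assumes a ReLU activation and arbitrary example weights (both illustrative choices): the bias is treated as one more input that is always 1, multiplied by its own weight.

```python
def neuron(inputs, weights, bias_weight=None):
    """Weighted sum of the inputs, passed through a ReLU activation.

    The bias is modeled as an extra input fixed at 1 whose weight is
    bias_weight; it has no incoming connections of its own.
    """
    total = sum(x * w for x, w in zip(inputs, weights))
    if bias_weight is not None:
        total += 1.0 * bias_weight  # the bias unit always outputs 1
    return max(0.0, total)          # ReLU activation


zero_input = [0.0, 0.0]
weights = [0.4, -0.7]
print(neuron(zero_input, weights))                   # 0.0: stuck at zero without a bias
print(neuron(zero_input, weights, bias_weight=0.5))  # 0.5: the bias shifts the output
```

Because the bias weight is learned like any other weight, the network can move each neuron's activation up or down independently of its inputs.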


How are Reinforcement Learning problems solved?


The Markov Decision Process (MDP) is the mathematical framework for formulating reinforcement learning problems. The elements that govern the behavior of the decision process are:

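  • S- Defines the set of states the environment can be in
  • A- Defines the set of actions the agent can take
  • R- Defines the reward function, the feedback received for taking an action in a state
  • π- Defines the policy, which chooses an action for each state
  • V- Defines the value of a state: the expected cumulative reward from that state when following π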


Taking an action from A in the current state moves the environment into a new state in S. The rewards create a feedback mechanism for choosing the correct action: each action earns a positive or negative reward, and these rewards guide the algorithm toward the desired states.

The policy π maps each state to an action, and once learned it determines the course of action whenever the same situation occurs again. Rewards determine the value V, and the agent looks for the policy that maximizes this value, so that its actions always steer it toward the best states.
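The pieces S, A, R, π, and V fit together in value iteration, a standard dynamic-programming method for solving an MDP when the environment's transitions are known. The sketch assumes a hypothetical five-cell corridor whose last cell is the goal, deterministic transitions, a reward of 1 for reaching the goal, and a discount factor GAMMA = 0.9 (the transition model `step` and the discount factor are additions not listed above). It repeatedly applies the Bellman update V(s) = max over a of [R(s, a) + GAMMA * V(s')] and then reads off the greedy policy π.

```python
# Hypothetical 5-cell corridor: states 0..4, where state 4 is the goal.
STATES = range(5)
ACTIONS = ("left", "right")
GOAL = 4
GAMMA = 0.9  # discount factor: how much future rewards count


def step(state, action):
    """Deterministic transition model: return (next_state, reward)."""
    if state == GOAL:
        return state, 0.0  # the goal is terminal
    next_state = max(0, state - 1) if action == "left" else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0  # reward only for reaching the goal
    return next_state, reward


def value_iteration(tolerance=1e-9):
    """Apply the Bellman optimality update until V stops changing."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            best = max(r + GAMMA * V[s2] for s2, r in (step(s, a) for a in ACTIONS))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tolerance:
            return V


def greedy_policy(V):
    """pi(s): the action with the largest reward plus discounted next-state value."""
    policy = {}
    for s in STATES:
        if s == GOAL:
            continue
        def score(a):
            s2, r = step(s, a)
            return r + GAMMA * V[s2]
        policy[s] = max(ACTIONS, key=score)
    return policy


V = value_iteration()
pi = greedy_policy(V)
print({s: round(v, 3) for s, v in V.items()})
print(pi)
```

Every non-goal state ends up with the action "right", and the values shrink by a factor of GAMMA for each extra step away from the goal. Model-free methods such as Q-learning learn comparable values purely from interaction, without knowing `step` in advance.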

