Coder’s Cauldron | Deep Reinforcement Learning

Deep reinforcement learning is a subfield of machine learning that combines reinforcement learning with deep learning. Using deep neural networks inside the reinforcement learning loop lets agents make decisions directly from unstructured input data. An agent can take very large inputs, such as raw images, and decide which actions to perform to optimize an objective. It has been applied in video games, computer vision, transportation, education, healthcare, and many other areas.

Source: AAAMinds.com

Types of Reinforcement Learning

Value-Based Methods

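These methods learn state values or state-action values and act by choosing the highest-valued action in each state. Exploration is still necessary, because an agent that always takes the currently best-looking action never finds out whether other actions are better. In general, a Deep Q-Network (DQN) follows these steps: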

Pre-process the input image, for example by converting it to grayscale and cropping the unnecessary parts
Run the resulting image through a CNN to extract features that help the agent make decisions
Generate Q-values for all possible actions
Use back-propagation to update the network so that its predicted Q-values move closer to the target Q-values
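
The steps above can be sketched in plain Python. This is a minimal, library-free illustration, not the notebook's implementation: the CNN is left out, and a plain list of Q-values stands in for its output. The function names, the grayscale weights, the epsilon-greedy exploration rule, and the discount factor gamma are all assumptions chosen for the example.

```python
import random

def to_grayscale(frame):
    """Convert an RGB frame (rows of (r, g, b) tuples) to grayscale floats."""
    # Assumed weights: the common luminance formula, not taken from the article.
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in frame]

def crop(frame, top, bottom, left, right):
    """Keep rows top..bottom-1 and columns left..right-1."""
    return [row[left:right] for row in frame[top:bottom]]

def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise take the highest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def td_target(reward, next_q_values, done, gamma=0.99):
    """Target for Q(s, a): reward plus the discounted best next-state Q-value."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)

def squared_td_error(q_values, action, target):
    """Loss that back-propagation would minimise for the action taken."""
    return (q_values[action] - target) ** 2

# Hypothetical 2x2 RGB frame and hand-written Q-values standing in for a CNN.
frame = [[(255, 255, 255), (0, 0, 0)], [(10, 20, 30), (40, 50, 60)]]
gray = crop(to_grayscale(frame), 0, 1, 0, 2)
q = [0.2, 0.5]
action = epsilon_greedy(q, epsilon=0.0)  # epsilon 0 means purely greedy
target = td_target(1.0, [0.0, 2.0], done=False, gamma=0.9)
loss = squared_td_error(q, action, target)
```

In a full DQN the loss would be averaged over a batch of stored transitions and its gradient pushed back through the CNN; here it is only computed for a single transition.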

Policy-Based Methods

Here the policy function is learned directly, without calculating a value for each action. Policy gradient is one example of a policy-based algorithm. It works as follows:

Take the state as input and calculate a probability for every action, using parameters learned from past runs
Sample an action from these probabilities, so that more probable actions are chosen more often
Repeat the above process until the episode ends and evaluate the total rewards
Using back-propagation, update the model parameters based on the rewards
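
The loop above can be sketched as a REINFORCE-style update in plain Python. To stay library-free, this sketch assumes a policy that ignores the state and keeps one logit per action; in practice a neural network would map the state to these logits. The function names, the learning rate, and the discount factor are illustrative assumptions.

```python
import math
import random

def softmax(logits):
    """Turn logits into action probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(probs, rng=random):
    """Sample an action index in proportion to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

def grad_log_softmax(probs, action):
    """Gradient of log pi(action) w.r.t. the logits: one_hot(action) - probs."""
    return [(1.0 if a == action else 0.0) - p for a, p in enumerate(probs)]

def reinforce_update(logits, actions, returns, lr=0.1):
    """One gradient-ascent step on sum_t G_t * log pi(a_t)."""
    probs = softmax(logits)
    new_logits = list(logits)
    for action, g in zip(actions, returns):
        grad = grad_log_softmax(probs, action)
        for i in range(len(new_logits)):
            new_logits[i] += lr * g * grad[i]
    return new_logits

# Hypothetical one-step episode: action 1 was taken and earned a return of 2.0.
logits = reinforce_update([0.0, 0.0], actions=[1], returns=[2.0], lr=0.1)
probs = softmax(logits)  # action 1 is now more likely than action 0
```

Actions that were followed by high returns have their logits raised, so they are sampled more often in later episodes.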

Source: OpenAI

Code-Based Tutorial For DQN

Problem Statement: A pole is attached to a cart by an un-actuated joint. The cart moves along a frictionless surface. The pendulum starts upright, and we need to keep it in the upright position by increasing or reducing the cart’s velocity (the CartPole-v0 task).
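
To make the task concrete, here is a rough, library-free sketch of one simulation step of a cart-pole system. It does not reproduce the notebook or any environment library's API. The physical constants, the Euler integration step, the termination limits, and the reward of 1 per step are assumptions modelled on commonly used cart-pole settings; `step` is a hypothetical helper.

```python
import math

# Assumed constants, modelled on commonly used cart-pole settings.
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
HALF_POLE_LENGTH = 0.5
POLE_MASS_LENGTH = MASS_POLE * HALF_POLE_LENGTH
FORCE_MAG = 10.0                  # size of the push applied to the cart
TAU = 0.02                        # seconds between state updates
THETA_LIMIT = 12 * math.pi / 180  # pole angle at which the episode ends
X_LIMIT = 2.4                     # cart position at which the episode ends

def step(state, action):
    """Advance (x, x_dot, theta, theta_dot) by one time step.

    action 1 pushes the cart to the right, anything else pushes it left.
    Returns (next_state, reward, done).
    """
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + POLE_MASS_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_POLE_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / TOTAL_MASS))
    x_acc = temp - POLE_MASS_LENGTH * theta_acc * cos_t / TOTAL_MASS
    # Simple Euler integration of positions and velocities.
    x += TAU * x_dot
    x_dot += TAU * x_acc
    theta += TAU * theta_dot
    theta_dot += TAU * theta_acc
    done = abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT
    return (x, x_dot, theta, theta_dot), 1.0, done

# A policy that always pushes right lets the pole fall after a few steps.
state, done, steps = (0.0, 0.0, 0.0, 0.0), False, 0
while not done and steps < 200:
    state, reward, done = step(state, action=1)
    steps += 1
```

A trained agent would replace the constant `action=1` with the action chosen from its Q-values, aiming to keep `done` false for as many steps as possible.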

Notebook Link: Google Colab

References

For references and new developments in deep reinforcement learning please check out the links below:

-Authored by Arnav Agarwal, Data Scientist at Absolutdata