Folk-Wisdom’s Fallacy | The Solution to Complex Real-World Problems?

Supervised algorithms (e.g. random forests) and unsupervised algorithms (e.g. K-means clustering) both need a dataset to learn from before they can be used to perform tasks. Supervised algorithms need pre-classified data, whereas unsupervised algorithms identify patterns in the input data itself. Neither of these well-known families can truly “self-learn”, which is where reinforcement learning comes in. Reinforcement learning is the closest we can get to self-learning in the world of machine learning, as it finds patterns by trial and error.

Reinforcement learning, as the name suggests, is a set of learning algorithms where the AI model tries to understand its environment by recording the feedback it gets for different actions it performs. After recording the action-feedback data, the algorithm determines which action is the most desirable at each step.

Figure 2: Reinforcement learning algorithm

[Source: Wikipedia.org]
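
The loop described above (act, record the feedback, then prefer the action that has paid off best) can be sketched with tabular Q-learning, one common reinforcement learning method. This is a minimal illustration and not something taken from the article: the five-cell corridor environment, the `step` and `train` functions, and all parameter values are assumptions chosen to keep the example small.

```python
import random

N_STATES = 5          # hypothetical corridor: positions 0..4, position 4 is the goal
ACTIONS = (-1, +1)    # index 0 = step left, index 1 = step right


def step(state, action):
    """Toy environment: apply an action, return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # feedback arrives only when the goal is reached
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by trial and error (tabular Q-learning)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)             # explore: try a random action
            else:
                best = max(q[state])             # exploit: best-known action,
                a = rng.choice([i for i in range(2) if q[state][i] == best])  # ties broken randomly
            next_state, reward, done = step(state, ACTIONS[a])
            # Record the feedback: move the estimate toward reward + discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# The most desirable action in each non-goal position, according to what was learned.
policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

Nothing tells the agent that "right" is correct. It only sees a reward when it happens to reach the goal, and the recorded values gradually spread that information back to earlier positions.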

The best-known example of a reinforcement learning system is AlphaGo, a program from Google’s DeepMind that plays the board game Go. Like chess, Go has a set of rules and a pre-defined victory condition. The rules and victory condition are introduced to the reinforcement learning algorithm through a reward function that tells the model whether the action it took was desirable. Over thousands of game sessions with both humans and computers, the AlphaGo model became so good that one of the best Go players in the world, Lee Sedol, retired. He stated that he could never be the top Go player because of AI, referring to AI as “an entity that cannot be defeated”.

The success of AlphaGo might lead you to believe that reinforcement learning can perform well with much less effort, as we do not have to supply it with prepared training data. But all that glitters is not gold.

Reinforcement learning systems need a well-defined reward function. The reward function is one of the most important parts of a reinforcement learning system. If it is not well-defined, or does not take into consideration all the relevant aspects of the environment, the system might never converge or, if it does, it will take a very long time. Games like Go and chess have well-defined rules and victory conditions, which is why it is relatively easy to create reinforcement learning models for them that do well. But consider the case of a self-driving car, where the conditions are endless; the reward function must take into account a myriad of variables, such as fuel, temperature, road conditions, time, traffic, pedestrians, and many others.
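
To make the contrast concrete, here is a rough sketch of the two kinds of reward function. Everything in it is hypothetical: the `DrivingStep` fields, the `WEIGHTS` values, and both function names are invented for illustration and cover only a handful of the variables a real driving system would face.

```python
from dataclasses import dataclass


def game_reward(outcome: str) -> float:
    """Board game: the rules give one unambiguous signal at the end of play."""
    return {"win": 1.0, "loss": -1.0}.get(outcome, 0.0)


@dataclass
class DrivingStep:
    """Hypothetical summary of one short interval of driving."""
    progress_m: float             # metres travelled toward the destination
    fuel_used_l: float            # litres of fuel consumed
    speed_over_limit_kmh: float   # how far above the speed limit, if at all
    min_pedestrian_gap_m: float   # closest distance to any pedestrian
    collided: bool


# Hypothetical weights. Choosing them is itself a hard design problem,
# and a poor choice can teach the model the wrong behaviour.
WEIGHTS = {
    "progress": 0.1,
    "fuel": 1.0,
    "speeding": 0.5,
    "safe_gap_m": 2.0,
    "near_miss": 50.0,
    "collision": 1000.0,
}


def driving_reward(step: DrivingStep, w=WEIGHTS) -> float:
    """Driving: many competing terms have to be traded off against each other."""
    reward = w["progress"] * step.progress_m
    reward -= w["fuel"] * step.fuel_used_l
    reward -= w["speeding"] * max(0.0, step.speed_over_limit_kmh)
    if step.min_pedestrian_gap_m < w["safe_gap_m"]:
        reward -= w["near_miss"]
    if step.collided:
        reward -= w["collision"]
    return reward


safe = DrivingStep(10.0, 0.01, 0.0, 5.0, False)
reckless = DrivingStep(25.0, 0.03, 30.0, 0.5, False)
print(game_reward("win"), driving_reward(safe), driving_reward(reckless))
```

The game reward fits in one line. The driving reward already needs six tunable weights for five variables, and it still ignores temperature, road conditions, time, and traffic.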

Reinforcement learning algorithms need time and the ability to make mistakes to learn. Even if we can define the perfect reward function for our problem, the reinforcement learning model needs to perform trial-and-error runs to be able to learn. For a lot of real-world problems, that is not possible. For example, if we wish to create a reinforcement learning model that finds the best marketing campaign for each customer, we cannot train it on real customers. The trial-and-error runs would cost us thousands of dollars and lead to customer churn. So, we need to create a simulation environment that mimics the real-world environment well. Then we can train our model using this simulation environment. Creating such a simulation for most problems is difficult, if not impossible.
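
As a rough sketch of what training against a simulation looks like, the example below learns a campaign for each customer segment from a simulated customer base. The `CustomerSimulator` class, the segment and campaign names, and the conversion rates are all invented for illustration. The learner is a simple epsilon-greedy rule applied per segment, not a method described in the article.

```python
import random


class CustomerSimulator:
    """Hypothetical stand-in for real customers, with hidden conversion rates."""

    def __init__(self, rates, seed=0):
        self.rates = rates                 # rates[segment][campaign] = chance of converting
        self.rng = random.Random(seed)

    def respond(self, segment, campaign):
        """Return 1.0 if the simulated customer converts, else 0.0."""
        return 1.0 if self.rng.random() < self.rates[segment][campaign] else 0.0


def learn_campaigns(sim, segments, campaigns, rounds=20000, epsilon=0.1, seed=1):
    """Trial and error against the simulator; returns best campaign per segment."""
    rng = random.Random(seed)
    counts = {s: {c: 0 for c in campaigns} for s in segments}
    values = {s: {c: 0.0 for c in campaigns} for s in segments}  # average reward so far
    failed_contacts = 0
    for _ in range(rounds):
        seg = rng.choice(segments)                     # a simulated customer arrives
        if rng.random() < epsilon:
            camp = rng.choice(campaigns)               # explore a random campaign
        else:
            camp = max(campaigns, key=lambda c: values[seg][c])  # exploit best so far
        reward = sim.respond(seg, camp)
        counts[seg][camp] += 1
        values[seg][camp] += (reward - values[seg][camp]) / counts[seg][camp]
        if reward == 0.0:
            failed_contacts += 1                       # free here, costly with real customers
    best = {s: max(campaigns, key=lambda c: values[s][c]) for s in segments}
    return best, failed_contacts


# Invented conversion rates; in practice these are exactly what we do not know.
RATES = {
    "students": {"email": 0.05, "discount": 0.30, "ad": 0.10},
    "retirees": {"email": 0.30, "discount": 0.10, "ad": 0.02},
}
simulator = CustomerSimulator(RATES)
best, failed_contacts = learn_campaigns(
    simulator, ["students", "retirees"], ["email", "discount", "ad"]
)
print(best, failed_contacts)
```

The learned answer is only as good as the simulator: here the rates were simply typed in, whereas a real project would have to estimate how customers respond before any training could start, which is the difficult part.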

References

Deepmind.com: AlphaGo case study https://deepmind.com/research/case-studies/alphago-the-story-so-far
Wikipedia.org: Lee Sedol https://en.wikipedia.org/wiki/Lee_Sedol

– Authored by Rohan Chopra, Data Scientist at Absolutdata