The pathfinding problem is defined as finding the shortest route between two points.  Pathfinding is a widely researched area and plays a key role in many computer science applications. Traditional algorithms that solve this problem try to achieve a single objective and are mostly static, operating within a defined, deterministic environment.
However, such algorithms become inapplicable in cases of multiple objectives; they cannot rely on any previous knowledge of the environment. Reinforcement learning (RL) algorithms usually try to overcome these problems by following the Markov Decision Process (MDP).
Reinforcement learning is a branch of machine learning in which a learning agent operating an uncertain and potentially complex environment learns to reach to a goal and gets penalties/rewards along the way.  The goal of the learning agent is to automatically determine the ideal behavior at each step and to learn optimal policy so that the overall reward is maximized.
A reinforcement learning problem contains various technical components, such as:
- States, which are visitable positions.
- Actions, which are possible movements.
- Transitions, which specify the connections between states via actions. Thus, knowing the transitions means knowing the map.
- Rewards that specify the evaluation of that action.
- A policy specifies an action the agent takes at each state. Thus, a policy also specifies a trajectory, which is a series of states and actions that the agent takes from a start state to an end state.
Figure 2 shows using a 5×5 grid deterministic environment:
The white box represents the learning agent at the initial state, the red boxes are obstacles, the green circle is the goal, and the yellow box represents the adversary.
The objectives for the learning agent should be:
- Reach the goal and avoid obstacles.
- Minimize the number of steps to reach to the goal.
- Avoid the adversary (in the top right corner of the grid).
- Minimize the time required at each goal state.
- Minimize the amount of fuel required to reach the goal.
This problem can be solved using different voting-based q-learning algorithms given in Tozer, Mazzuchi, and Sarkani’s Many-Objective Stochastic Path Finding Using Reinforcement Learning. (See the references below for a link). The different types of reinforcement learning algorithms can be compared, specific to the problem type and environment.
– Authored By Rohan Garg ,Data Scientist at Absolutdata