Model training through Reinforcement Learning

Reinforcement Learning (RL) is one of several active areas of research in data science. With the advance of Deep Learning, handling large amounts of data is no longer the obstacle it once was, and new ways of training algorithms, such as RL, have emerged.

RL is the third Machine Learning method to be developed, after supervised and unsupervised learning, and the one in which algorithms learn by themselves from experience. It is currently attracting considerable interest for training industrial robots.

RL is based on obtaining rewards for learning a new task: it trains decision-making models without requiring a pre-labelled dataset. Instead, the data is generated by trial and error, and each experience is labelled with the reward it produced. Over several training phases, the algorithm receives a reward signal whenever it performs the correct action. By repeating the experience and checking the rewards received, it learns by itself.
In short, it is an autonomous learning process in which an agent learns which action to take by interacting with its environment, receiving rewards or error signals depending on the actions it carries out. In other words, the system looks for the decision-making strategy that maximises its rewards.
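
The reward-driven loop described above can be sketched with tabular Q-learning, one common RL algorithm (the article does not name a specific one). This is a minimal illustration under stated assumptions: the five-cell corridor, the function name `train_corridor_agent` and all parameter values are hypothetical and chosen only to keep the example small.

```python
import random


def train_corridor_agent(n_states=5, episodes=500, alpha=0.5,
                         gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are cells 0..n_states-1. The agent starts in cell 0 and the
    goal is the last cell. Actions: 0 = move left, 1 = move right.
    The only reward is 1.0 for stepping into the goal cell.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Trial and error: act randomly with probability epsilon
            # (or when both actions look equally good), else act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            # The environment responds with a new state and a reward.
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0

            # Q-learning update: move the estimate towards the reward
            # plus the discounted value of the best next action.
            best_next = 0.0 if next_state == goal else max(q[next_state])
            q[state][action] += alpha * (
                reward + gamma * best_next - q[state][action]
            )
            state = next_state
    return q


q_table = train_corridor_agent()
# Greedy policy for every non-goal cell (0 = left, 1 = right).
policy = [0 if left > right else 1 for left, right in q_table[:-1]]
print(policy)
```

No labelled dataset is supplied at any point: the agent only sees the rewards its own actions produce, and after enough episodes the greedy policy should choose "right" in every cell.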


The applications of RL are vast and diverse, ranging from finance to recommender systems to robotics. Below are some of the application cases being explored:

  • Automation of industry with RL

In industry, agents trained with this type of learning can take over a range of control tasks. For example, AI agents can be used to cool data centres without human intervention.

Google has been one of the pioneers in implementing this machine learning method. To save large amounts of energy, it uses RL to control the air-conditioning flows that cool the servers in its data centres.

Another interesting use case is finance. Supervised time series models can predict future sales, but they do not decide what to do with the prediction. An RL agent can make that decision itself, for example whether to buy or sell in investment banking. The RL model is evaluated against market benchmarks to check its performance. IBM, for example, has a platform for financial trading that uses RL and calculates the reward from the profit or loss of each transaction.
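
A reward based on the profit or loss of each transaction might look like the sketch below. It is purely illustrative and does not reproduce IBM's platform: the function names, the `fee` parameter and the sample trades are assumptions made for this example.

```python
def transaction_reward(buy_price, sell_price, quantity, fee=0.0):
    """Reward for one closed trade: realised profit or loss, minus fees.

    Hypothetical helper; a real trading system would also account for
    slippage, risk and position limits.
    """
    return (sell_price - buy_price) * quantity - fee


def episode_return(trades):
    """Sum of the rewards of all trades made during one episode."""
    return sum(transaction_reward(*trade) for trade in trades)


# Two made-up trades as (buy_price, sell_price, quantity):
# a gain of 5.0 per unit on 10 units, then a loss of 4.0 per unit on 5 units.
trades = [(100.0, 105.0, 10), (105.0, 101.0, 5)]
print(episode_return(trades))  # 50.0 - 20.0 = 30.0
```

An agent trained on this signal is rewarded for profitable trades and penalised for losing ones, which is what lets it learn when to buy or sell.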

  • RL in NLP

This technology is also very useful for generating answers, reading texts and translating them. In this area, it works by selecting the important parts of a text and, through a recurrent neural network (RNN), generating responses to the key words in that text.

It can therefore also generate conversations, as in chatbots, with rewards attached to the words produced. Training is carried out between two virtual agents, which are rewarded for consistency, compliance with rules and appropriate responses.

  • Enhancing applications with RL

In this area, Facebook has developed an open source RL platform, known as Horizon, to optimise large-scale production systems. Horizon enables improvements such as personalised suggestions and better streaming.

Horizon is also able to work in simulated environments, distributed platforms and production systems, so that using RL in different applications can improve user tracking and thus optimise the customer experience (CX).

  • RL in Video Games

Video games are ideal for RL, as they offer a variety of simulation environments and control options. A video game generally presents a problem and forces the player to solve it through complex tasks, obtaining scores or rewards in return. An RL agent can learn by playing against itself, and the result can be used to enhance the user experience.

  • Robotic Manipulation

On assembly lines, for example, RL can be used to reinforce a robot's object-grasping skills: a model is first trained offline and then deployed, and its faults are corrected until the real robot performs adequately. This is the approach taken by QT-Opt, which was designed for robot grasping.

Amazon in Reinforcement Learning

Amazon has developed SageMaker Reinforcement Learning (RL) Kubeflow Components, a toolkit compatible with the company's AWS RoboMaker service for orchestrating robotic workflows.

As its use of ML boomed, the company needed a framework to train, synchronise and deploy RL models efficiently. SageMaker and RoboMaker provide this framework for developing robots and the new algorithms that drive AI.

The SageMaker add-on is designed to manage robotic workloads faster, creating end-to-end solutions that do not have to be rebuilt each time a particular model needs to be trained. In this sense, RL is well suited to helping solve the growing number of problems that arise in the field of robotics.

Woodside is one of the companies that have used RoboMaker with SageMaker operators to train their robots with RL models to handle their most dangerous and repetitive tasks.

The company used RL via RoboMaker and SageMaker for a robotic platform whose function is to perform a pump shutdown procedure. This procedure requires several valves to be turned manually in a certain order. During development, joint states and camera views were used to define the optimal movements for the robot to perform.


The use of RL presents significant challenges in three areas: simulating the environment, choosing the right algorithm and tuning its parameters. RL models must interact with an environment in order to learn, but for applications such as energy optimisation, autonomous cars or robotics, that environment is complex to design. Effort must therefore be invested in getting the details of the environment right so that the algorithms are trained correctly.
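
To make the idea of a simulated environment concrete, the sketch below shows a toy environment with the `reset` and `step` methods that many RL toolkits expect in some form. Everything in it is an assumption made for illustration: the class name `CoolingEnv`, the temperature dynamics and the energy cost are invented, and a realistic simulation of a data centre, a car or a robot would be far more detailed.

```python
import random


class CoolingEnv:
    """Hypothetical toy simulation of a server room.

    State: room temperature in degrees Celsius.
    Actions: 0 = fan off, 1 = fan on.
    """

    def __init__(self, target=22.0, seed=0):
        self.target = target
        self.rng = random.Random(seed)
        self.temperature = target

    def reset(self):
        """Start a new episode at a random, too-warm temperature."""
        self.temperature = self.rng.uniform(24.0, 30.0)
        return self.temperature

    def step(self, action):
        """Apply one action and return (new_temperature, reward)."""
        heating = 0.5                      # servers warm the room each step
        cooling = 1.0 if action == 1 else 0.0
        self.temperature += heating - cooling
        energy_cost = 0.1 * action         # running the fan costs energy
        # Reward is higher (less negative) near the target temperature.
        reward = -abs(self.temperature - self.target) - energy_cost
        return self.temperature, reward


env = CoolingEnv()
temperature = env.reset()
for _ in range(5):
    # Hand-written baseline rule, not a learned policy.
    action = 1 if temperature > env.target else 0
    temperature, reward = env.step(action)
    print(round(temperature, 2), round(reward, 2))
```

Even in this toy case, the designer has to decide how the temperature evolves and how energy use is weighed against overheating. Those decisions shape what the agent ends up learning, which is why the environment deserves so much care.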

In addition, choosing the appropriate algorithm is critical, because there is a wide variety of RL algorithms, each with a different approach and several hyperparameters to tune. The metrics used to measure the algorithm's performance must also be defined and evaluated.

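Finally, the algorithm faces what is known as the exploration-exploitation dilemma: it must balance trying new actions to learn more about its environment against repeating the actions that have already paid off. If this balance is wrong, or the environment is not well defined, the algorithm can get stuck on a poor strategy. With each training run the algorithm learns more about its environment.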
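
The dilemma can be illustrated with an epsilon-greedy strategy on a three-armed bandit, a standard textbook setting rather than anything specific to the systems mentioned in this article. The function name `run_bandit`, the payout probabilities and the value of `epsilon` are assumptions chosen for the example.

```python
import random


def run_bandit(epsilon, true_means=(0.2, 0.5, 0.8), steps=2000, seed=0):
    """Play a hypothetical slot machine with several arms.

    With probability epsilon the agent explores (random arm);
    otherwise it exploits the arm with the best estimate so far.
    Returns the average reward and how often each arm was pulled.
    """
    rng = random.Random(seed)
    counts = [0] * len(true_means)
    estimates = [0.0] * len(true_means)
    total = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_means))           # explore
        else:
            arm = max(range(len(true_means)),
                      key=lambda a: estimates[a])          # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental running mean of the rewards seen for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total / steps, counts


greedy_avg, greedy_counts = run_bandit(epsilon=0.0)
explore_avg, explore_counts = run_bandit(epsilon=0.1)
print("pure exploitation:", greedy_avg, greedy_counts)
print("with exploration: ", explore_avg, explore_counts)
```

With `epsilon=0.0` the agent never tries anything but the first arm, so it stays stuck on the worst option. A small amount of exploration lets it discover the better arms, at the cost of occasionally pulling a bad one.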


Reinforcement Learning is an area of research that is making steady progress within Machine Learning, optimising processes in different fields and moving towards everyday use.

This type of machine learning tackles complex problems through a trial-and-error approach. RL can be applied in many different fields, from finance to recommender systems to video games and robotics.

However, it should also be borne in mind that the method requires training sessions in simulation before it can earn real rewards. In any case, reinforcement learning is a machine learning method that allows increasingly complex problems to be solved and a wide variety of processes to be controlled.

In short, the aim is for AI to be able to solve problems autonomously without prior instructions from humans. This method is proving to be faster and more efficient, and is expected to achieve better results than those so far achieved by conventional Machine Learning.
