This is the third blog from out series, you can always check other posts in series to understand more about our work.
As we mentioned above, in the nature of Reinforcement Learning(RL), the agents try to maximize the reward when interacting with the environment. This is similar to the problem of Recommendation System that the Recommendation System algorithm tries to recommend the best items to the users and maximize the satisfaction of users. In a big picture, we can consider the Recommendation System system as an agent, and anything outside the agent as environment. But in the real-life, because the data is huge result in the enormous faction and state spaces, some RL algorithms cannot carry on. Beside some appropriate RL algorithms for Recommendation System, the trend of Deep RL algorithm is becoming emerging and considered strictly in some relevant aspects. The table below from the survey shows us the algorithms used in many research papers.
In the bar chart below, RL have been applied in Recommendation System from 1996 till 2020. However, before 2018, there are not much research papers about this topic in those year, which only stay less than 5. This is maybe because the RL have been developed and tested so just small number of researches pay attention to this topic. Besides, we also lacked of huge datasets, online testing environment and unified/standard evaluation method. From 2018 till now, there is a burst in researching that topic with many algorithms from both RL and Deep Reinforcement Learning(DRL). From the result of those research papers, we have some dominant algorithms with frequently used datasets to train and metrics to evaluate Recommendation System. Those will be explained in next part.
From the table below, the reinforcement learning algorithms has been applied in RS in the early days. Beginning with temporal difference method (Q-Learning/SARSA) and dynamic programming (Policy iteration), researchers have less change in method they used to conduct the research. From it till now, although they are developing many new algorithms, temporal difference with Q-Learning and SARSA still is a popular method due to its advantages presented below. Moreover, more datasets — especially real-life dataset such as Amazon and MovieLens — are trustworthy to used due to their structured data and metadata providing more information for each items/users. However, most evaluation procedures are offline by building a simulation environment and using some metrics that have changes through each research papers. Then we have a list of popular metrics:
Each method has its own algorithms with those advantage and disadvantage.
Coming later than the original RS algorithms, but Deep Reinforcement Learning Methods has proved to the researchers about their feasibility and efficiency, but still having small drawbacks for each method. In the table below, in 6 years, there are more research has been conducted which dominant methods are Deep Q-Learning and Actor — Critic, but the trending in 2020 till now are combining two or more methods whose advantages can support to solve the drawbacks of other methods. The datasets also become more diverse which tend to use some popular datasets in entertainment and e-commerce. Although evaluation procedures still is in offline, but some researches have built up a online evaluation having good initial results. The metrics to evaluate the RS with DRL also still the same as ones of RL above such as precision, recall, NDCG, F1, hit-ratio… We have some short description about those methods.
Single Items to multiple items recommendation: RL algorithms have been developed to select one action from many different actions around. But in the RS field, the system has tendency to generate a list of items. In many researches, the vast number of reviewed research paper consider the problem of single item recommendation. In recent years, the researching trending is about recommending a list of items. However, the researchers have to face with the problem of huge combinatorics action space, because when the action space is larger, there are more demand on exploration, generalization and optimization.
Architecture of algorithm: At the moment, most of existing RS using DRL method have been developed and tested in many environments. However, those methods are designed based on physics processes, not based on sophisticate and dynamic nature of users. So that we should have more creative method to deal with that problem. Besides having some RL algorithms popular among RS, the combination of Deep Learning and RL (compound) is becoming a valuable research direction for the future trending algorithm.
Explainable recommendation: The ability of an RS not only to provide a recommendation for user but also to explain the reason why the system recommend those items. Explanation about recommendation could improve user experience, boost the system’s trust and help RS make better/accurate decisions. Over more, this criterion help developer to track the operation of RS, then find and fix the bug or have changes in algorithm structure in need. However, this criterion still do not receive enough attentions in spite of its potential contribution in enhancing quality of RL. So that, more researchers should focus on this aspect in future.
Improving evaluation method for RL in Recommendation System: Evaluation for RL in Recommendation System should be improved. In the RL process, the agent interacts with the environment to learn what to do, which have similarity with online evaluation that RS generates the recommendations and receives the user feedback in real-time. However, most of the reviewed research papers just use offline evaluation because conducting online evaluation costs a huge price and deploying RL in Recommendation System is very risky in real businesses. On the other hand, many researchers develop the simulations for offline evaluation with the available datasets such as MovieLens, Amazon, any real business database, … But most of user models in those simulations are too simple and cannot reflect the behaviors of real users. Although there are some methods to simulate user’s behaviors, we have to develop a powerful simulator and an evaluation framework for RL in Recommender System in futures.
Reference medium link: Reinforcement Learning Based Recommendation System - Part 3
NeurondAI is a transformation business. Contact us at:Website: https://www.neurond.com