Enhancing Exploration in Reinforcement Learning through Multi-Step Actions

Shrivastava, Anshumali

Dates: 2020-12-10; 2020-12-10; 2020-12; 2020-12-03; December 2

Citation: Medini, Tharun. "Enhancing Exploration in Reinforcement Learning through Multi-Step Actions." (2020) Master’s Thesis, Rice University. https://hdl.handle.net/1911/109644.

Abstract: Reinforcement Learning (RL) has been plagued by slow and uncertain training owing to poor exploration in existing techniques. This can be attributed mainly to the lack of training data beforehand. Further, querying a neural network after every step is wasteful, as some states are conducive to multi-step actions. Since we train with data generated on the fly, it is hard to pre-identify action sequences that consistently yield great rewards. Prior research in RL has focused on designing algorithms that train multiple agents in parallel and accumulate information from these agents to train faster. Concurrently, research has also been done on dynamically identifying action sequences suited to a specific input state. In this work, we provide insights into the necessity of, and training methods for, RL with multi-step action sequences in conjunction with the main actions of an RL environment. We broadly discuss two approaches. The first is A4C (Anticipatory Asynchronous Advantage Actor-Critic), a method that squeezes twice the gradients from the same number of episodes and thereby achieves higher scores and converges faster. The second is an alternative to Imitation Learning that mitigates the need for expert state-action pairs. With as few as 20 expert action trajectories, we can identify the most frequent action pairs and append them to the novice's action space. We show the power of our approaches by consistently and significantly outperforming the state-of-the-art GPU-enabled A3C (GA3C) on popular ATARI games.

Format: application/pdf

Language: eng

Rights: Copyright is held by the author, unless otherwise indicated. Permission to reuse, publish, or reproduce the work beyond the bounds of fair use or other exemptions to copyright law must be obtained from the copyright holder.

Keywords: Reinforcement Learning; Imitation Learning; Machine Learning; ATARI; DeepMind; A3C; GA3C; Actor Critic

Type: Thesis
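
The second approach in the abstract (finding the most frequent action pairs in a small set of expert trajectories and appending them to the novice's action space) can be illustrated with a minimal sketch. This is not the thesis's implementation. The function names `most_frequent_action_pairs` and `extend_action_space`, the integer action encoding, and the choice of `k` are all hypothetical, introduced only for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def most_frequent_action_pairs(
    trajectories: Sequence[Sequence[int]], k: int
) -> List[Tuple[int, int]]:
    """Return the k most common consecutive action pairs (hypothetical helper)."""
    counts: Counter = Counter()
    for traj in trajectories:
        # Pair each action with its successor within the same trajectory.
        counts.update(zip(traj, traj[1:]))
    return [pair for pair, _ in counts.most_common(k)]


def extend_action_space(
    primitive_actions: Sequence[int], pairs: Sequence[Tuple[int, int]]
) -> List[Tuple[int, ...]]:
    """Represent every action as a tuple: single steps first, then two-step pairs."""
    return [(a,) for a in primitive_actions] + [tuple(p) for p in pairs]


# Toy expert trajectories (assumed data, not taken from the thesis).
expert_trajectories = [[0, 1, 0, 1, 2], [0, 1, 2, 2]]
top_pairs = most_frequent_action_pairs(expert_trajectories, k=2)
action_space = extend_action_space([0, 1, 2], top_pairs)
```

Under this assumed encoding, an agent that selects a two-element tuple would execute both primitive actions in sequence before querying its policy network again.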