|Appears in Collections:||Computing Science and Mathematics Journal Articles|
|Peer Review Status:||Refereed|
|Title:||Guided Policy Search for Sequential Multitask Learning|
|Keywords:||elastic weight consolidation (EWC); guided policy search (GPS); reinforcement learning (RL); sequential multitask learning|
|Citation:||Xiong F, Sun B, Yang X, Qiao H, Huang K, Hussain A & Liu Z (2019) Guided Policy Search for Sequential Multitask Learning. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 49 (1), pp. 216-226. https://doi.org/10.1109/TSMC.2018.2800040|
|Abstract:||Policy search in reinforcement learning (RL) is a practical approach that interacts directly with environments in parameter space, but it often faces the dilemmas of local optima and real-time sample collection. A promising algorithm, known as guided policy search (GPS), is capable of handling the training-sample challenge using trajectory-centric methods. It can also provide asymptotic local convergence guarantees. However, in its current form, the GPS algorithm cannot operate in sequential multitask learning scenarios. This is due to its batch-style training requirement, where all training samples are provided together at the start of the learning process. This hinders the algorithm's adaptation to real-time applications, where training samples or tasks can arrive randomly. In this paper, the GPS approach is reformulated by adapting a recently proposed lifelong-learning method, elastic weight consolidation. Specifically, Fisher information is incorporated to impart knowledge from previously learned tasks. The proposed algorithm, termed sequential multitask learning-GPS, is able to operate in sequential multitask learning settings and ensures continuous policy learning without catastrophic forgetting. Pendulum and robotic manipulation experiments demonstrate the new algorithm's efficacy in learning control policies that handle sequentially arriving training samples, delivering performance comparable to the traditional batch-based GPS algorithm. In conclusion, the proposed algorithm is posited as a new benchmark for the real-time RL and robotics research community.|
|Rights:||© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.|
|Xiong et al-IEEE-2019.pdf||Fulltext - Published Version||1.05 MB||Adobe PDF||View/Open|
|Guided Policy Search for Sequential Multi-Task Learning_clean.pdf||Fulltext - Accepted Version||536.2 kB||Adobe PDF||View/Open|
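The abstract mentions that Fisher information is used to carry knowledge over from previously learned tasks, but this record does not give the formulation. As an illustration only, the sketch below shows the commonly used diagonal form of the elastic weight consolidation penalty, (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2, which is added to the loss of the new task. The function names, the argument names, and the weight lam are hypothetical and are not taken from the paper; how the penalty is combined with GPS is not shown here.

```python
# Illustrative sketch of a diagonal EWC penalty (not the paper's implementation).
# All names below are hypothetical and chosen for this example only.

def diagonal_fisher(per_sample_grads):
    """Estimate the diagonal Fisher information as the mean squared gradient.

    per_sample_grads: list of per-sample gradients of the log-likelihood,
    each a list with one entry per parameter, evaluated at the parameters
    learned on the previous task.
    """
    n_samples = len(per_sample_grads)
    n_params = len(per_sample_grads[0])
    return [
        sum(grad[i] ** 2 for grad in per_sample_grads) / n_samples
        for i in range(n_params)
    ]


def ewc_penalty(theta, theta_old, fisher, lam=1.0):
    """Quadratic penalty that discourages moving parameters with high Fisher value."""
    return 0.5 * lam * sum(
        f * (t - t_old) ** 2 for f, t, t_old in zip(fisher, theta, theta_old)
    )


def ewc_total_loss(new_task_loss, theta, theta_old, fisher, lam=1.0):
    """New-task loss plus the EWC penalty anchored at the old-task parameters."""
    return new_task_loss + ewc_penalty(theta, theta_old, fisher, lam)
```

In this sketch, a parameter whose Fisher value is zero can move freely when a new task arrives, while a parameter with a large Fisher value is pulled back toward its old-task value; that is the mechanism by which this kind of penalty limits catastrophic forgetting.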