Guided Policy Search for Sequential Multitask Learning

Xiong, Fangzhou; Sun, Biao; Yang, Xu; Qiao, Hong; Huang, Kaizhu; Hussain, Amir; Liu, Zhiyong

doi:10.1109/TSMC.2018.2800040

Please use this identifier to cite or link to this item: http://hdl.handle.net/1893/27076

Full metadata record

DC Field	Value	Language
dc.contributor.author	Xiong, Fangzhou	en_UK
dc.contributor.author	Sun, Biao	en_UK
dc.contributor.author	Yang, Xu	en_UK
dc.contributor.author	Qiao, Hong	en_UK
dc.contributor.author	Huang, Kaizhu	en_UK
dc.contributor.author	Hussain, Amir	en_UK
dc.contributor.author	Liu, Zhiyong	en_UK
dc.date.accessioned	2018-04-18T23:15:30Z	-
dc.date.available	2018-04-18T23:15:30Z	-
dc.date.issued	2019-01-01	en_UK
dc.identifier.uri	http://hdl.handle.net/1893/27076	-
dc.description.abstract	Policy search in reinforcement learning (RL) is a practical approach that interacts directly with environments in parameter space, but it often faces the dilemmas of local optima and real-time sample collection. A promising algorithm, known as guided policy search (GPS), is capable of handling the challenge of training samples using trajectory-centric methods. It can also provide asymptotic local convergence guarantees. However, in its current form, the GPS algorithm cannot operate in sequential multitask learning scenarios. This is due to its batch-style training requirement, where all training samples are collectively provided at the start of the learning process. The algorithm's adaptation is thus hindered for real-time applications, where training samples or tasks can arrive randomly. In this paper, the GPS approach is reformulated by adapting a recently proposed lifelong-learning method, elastic weight consolidation. Specifically, Fisher information is incorporated to impart knowledge from previously learned tasks. The proposed algorithm, termed sequential multitask learning-GPS, is able to operate in sequential multitask learning settings and ensures continuous policy learning without catastrophic forgetting. Pendulum and robotic manipulation experiments demonstrate the new algorithm's efficacy in learning control policies for handling sequentially arriving training samples, delivering comparable performance to the traditional batch-based GPS algorithm. In conclusion, the proposed algorithm is posited as a new benchmark for the real-time RL and robotics research community.	en_UK
dc.language.iso	en	en_UK
dc.publisher	IEEE	en_UK
dc.relation	Xiong F, Sun B, Yang X, Qiao H, Huang K, Hussain A & Liu Z (2019) Guided Policy Search for Sequential Multitask Learning. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 49 (1), pp. 216-226. https://doi.org/10.1109/TSMC.2018.2800040	en_UK
dc.rights	© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en_UK
dc.subject	elastic weight consolidation (EWC)	en_UK
dc.subject	guided policy search (GPS)	en_UK
dc.subject	reinforcement learning (RL)	en_UK
dc.subject	sequential multitask learning	en_UK
dc.title	Guided Policy Search for Sequential Multitask Learning	en_UK
dc.type	Journal Article	en_UK
dc.rights.embargodate	2018-04-18	en_UK
dc.identifier.doi	10.1109/TSMC.2018.2800040	en_UK
dc.citation.jtitle	IEEE Transactions on Systems, Man, and Cybernetics: Systems	en_UK
dc.citation.issn	1083-4427	en_UK
dc.citation.volume	49	en_UK
dc.citation.issue	1	en_UK
dc.citation.spage	216	en_UK
dc.citation.epage	226	en_UK
dc.citation.publicationstatus	Published	en_UK
dc.citation.peerreviewed	Refereed	en_UK
dc.type.status	VoR - Version of Record	en_UK
dc.type.status	AM - Accepted Manuscript	en_UK
dc.contributor.funder	Engineering and Physical Sciences Research Council	en_UK
dc.citation.date	19/02/2018	en_UK
dc.contributor.affiliation	Chinese Academy of Sciences	en_UK
dc.contributor.affiliation	University of Science and Technology Beijing	en_UK
dc.contributor.affiliation	Chinese Academy of Sciences	en_UK
dc.contributor.affiliation	Chinese Academy of Sciences	en_UK
dc.contributor.affiliation	Xi’an Jiaotong University	en_UK
dc.contributor.affiliation	Computing Science	en_UK
dc.contributor.affiliation	Chinese Academy of Sciences	en_UK
dc.identifier.scopusid	2-s2.0-85042198472	en_UK
dc.identifier.wtid	879200	en_UK
dc.contributor.orcid	0000-0002-8080-082X	en_UK
dc.date.accepted	2018-01-11	en_UK
dcterms.dateAccepted	2018-01-11	en_UK
dc.date.filedepositdate	2018-04-18	en_UK
dc.relation.funderproject	Towards visually-driven speech enhancement for cognitively-inspired multi-modal hearing-aid devices	en_UK
dc.relation.funderref	EP/M026981/1	en_UK
rioxxterms.apc	not required	en_UK
rioxxterms.type	Journal Article/Review	en_UK
rioxxterms.version	AM	en_UK
local.rioxx.author	Xiong, Fangzhou|	en_UK
local.rioxx.author	Sun, Biao|	en_UK
local.rioxx.author	Yang, Xu|	en_UK
local.rioxx.author	Qiao, Hong|	en_UK
local.rioxx.author	Huang, Kaizhu|	en_UK
local.rioxx.author	Hussain, Amir|0000-0002-8080-082X	en_UK
local.rioxx.author	Liu, Zhiyong|	en_UK
local.rioxx.project	EP/M026981/1|Engineering and Physical Sciences Research Council|http://dx.doi.org/10.13039/501100000266	en_UK
local.rioxx.freetoreaddate	2018-04-18	en_UK
local.rioxx.licence	http://www.rioxx.net/licenses/all-rights-reserved|2018-04-18|	en_UK
local.rioxx.filename	Guided Policy Search for Sequential Multi-Task Learning_clean.pdf	en_UK
local.rioxx.filecount	2	en_UK
local.rioxx.source	1083-4427	en_UK
Appears in Collections:	Computing Science and Mathematics Journal Articles

Files in This Item:

File	Description	Size	Format
Xiong et al-IEEE-2019.pdf	Fulltext - Published Version	1.05 MB	Adobe PDF
Guided Policy Search for Sequential Multi-Task Learning_clean.pdf	Fulltext - Accepted Version	536.2 kB	Adobe PDF

This item is protected by original copyright

Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.

The metadata of the records in the Repository are available under the CC0 public domain dedication: No Rights Reserved https://creativecommons.org/publicdomain/zero/1.0/

If you believe that any material held in STORRE infringes copyright, please contact library@stir.ac.uk providing details and we will remove the Work from public display in STORRE and investigate your claim.

STORRE: Stirling Online Research Repository