Please use this identifier to cite or link to this item: http://hdl.handle.net/1893/27076
Full metadata record
DC Field	Value	Language
dc.contributor.author	Xiong, Fangzhou	en_UK
dc.contributor.author	Sun, Biao	en_UK
dc.contributor.author	Yang, Xu	en_UK
dc.contributor.author	Qiao, Hong	en_UK
dc.contributor.author	Huang, Kaizhu	en_UK
dc.contributor.author	Hussain, Amir	en_UK
dc.contributor.author	Liu, Zhiyong	en_UK
dc.date.accessioned	2018-04-18T23:15:30Z	-
dc.date.available	2018-04-18T23:15:30Z	-
dc.date.issued	2019-01-01	en_UK
dc.identifier.uri	http://hdl.handle.net/1893/27076	-
dc.description.abstract	Policy search in reinforcement learning (RL) is a practical approach that interacts directly with environments in parameter space, but it often faces the dilemmas of local optima and real-time sample collection. A promising algorithm, known as guided policy search (GPS), is capable of handling the challenge of training samples using trajectory-centric methods. It can also provide asymptotic local convergence guarantees. However, in its current form, the GPS algorithm cannot operate in sequential multitask learning scenarios. This is due to its batch-style training requirement, where all training samples are collectively provided at the start of the learning process. The algorithm's adaptation is thus hindered for real-time applications, where training samples or tasks can arrive randomly. In this paper, the GPS approach is reformulated by adapting a recently proposed lifelong-learning method, elastic weight consolidation. Specifically, Fisher information is incorporated to impart knowledge from previously learned tasks. The proposed algorithm, termed sequential multitask learning-GPS, is able to operate in sequential multitask learning settings and ensures continuous policy learning without catastrophic forgetting. Pendulum and robotic manipulation experiments demonstrate the new algorithm's efficacy in learning control policies for handling sequentially arriving training samples, delivering performance comparable to the traditional, batch-based GPS algorithm. In conclusion, the proposed algorithm is posited as a new benchmark for the real-time RL and robotics research community.	en_UK
dc.language.iso	en	en_UK
dc.publisher	IEEE	en_UK
dc.relation	Xiong F, Sun B, Yang X, Qiao H, Huang K, Hussain A & Liu Z (2019) Guided Policy Search for Sequential Multitask Learning. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 49 (1), pp. 216-226. https://doi.org/10.1109/TSMC.2018.2800040	en_UK
dc.rights	© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en_UK
dc.subject	elastic weight consolidation (EWC)	en_UK
dc.subject	guided policy search (GPS)	en_UK
dc.subject	reinforcement learning (RL)	en_UK
dc.subject	sequential multitask learning	en_UK
dc.title	Guided Policy Search for Sequential Multitask Learning	en_UK
dc.type	Journal Article	en_UK
dc.rights.embargodate	2018-04-18	en_UK
dc.identifier.doi	10.1109/TSMC.2018.2800040	en_UK
dc.citation.jtitle	IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans	en_UK
dc.citation.issn	1083-4427	en_UK
dc.citation.volume	49	en_UK
dc.citation.issue	1	en_UK
dc.citation.spage	216	en_UK
dc.citation.epage	226	en_UK
dc.citation.publicationstatus	Published	en_UK
dc.citation.peerreviewed	Refereed	en_UK
dc.type.status	VoR - Version of Record	en_UK
dc.type.status	AM - Accepted Manuscript	en_UK
dc.contributor.funder	Engineering and Physical Sciences Research Council	en_UK
dc.citation.date	19/02/2018	en_UK
dc.contributor.affiliation	Chinese Academy of Sciences	en_UK
dc.contributor.affiliation	University of Science and Technology Beijing	en_UK
dc.contributor.affiliation	Chinese Academy of Sciences	en_UK
dc.contributor.affiliation	Chinese Academy of Sciences	en_UK
dc.contributor.affiliation	Xi’an Jiaotong University	en_UK
dc.contributor.affiliation	Computing Science	en_UK
dc.contributor.affiliation	Chinese Academy of Sciences	en_UK
dc.identifier.scopusid	2-s2.0-85042198472	en_UK
dc.identifier.wtid	879200	en_UK
dc.contributor.orcid	0000-0002-8080-082X	en_UK
dc.date.accepted	2018-01-11	en_UK
dcterms.dateAccepted	2018-01-11	en_UK
dc.date.filedepositdate	2018-04-18	en_UK
dc.relation.funderproject	Towards visually-driven speech enhancement for cognitively-inspired multi-modal hearing-aid devices	en_UK
dc.relation.funderref	EP/M026981/1	en_UK
rioxxterms.apc	not required	en_UK
rioxxterms.type	Journal Article/Review	en_UK
rioxxterms.version	AM	en_UK
local.rioxx.author	Xiong, Fangzhou|	en_UK
local.rioxx.author	Sun, Biao|	en_UK
local.rioxx.author	Yang, Xu|	en_UK
local.rioxx.author	Qiao, Hong|	en_UK
local.rioxx.author	Huang, Kaizhu|	en_UK
local.rioxx.author	Hussain, Amir|0000-0002-8080-082X	en_UK
local.rioxx.author	Liu, Zhiyong|	en_UK
local.rioxx.project	EP/M026981/1|Engineering and Physical Sciences Research Council|http://dx.doi.org/10.13039/501100000266	en_UK
local.rioxx.freetoreaddate	2018-04-18	en_UK
local.rioxx.licence	http://www.rioxx.net/licenses/all-rights-reserved|2018-04-18|	en_UK
local.rioxx.filename	Guided Policy Search for Sequential Multi-Task Learning_clean.pdf	en_UK
local.rioxx.filecount	2	en_UK
local.rioxx.source	1083-4427	en_UK
Appears in Collections: Computing Science and Mathematics Journal Articles

Files in This Item:
File	Description	Size	Format
Xiong et al-IEEE-2019.pdf	Fulltext - Published Version	1.05 MB	Adobe PDF
Guided Policy Search for Sequential Multi-Task Learning_clean.pdf	Fulltext - Accepted Version	536.2 kB	Adobe PDF


This item is protected by original copyright



Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.

The metadata of the records in the Repository are available under the CC0 public domain dedication: No Rights Reserved https://creativecommons.org/publicdomain/zero/1.0/

If you believe that any material held in STORRE infringes copyright, please contact library@stir.ac.uk providing details and we will remove the Work from public display in STORRE and investigate your claim.