Automatic Construction of Discourse Corpora for Dialogue Translation

Wang, Longyue; Zhang, Xiaojun; Tu, Zhaopeng; Liu, Qun; Way, Andy

Please use this identifier to cite or link to this item: http://hdl.handle.net/1893/23457

Full metadata record

DC Field	Value	Language
dc.contributor.author	Wang, Longyue	en_UK
dc.contributor.author	Zhang, Xiaojun	en_UK
dc.contributor.author	Tu, Zhaopeng	en_UK
dc.contributor.author	Liu, Qun	en_UK
dc.contributor.author	Way, Andy	en_UK
dc.contributor.editor	Calzolari, N	en_UK
dc.contributor.editor	Choukri, K	en_UK
dc.contributor.editor	Declerck, T	en_UK
dc.contributor.editor	Goggi, S	en_UK
dc.contributor.editor	Grobelnik, M	en_UK
dc.contributor.editor	Maegaard, B	en_UK
dc.contributor.editor	Mariani, J	en_UK
dc.contributor.editor	Mazo, H	en_UK
dc.contributor.editor	Moreno, A	en_UK
dc.contributor.editor	Odijk, J	en_UK
dc.contributor.editor	Piperidis, S	en_UK
dc.date.accessioned	2016-12-06T01:35:58Z	-
dc.date.available	2016-12-06T01:35:58Z	-
dc.date.issued	2016-05-13	en_UK
dc.identifier.uri	http://hdl.handle.net/1893/23457	-
dc.description.abstract	In this paper, a novel approach is proposed to automatically construct a parallel discourse corpus for dialogue machine translation. First, parallel subtitle data and the corresponding monolingual movie script data are crawled and collected from the Internet. Then tags such as speaker and discourse boundary are projected from the script data onto the subtitle data via an information retrieval approach, in order to map monolingual discourse to bilingual texts. We not only evaluate the mapping results, but also integrate speaker information into the translation. Experiments show that our proposed method can achieve 81.79% and 98.64% accuracy on speaker and dialogue boundary annotation, respectively, and that speaker-based language model adaptation can obtain an improvement of around 0.5 BLEU points in translation quality. Finally, we publicly release around 100K parallel discourse data with manual speaker and dialogue boundary annotation.	en_UK
dc.language.iso	en	en_UK
dc.publisher	European Language Resources Association	en_UK
dc.relation	Wang L, Zhang X, Tu Z, Liu Q & Way A (2016) Automatic Construction of Discourse Corpora for Dialogue Translation. In: Calzolari N, Choukri K, Declerck T, Goggi S, Grobelnik M, Maegaard B, Mariani J, Mazo H, Moreno A, Odijk J & Piperidis S (eds.) LREC 2016, Tenth International Conference on Language Resources and Evaluation Proceedings. LREC 2016, Tenth International Conference on Language Resources and Evaluation, Portorož, Slovenia, 23.05.2016-28.05.2016. Paris: European Language Resources Association, pp. 2748-2754. http://www.lrec-conf.org/proceedings/lrec2016/pdf/790_Paper.pdf	en_UK
dc.rights	The LREC 2016 Proceedings are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License	en_UK
dc.rights.uri	http://creativecommons.org/licenses/by-nc/4.0/	en_UK
dc.subject	Discourse Corpus	en_UK
dc.subject	Dialogue	en_UK
dc.subject	Machine Translation	en_UK
dc.subject	Information Retrieval	en_UK
dc.subject	Movie Script	en_UK
dc.subject	Movie Subtitle	en_UK
dc.title	Automatic Construction of Discourse Corpora for Dialogue Translation	en_UK
dc.type	Conference Paper	en_UK
dc.citation.spage	2748	en_UK
dc.citation.epage	2754	en_UK
dc.citation.publicationstatus	Published	en_UK
dc.citation.peerreviewed	Refereed	en_UK
dc.type.status	VoR - Version of Record	en_UK
dc.identifier.url	http://www.lrec-conf.org/proceedings/lrec2016/pdf/790_Paper.pdf	en_UK
dc.author.email	xiaojun.zhang@stir.ac.uk	en_UK
dc.citation.btitle	LREC 2016, Tenth International Conference on Language Resources and Evaluation Proceedings	en_UK
dc.citation.conferencedates	2016-05-23 - 2016-05-28	en_UK
dc.citation.conferencelocation	Portorož, Slovenia	en_UK
dc.citation.conferencename	LREC 2016, Tenth International Conference on Language Resources and Evaluation	en_UK
dc.citation.date	13/05/2016	en_UK
dc.citation.isbn	978-2-9517408-9-1	en_UK
dc.publisher.address	Paris	en_UK
dc.contributor.affiliation	Dublin City University	en_UK
dc.contributor.affiliation	English Studies	en_UK
dc.contributor.affiliation	Huawei Technologies (HK)	en_UK
dc.contributor.affiliation	ADAPT Centre	en_UK
dc.contributor.affiliation	ADAPT Centre	en_UK
dc.identifier.wtid	885227	en_UK
dc.contributor.orcid	0000-0003-3514-1981	en_UK
dc.date.accepted	2016-02-01	en_UK
dcterms.dateAccepted	2016-02-01	en_UK
dc.date.filedepositdate	2016-06-26	en_UK
rioxxterms.apc	not charged	en_UK
rioxxterms.type	Conference Paper/Proceeding/Abstract	en_UK
rioxxterms.version	VoR	en_UK
local.rioxx.author	Wang, Longyue|	en_UK
local.rioxx.author	Zhang, Xiaojun|0000-0003-3514-1981	en_UK
local.rioxx.author	Tu, Zhaopeng|	en_UK
local.rioxx.author	Liu, Qun|	en_UK
local.rioxx.author	Way, Andy|	en_UK
local.rioxx.project	Internal Project|University of Stirling|https://isni.org/isni/0000000122484331	en_UK
local.rioxx.contributor	Calzolari, N|	en_UK
local.rioxx.contributor	Choukri, K|	en_UK
local.rioxx.contributor	Declerck, T|	en_UK
local.rioxx.contributor	Goggi, S|	en_UK
local.rioxx.contributor	Grobelnik, M|	en_UK
local.rioxx.contributor	Maegaard, B|	en_UK
local.rioxx.contributor	Mariani, J|	en_UK
local.rioxx.contributor	Mazo, H|	en_UK
local.rioxx.contributor	Moreno, A|	en_UK
local.rioxx.contributor	Odijk, J|	en_UK
local.rioxx.contributor	Piperidis, S|	en_UK
local.rioxx.freetoreaddate	2016-06-30	en_UK
local.rioxx.licence	http://creativecommons.org/licenses/by-nc/4.0/|2016-06-30|	en_UK
local.rioxx.filename	790_Paper.pdf	en_UK
local.rioxx.filecount	1	en_UK
local.rioxx.source	978-2-9517408-9-1	en_UK
Appears in Collections:	Literature and Languages Conference Papers and Proceedings

Files in This Item:

File	Description	Size	Format
790_Paper.pdf	Fulltext - Published Version	668.34 kB	Adobe PDF

This item is protected by original copyright

A file in this item is licensed under a Creative Commons License

STORRE: Stirling Online Research Repository