Automatic Construction of Parallel Dialogue Corpora with Rich Information

Zhang, Xiaojun; Wang, Longyue; Liu, Qun; Way, Andy

Please use this identifier to cite or link to this item: http://hdl.handle.net/1893/25851

Full metadata record

DC Field	Value	Language
dc.contributor.author	Zhang, Xiaojun	en_UK
dc.contributor.author	Wang, Longyue	en_UK
dc.contributor.author	Liu, Qun	en_UK
dc.contributor.author	Way, Andy	en_UK
dc.contributor.editor	Huang, C-R	en_UK
dc.date.accessioned	2018-05-12T09:10:35Z	-
dc.date.available	2018-05-12T09:10:35Z	-
dc.date.issued	2021	en_UK
dc.identifier.uri	http://hdl.handle.net/1893/25851	-
dc.description.abstract	Due to the lack of ideal resources, few researchers have investigated how to improve the machine translation (MT) of conversational material by exploiting its internal structure. In this article, we propose a novel strategy to automatically construct a parallel dialogue corpus by bridging two kinds of resources: movie subtitles and movie scripts. First, we crawl both parallel subtitles and their corresponding monolingual scripts from the Internet. After sentence alignment, we project all useful information from the script side to its corresponding subtitle side. Finally, we automatically build a Chinese-English dialogue corpus, which contains bilingual subtitle utterances, speaker names and actions, scene descriptions and boundaries, as well as script sentences. To demonstrate the usefulness of our data, we explore the use of speaker name tags to improve translation performance. Experiments show that our approach achieves 81.79% accuracy on speaker name annotation, and that speaker-based model adaptation yields an improvement of around 0.5 BLEU points in translation quality. We believe that our resources can benefit various tasks such as dialogue systems, image/movie description, and MT.	en_UK
dc.language.iso	en	en_UK
dc.publisher	Springer	en_UK
dc.relation	Zhang X, Wang L, Liu Q & Way A (2021) Automatic Construction of Parallel Dialogue Corpora with Rich Information. In: Huang C (ed.) Forthcoming Volume. Text, Speech and Language Technology. Cham, Switzerland: Springer.	en_UK
dc.relation.ispartofseries	Text, Speech and Language Technology	en_UK
dc.rights	The publisher does not allow this work to be made publicly available in this Repository. Please use the Request a Copy feature at the foot of the Repository record to request a copy directly from the author. You can only request a copy if you wish to use this work for your own research or private study.	en_UK
dc.rights.uri	http://www.rioxx.net/licenses/under-embargo-all-rights-reserved	en_UK
dc.subject	Dialogue	en_UK
dc.subject	Machine Translation	en_UK
dc.subject	Parallel Corpus	en_UK
dc.subject	Movie Script	en_UK
dc.subject	Movie Subtitle	en_UK
dc.subject	Rich Information	en_UK
dc.title	Automatic Construction of Parallel Dialogue Corpora with Rich Information	en_UK
dc.type	Part of book or chapter of book	en_UK
dc.rights.embargodate	3003-12-01	en_UK
dc.rights.embargoreason	[TSLT_Springer_Book_camera_ready_v3.pdf] The publisher does not allow this work to be made publicly available in this Repository therefore there is an embargo on the full text of the work.	en_UK
dc.citation.issn	1386-291X	en_UK
dc.type.status	AM - Accepted Manuscript	en_UK
dc.author.email	xiaojun.zhang@stir.ac.uk	en_UK
dc.citation.btitle	Forthcoming Volume	en_UK
dc.publisher.address	Cham, Switzerland	en_UK
dc.description.notes	Output Status: Forthcoming	en_UK
dc.contributor.affiliation	English Studies	en_UK
dc.contributor.affiliation	Dublin City University	en_UK
dc.contributor.affiliation	Dublin City University	en_UK
dc.contributor.affiliation	Dublin City University	en_UK
dc.identifier.wtid	883712	en_UK
dc.contributor.orcid	0000-0003-3514-1981	en_UK
dcterms.dateAccepted	2021-12-31	en_UK
dc.date.filedepositdate	2017-09-08	en_UK
rioxxterms.type	Book chapter	en_UK
rioxxterms.version	AM	en_UK
local.rioxx.author	Zhang, Xiaojun|0000-0003-3514-1981	en_UK
local.rioxx.author	Wang, Longyue|	en_UK
local.rioxx.author	Liu, Qun|	en_UK
local.rioxx.author	Way, Andy|	en_UK
local.rioxx.project	Internal Project|University of Stirling|https://isni.org/isni/0000000122484331	en_UK
local.rioxx.contributor	Huang, C-R|	en_UK
local.rioxx.freetoreaddate	3003-12-01	en_UK
local.rioxx.licence	http://www.rioxx.net/licenses/under-embargo-all-rights-reserved||	en_UK
local.rioxx.filename	TSLT_Springer_Book_camera_ready_v3.pdf	en_UK
local.rioxx.filecount	1	en_UK
local.rioxx.source	1386-291X	en_UK
Appears in Collections:	Literature and Languages Book Chapters and Sections

Files in This Item:

File	Description	Size	Format
TSLT_Springer_Book_camera_ready_v3.pdf	Fulltext - Accepted Version	887.34 kB	Adobe PDF	Under Embargo until 3003-12-01 Request a copy

This item is protected by original copyright

Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.

The metadata of the records in the Repository are available under the CC0 public domain dedication: No Rights Reserved https://creativecommons.org/publicdomain/zero/1.0/

If you believe that any material held in STORRE infringes copyright, please contact library@stir.ac.uk providing details and we will remove the Work from public display in STORRE and investigate your claim.

STORRE: Stirling Online Research Repository