Please use this identifier to cite or link to this item: http://hdl.handle.net/1893/27854
Appears in Collections: Computing Science and Mathematics Conference Papers and Proceedings
Author(s): Beck, Tilman
Böschen, Falk
Scherp, Ansgar
Title: What to Read Next? Challenges and Preliminary Results in Selecting Representative Documents
Editor(s): Elloumi, M
Granitzer, M
Hameurlain, A
Seifert, C
Stein, B
Tjoa, AM
Wagner, R
Citation: Beck T, Böschen F & Scherp A (2018) What to Read Next? Challenges and Preliminary Results in Selecting Representative Documents. In: Elloumi M, Granitzer M, Hameurlain A, Seifert C, Stein B, Tjoa AM & Wagner R (eds.) Database and Expert Systems Applications. DEXA 2018. Communications in Computer and Information Science, 903. 29th International Conference on Database and Expert Systems Applications, DEXA 2018, Regensburg, Germany, 03.09.2018-06.09.2018. Cham, Switzerland: Springer International Publishing, pp. 230-242. https://doi.org/10.1007/978-3-319-99133-7_19
Issue Date: 31-Dec-2018
Date Deposited: 27-Sep-2018
Series/Report no.: Communications in Computer and Information Science, 903
Conference Name: 29th International Conference on Database and Expert Systems Applications, DEXA 2018
Conference Dates: 2018-09-03 - 2018-09-06
Conference Location: Regensburg, Germany
Abstract: The vast amount of scientific literature poses a challenge when one is trying to understand a previously unknown topic. Selecting a representative subset of documents that covers most of the desired content can solve this challenge by presenting the user with a small subset of documents. We build on existing research on representative subset extraction and apply it in an information retrieval setting. Our document selection process consists of three steps: computation of the document representations, clustering, and selection of documents. We implement and compare two different document representations, two different clustering algorithms, and three different selection methods using a coverage and a redundancy metric. We execute our 36 experiments on two datasets, with 10 sample queries each, from different domains. The results show that there is no clear favorite and that we need to ask the question whether coverage and redundancy are sufficient for evaluating representative subsets.
Status: AM - Accepted Manuscript
Rights: This is a post-peer-review, pre-copyedit version of a paper published in Elloumi M. et al. (eds) Database and Expert Systems Applications. DEXA 2018. Communications in Computer and Information Science, vol 903. The final authenticated version is available online at: https://doi.org/10.1007/978-3-319-99133-7_19
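
The three-step process named in the abstract (document representation, clustering, selection) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, since this record does not say which representations, clustering algorithms, selection methods, or metric definitions the paper uses: TF-IDF with a smoothed IDF stands in for the representation, a plain k-means with cosine similarity for the clustering, "closest document to each centroid" for the selection, and the `coverage` and `redundancy` functions are hypothetical definitions rather than the paper's.

```python
import math
import random


def tfidf_vectors(docs):
    """Step 1 (assumed): L2-normalised TF-IDF vectors stored as sparse dicts."""
    tokenised = [doc.lower().split() for doc in docs]
    n = len(docs)
    df = {}
    for tokens in tokenised:
        for term in set(tokens):
            df[term] = df.get(term, 0) + 1
    vectors = []
    for tokens in tokenised:
        vec = {}
        for term in tokens:
            vec[term] = vec.get(term, 0.0) + 1.0
        for term in vec:
            vec[term] *= math.log((1 + n) / (1 + df[term])) + 1.0
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(vectors, k, iterations=20, seed=0):
    """Step 2 (assumed): k-means that assigns by highest cosine similarity."""
    rng = random.Random(seed)
    centroids = [dict(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if not members:
                continue  # keep the previous centroid for an empty cluster
            centroid = {}
            for v in members:
                for t, w in v.items():
                    centroid[t] = centroid.get(t, 0.0) + w / len(members)
            centroids[c] = centroid
    return labels, centroids


def select_representatives(vectors, labels, centroids):
    """Step 3 (assumed): per non-empty cluster, pick the document nearest its centroid."""
    chosen = []
    for c, centroid in enumerate(centroids):
        members = [i for i, l in enumerate(labels) if l == c]
        if members:
            chosen.append(max(members, key=lambda i: cosine(vectors[i], centroid)))
    return sorted(chosen)


def coverage(vectors, chosen):
    """Hypothetical metric: mean similarity of each document to its best selected match."""
    return sum(max(cosine(v, vectors[i]) for i in chosen) for v in vectors) / len(vectors)


def redundancy(vectors, chosen):
    """Hypothetical metric: mean pairwise similarity among the selected documents."""
    pairs = [(i, j) for i in chosen for j in chosen if i < j]
    if not pairs:
        return 0.0
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


docs = [
    "graph clustering algorithms for networks",
    "spectral clustering of graph networks",
    "neural language models for text",
    "language models generate text",
]
vectors = tfidf_vectors(docs)
labels, centroids = kmeans(vectors, k=2)
chosen = select_representatives(vectors, labels, centroids)
print("selected:", [docs[i] for i in chosen])
print("coverage:", round(coverage(vectors, chosen), 3))
print("redundancy:", round(redundancy(vectors, chosen), 3))
```

Under these assumed definitions a good subset has high coverage and low redundancy. Which documents are selected depends on the random initialisation of the centroids, so the toy output should not be read as a result.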

Files in This Item:
File: W43-BeckEtAl-Challenges and Preliminary Results in Selecting Representative Documents.pdf
Description: Fulltext - Accepted Version
Size: 725.74 kB
Format: Adobe PDF



Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.

The metadata of the records in the Repository are available under the CC0 public domain dedication: No Rights Reserved https://creativecommons.org/publicdomain/zero/1.0/

If you believe that any material held in STORRE infringes copyright, please contact library@stir.ac.uk providing details and we will remove the Work from public display in STORRE and investigate your claim.