Please use this identifier to cite or link to this item: http://hdl.handle.net/1893/27711
Full metadata record
DC Field	Value	Language
dc.contributor.author	Curry, Gordon B	en_UK
dc.contributor.author	Connor, Richard C H	en_UK
dc.date.accessioned	2018-09-05T13:55:22Z	-
dc.date.available	2018-09-05T13:55:22Z	-
dc.date.issued	2008-02-01	en_UK
dc.identifier.uri	http://hdl.handle.net/1893/27711	-
dc.description.abstract	Many valuable earth science data are not available in a digital format. Manual entry of such information into databases is time consuming, unrewarding, and prone to introducing errors. Taxonomic descriptions of fossils are a good example of valuable data that are overwhelmingly available only in printed volumes and journals, some of which are increasingly rare and inaccessible. The highly structured nature of taxonomic procedures and nomenclature means that many previously published data remain equally valid to the present day, and contain information that is currently not available on the World Wide Web; these data would be of great use to a wide variety of scientists and other end users in government, industry, academia and the general public. This paper describes an XML (extensible markup language) parsing technique that allows taxonomic descriptions to be fully digitized much more rapidly than would be possible by manual entry of the data into a database. The technique exploits the high degree of structure in taxonomic descriptions, which are written in a standardized format, to automate the process of tagging separate sections of the text. Once tagged using XML, the data can be subjected to complex searches using queries written in any of the XML query standards. The XML-tagged data can potentially be imported into existing databases, in effect removing the necessity to manually enter the information, and hence overcoming the main bottleneck in generating digital data from printed material. Individual parsers can be tailored precisely to the nature of the text being analyzed, and once the underlying concepts and procedures are understood, those interested in acquiring and using digital data will be able to generate XML parsers dedicated to text with different styles of standardized formatting.	en_UK
dc.language.iso	en	en_UK
dc.publisher	Geological Society of America	en_UK
dc.relation	Curry GB & Connor RCH (2008) Automated extraction of data from text using an XML parser: An earth science example using fossil descriptions. Geosphere, 4 (1), pp. 159-169. https://doi.org/10.1130/GES00140.1	en_UK
dc.rights	The publisher does not allow this work to be made publicly available in this Repository. Please use the Request a Copy feature at the foot of the Repository record to request a copy directly from the author. You can only request a copy if you wish to use this work for your own research or private study.	en_UK
dc.rights.uri	http://www.rioxx.net/licenses/under-embargo-all-rights-reserved	en_UK
dc.subject	Geoinformatics	en_UK
dc.subject	data acquisition	en_UK
dc.subject	XML	en_UK
dc.subject	parsing	en_UK
dc.subject	taxonomy	en_UK
dc.subject	databases	en_UK
dc.title	Automated extraction of data from text using an XML parser: An earth science example using fossil descriptions	en_UK
dc.type	Journal Article	en_UK
dc.rights.embargodate	2999-12-31	en_UK
dc.rights.embargoreason	[Curry Connor 2008.pdf] The publisher does not allow this work to be made publicly available in this Repository therefore there is an embargo on the full text of the work.	en_UK
dc.identifier.doi	10.1130/GES00140.1	en_UK
dc.citation.jtitle	Geosphere	en_UK
dc.citation.issn	1553-040X	en_UK
dc.citation.volume	4	en_UK
dc.citation.issue	1	en_UK
dc.citation.spage	159	en_UK
dc.citation.epage	169	en_UK
dc.citation.publicationstatus	Published	en_UK
dc.citation.peerreviewed	Refereed	en_UK
dc.type.status	VoR - Version of Record	en_UK
dc.contributor.funder	Engineering and Physical Sciences Research Council	en_UK
dc.contributor.funder	Biotechnology and Biological Sciences Research Council	en_UK
dc.author.email	richard.connor@stir.ac.uk	en_UK
dc.citation.date	01/02/2008	en_UK
dc.contributor.affiliation	University of Glasgow	en_UK
dc.contributor.affiliation	University of Strathclyde	en_UK
dc.identifier.isi	WOS:10.1130/GES00140.1	en_UK
dc.identifier.scopusid	2-s2.0-41949097911	en_UK
dc.identifier.wtid	956111	en_UK
dc.contributor.orcid	0000-0003-4734-8103	en_UK
dc.date.accepted	2007-09-11	en_UK
dcterms.dateAccepted	2007-09-11	en_UK
dc.date.filedepositdate	2018-08-16	en_UK
rioxxterms.apc	not required	en_UK
rioxxterms.type	Journal Article/Review	en_UK
rioxxterms.version	VoR	en_UK
local.rioxx.author	Curry, Gordon B|	en_UK
local.rioxx.author	Connor, Richard C H|0000-0003-4734-8103	en_UK
local.rioxx.project	Project ID unknown|Biotechnology and Biological Sciences Research Council|http://dx.doi.org/10.13039/501100000268	en_UK
local.rioxx.project	Project ID unknown|Engineering and Physical Sciences Research Council|http://dx.doi.org/10.13039/501100000266	en_UK
local.rioxx.freetoreaddate	2258-01-02	en_UK
local.rioxx.licence	http://www.rioxx.net/licenses/under-embargo-all-rights-reserved||	en_UK
local.rioxx.filename	Curry Connor 2008.pdf	en_UK
local.rioxx.filecount	1	en_UK
local.rioxx.source	1553-040X	en_UK
Appears in Collections: Computing Science and Mathematics Journal Articles

Files in This Item:
File	Description	Size	Format	Access
Curry Connor 2008.pdf	Fulltext - Published Version	1.24 MB	Adobe PDF	Under Permanent Embargo


This item is protected by original copyright



Items in the Repository are protected by copyright, with all rights reserved, unless otherwise indicated.

The metadata of the records in the Repository are available under the CC0 public domain dedication: No Rights Reserved https://creativecommons.org/publicdomain/zero/1.0/
