Publications Repository - Gdańsk University of Technology

Extraction of information from born-digital PDF documents for reproducible research

Born-digital PDF documents might reasonably be expected to preserve enough useful data units from their source originals to produce executable papers for reproducible research. Unfortunately, developers of authoring tools may adopt arbitrary PDF generation strategies, producing a plethora of internal data representations. Common information units such as text paragraphs, tables, function graphs, and flow diagrams may require numerous heuristics to properly handle the content of each vendor-specific PDF file. We propose a generic Reverse MVC interpretation pattern that makes it possible to cope with that arbitrariness in a systematic way. It is a component of a larger framework we have been developing for making executable papers out of PDF documents without injecting any extra data or code into the PDF file.

DOI
10.12720/joams.4.3.238-244
Category
Journal publication
Type
publication in another foreign scientific journal (foreign language only)
Language
English
Publication year
2016

Source: MOSTWiedzy.pl - publication "Extraction of information from born-digital PDF documents for reproducible research"
