Half day tutorial: Free your metadata : a practical approach towards metadata cleaning and vocabulary reconciliation
Seth van Hooland, Max De Wilde (Université Libre de Bruxelles, Belgium), Ruben Verborgh (Multimedia Lab, Ghent University, Belgium)
The early-to-mid 2000s economic downturn in the US and Europe forced Digital Humanities projects to adopt a more pragmatic stance towards metadata creation and to deliver short-term results towards grant providers. It is precisely in this context that the concept of Linked and Open Data (LOD) has gained momentum. In this tutorial, we want to focus on reconciliation, the process in which we map domain specific vocabulary to another (often more commonly used) vocabulary that is part of the Semantic Web in order to annex the metadata to the Linked Data Cloud. We believe that the integration of heterogeneous collections can be managed by using subject vocabulary for cross linking between collections, since major classifications and thesauri (e.g. LCSH, DDC, RAMEAU, etc.) have been made available following Linked Data Principles.
Re-using these established terms for indexing cultural heritage resources represents a big potential of Linked Data for Digital Humanities projects, but there is a common belief that the application of LOD publishing still requires expert knowledge of Semantic Web technologies. This tutorial will therefore demonstrate how Semantic Web novices can start experimenting on their own with non-expert software such as Google Refine.
Participants of the tutorial will be asked to bring an export (or a subset) of metadata from their own projects or organizations and to pre-install Google Refine on their laptop. All necessary operations to reconcile metadata with controlled vocabularies which are already a part of the Linked Data cloud will be presented in detail, after which participants will be given time to perform these actions on their own metadata, under assistance of the tutorial organizers. Previous tutorials have mainly relied on the use of the Library of Congres Subject Headings (LCSH), but for the DH2012 conference we will test out beforehand SPARQL endpoints of controlled vocabularies in German (available for example on http://wiss-ki.eu/authorities/gnd/) in order to make sure that local participants will be able to experiment with metadata in German.
This tutorial proposal is a part of the Free your Metadata research project (www.freeyourmetadata.org). The website offers a variety of video’s, screencasts and documentation on how to use Google Refine to clean and reconcile metadata with controlled vocabularies already connected to the Linked Data cloud. The website also offers an overview of previous presentations and workshops.