Estermann, Beat (2023). SWITCH Innovation Lab: “Image to Concept”: Final Project Report. Chur: Fachhochschule Graubünden.
Estermann_et_al_2023_InnoLab Image to Concept - Final Report with Annexes - 20231006.pdf - Published Version. Available under License Creative Commons: Attribution (CC-BY).
By means of its “Research Data Connectome”, SWITCH, the Swiss national infrastructure service provider for higher education and research, seeks to connect open research data using linked open data technologies. The goal is to make the data accessible, interoperable and valuable for research, education and innovation. To kick-start the development of new services, SWITCH carries out so-called “InnoLab” projects, which are experimental in character and geared towards generating quick learnings.

The present InnoLab project brought together researchers and software developers from SWITCH, Wikimedia Sverige, the University of Applied Sciences of the Grisons and the Bern University of Applied Sciences. The goal was to develop a microservice that supports the semi-automatic tagging of images in order to interlink them with concepts on Wikidata, thus making it easier for researchers and other interested parties to search for and discover relevant images. The microservice builds upon an existing crowdsourcing tool, the ISA Tool, which was deployed on Wikimedia Commons in 2019, where it is used to apply “depicts” statements describing the content of images stored in the free media repository.

The semi-automatic tagging functionality added to the ISA Tool in the course of the present project relies on two distinct algorithms. One extracts entities from the image itself, using the Google Cloud Vision service available on Wikimedia Commons. The other extracts entities from the image metadata, thus leveraging earlier efforts made to describe the content of the images. At the time of writing, the enhanced version of the ISA Tool is available in the test environment and can be used to add “depicts” statements to images on Wikimedia Commons. Plans to deploy it to production have been postponed due to several remaining bugs.
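To illustrate how suggestions from the two algorithms might be combined before a contributor reviews them, the following is a minimal sketch in Python. It is not the ISA Tool's actual implementation: the `Suggestion` class, the `merge_suggestions` function, the confidence scores and the ranking rule are all assumptions made for illustration. The only identifiers taken from Wikidata are the property P180 (“depicts”) and the example items Q144 (dog), Q146 (house cat) and Q5 (human).

```python
from dataclasses import dataclass

DEPICTS_PROPERTY = "P180"  # Wikidata property ID for "depicts"


@dataclass(frozen=True)
class Suggestion:
    """Hypothetical candidate concept proposed for an image."""
    qid: str           # Wikidata item ID, e.g. "Q144"
    source: str        # "image" (image recognition) or "metadata"
    confidence: float  # assumed score between 0 and 1


def merge_suggestions(image_based, metadata_based, existing_depicts=frozenset()):
    """Merge candidates from both sources into one ranked review list.

    Items already present as "depicts" statements are skipped. Candidates
    proposed by both sources are ranked first, then by highest confidence.
    """
    merged = {}
    for s in list(image_based) + list(metadata_based):
        if s.qid in existing_depicts:
            continue  # nothing to suggest, the statement already exists
        entry = merged.setdefault(
            s.qid, {"qid": s.qid, "sources": set(), "confidence": 0.0}
        )
        entry["sources"].add(s.source)
        entry["confidence"] = max(entry["confidence"], s.confidence)
    return sorted(
        merged.values(),
        key=lambda e: (-len(e["sources"]), -e["confidence"], e["qid"]),
    )


# Illustrative input only; the scores are invented.
image_candidates = [Suggestion("Q144", "image", 0.9), Suggestion("Q146", "image", 0.4)]
metadata_candidates = [Suggestion("Q144", "metadata", 0.7), Suggestion("Q5", "metadata", 0.6)]

ranked = merge_suggestions(image_candidates, metadata_candidates, existing_depicts={"Q5"})
for candidate in ranked:
    print(DEPICTS_PROPERTY, candidate["qid"], sorted(candidate["sources"]))
```

In a semi-automatic workflow such a ranked list would only be a proposal: a contributor still accepts or rejects each candidate before any “depicts” statement is written.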
The key learnings gained in the course of the project can be summarized as follows:
– Several issues need to be tackled to allow for wider use and promotion of the ISA Tool: performance issues, reliability issues, and improvement of multilingual support.
– Once these issues have been resolved, measures should be taken to increase the visibility and take-up of the tool among potential contributors. As an accompanying measure, it would be advisable to assess and monitor the relevance of the ISA Tool in comparison to other tools and methods employed to add Structured Data on Commons. Moreover, activities to further promote the tool among the volunteer community should be accompanied by a dialogue with various stakeholders on what constitutes “good” tagging of images.
– The algorithms used for semi-automatic tagging should be further improved and/or complemented; a variety of avenues to be pursued to this effect have been suggested.
– Research use cases in the context of the SWITCH Research Data Connectome should be facilitated by developing alternatives to the current requirement of uploading all media files to Wikimedia Commons. Some initial use cases have been identified in the areas of digital humanities, medical libraries etc.
– Requirements arising from research use cases that make use of “depicts” statements beyond their current use for search and discovery should be further investigated.
– If the ISA Tool is to be used on a large scale in the context of the SWITCH Research Data Connectome, the conclusion of contractual agreements with service providers may be indicated. Roles and responsibilities with regard to deployment, operations and maintenance need to be clarified.