Hildebrandt publishes on Ground-Truthing in the European Health Data Space (pre-print publication)

In this position paper, Mireille Hildebrandt discusses the use of health-related training data for medical research, in light of the European Health Data Space. If such data is deployed as a proxy for ‘the truth on the ground’, we need to address the issue of proxies. Ground truth in machine learning is the pragmatic stand-in or proxy for whatever is considered to be the case or should be the case. Developing a ground truth dataset requires curation, i.e. a number of translations, constructions and cleansing operations. What if the resulting proxies misrepresent what they stand for, and what if the imposed interoperability of health data across the EU affects the quality of the data and/or its relationship to what it stands for? She argues that ground-truthing is an act rather than a given and that this act is key to machine learning, and asserts that it can have potentially fatal implications for the reliability of the output. Deciding on the ground truth is what philosophers may call a speech act with performative effects. Emphasising these effects will allow us to better address the constructive nature of the datasets used in medical informatics and should help the EU legislature to take a precautionary approach to medical informatics.

The paper has been submitted to Biomedical Engineering Systems, Springer (forthcoming). The pre-print publication is available here.