Hildebrandt publishes on ‘Ground Truthing in the European Health Data Space’ in Proceedings of the 16th International Joint Conference on Biomedical Engineering Systems and Technologies (Scitepress 2023)

This publication is based on the keynote Hildebrandt delivered at the BIOSTEC 2023 conference in Lisbon. Though the connection to legal technologies may not be immediately apparent, the issues discussed have clear parallels in, and relevance for, the legal tech sphere. “Ground truthing” is a key design decision in, for example, the prediction of judgments and legal search. Most notably, it concerns establishing a dataset that exemplifies what a machine learning algorithm should learn about the distribution of future data. For an earlier version of the publication, click here.


In this position paper I discuss the use of health-related training data for medical research, in light of the European Health Data Space. If such data is deployed as a proxy for ‘the truth on the ground’, we need to address the issue of proxies. Ground truth in machine learning is the pragmatic stand-in or proxy for whatever is considered to be the case or should be the case. Developing a ground truth dataset requires curation, i.e. a number of translations, constructions and cleansing operations. What if the resulting proxies misrepresent what they stand for, and what if the imposed interoperability of health data across the EU affects the quality of the data and/or their relationship to what they stand for? I argue that ground-truthing is an act rather than a given, that this act is key to machine learning, and that it can have potentially fatal implications for the reliability of the output. Deciding on the ground truth is what philosophers may call a speech act with performative effects. Emphasising these effects will allow us to better address the constructive nature of the datasets used in medical informatics and should help the EU legislature to take a precautionary approach to medical informatics.