Data quality as a barrier to validating AI algorithms, and how this will be improved with the EHDS
07 June 2024
As part of the CAIDX project, roundtable discussions have been organised bringing together legal officers, innovation officers, IT specialists and researchers/users of AI from all six CAIDX partner countries, i.e. Sweden, Denmark, Finland, Germany, Poland and Estonia. During the discussions in May, several partners expressed concerns about the difficulties in validating AI algorithms.
Developing and validating AI algorithms requires large amounts of good quality data. Such data is not readily available. As AI algorithms become more advanced, they will become better at identifying more complex patterns and correlations.
However, this will also make the algorithms more susceptible to being misled by flawed or biased inputs. The quality requirements for the data that goes into the models and is used to train the algorithms will therefore be much higher than for traditional statistical analysis.
At present, most partner countries require patient consent for health data to be used in research. However, using data only from people who have consented to participate in research will lead to biased data, as the population that agrees to participate in research may not be representative of the whole population.
The key to this process is to respect patients' right to opt out of the use of data that directly identifies them, while making pseudonymised data, for which the re-identification key is kept separately, more accessible.
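As an illustration (not part of the EHDS text itself), pseudonymisation with a separately held key can be sketched in a few lines of Python: a keyed hash replaces the patient identifier, so the data remain linkable across records, but re-identification is only possible for whoever holds the secret key. The identifiers and key below are hypothetical.

```python
import hmac
import hashlib

def pseudonymise(patient_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from a patient identifier.

    The same (patient_id, secret_key) pair always yields the same
    pseudonym, so records can be linked; without the key, the
    original identifier cannot be recovered from the pseudonym.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

# Hypothetical example: the key is stored and managed separately
# from the research dataset.
key = b"kept-in-a-separate-key-vault"
print(pseudonymise("patient-001", key))
```

Because the mapping is deterministic under one key, the same patient appears under one consistent pseudonym throughout a dataset, which is what makes pseudonymised data usable for research while keeping direct identification gated behind the key holder.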
The forthcoming European Health Data Space (EHDS) takes this into account by giving each nation the possibility to remove the right to opt out when it comes to secondary use of pseudonymised data for research.
In addition, the EHDS will make it much easier to share data between different EU countries. This means that AI algorithms can be validated on data from different countries, which will be a huge benefit. We are therefore looking forward to the upcoming EU regulation, which will facilitate access to high-quality health data within Europe.
Finally, the AI Act, which has been passed by both the EU Parliament and the Council, recognises that some risk is always present, and that the applicable rules depend on the risk category a project falls under. The risk of data leakage or a data breach is therefore not an argument for disallowing a project. Rather, it is an incentive to apply the right level of regulatory restrictions.