Healthcare researchers must be wary of misusing machine learning

An international team of researchers advises that great care be taken not to misuse or overuse machine learning (ML) in healthcare research. They published their recommendations in Nature Medicine.

‘I absolutely believe in the power of ML, but it has to be a relevant addition,’ said Erasmus MC neurosurgeon-in-training and statistics editor Victor Volovici, first author of the comment. ‘Sometimes ML algorithms do not perform better than traditional statistical methods, leading to the publication of papers that lack clinical or scientific value.’

Revolution

Real-world examples have shown that the misuse of algorithms in healthcare could perpetuate human prejudices or inadvertently cause harm when the machines are trained on biased datasets.

‘Many believe ML will revolutionise healthcare because machines make choices more objectively than humans. But without proper oversight, ML models may do more harm than good,’ said Associate Professor Nan Liu, senior author of the comment, from the Centre for Quantitative Medicine and Health Services & Systems Research Programme at Duke-NUS Medical School, Singapore.

‘If, through ML, we uncover patterns that we otherwise would not see—like in radiology and pathology images—we should be able to explain how the algorithms got there, to allow for checks and balances.’

Together with a group of scientists from the UK and Singapore, the researchers highlight that although guidelines have been formulated to regulate the use of ML in clinical research, these guidelines are only applicable once a decision to use ML has been made and do not ask whether or when its use is appropriate in the first place.

Recommendations

For scientists who want to get started with ML, Volovici and his colleagues have the following recommendations:

ML should be used for what it is good at. One must demonstrate that it works better than traditional statistical models. Explain the choice well. Do not use ML for datasets that are too small.
In particular, try to make deep learning methods (self-directed ML algorithms) as transparent and understandable as possible. Publish the parameters and, if possible, also the analysis and the dataset.
Name the limitations and be honest about them. Explain on what basis the algorithm draws conclusions.
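
The first recommendation, demonstrating that ML works better than a traditional statistical model, can be made concrete with a paired comparison of cross-validation scores. The sketch below is an illustration only and is not taken from the Nature Medicine comment: the function names, the `min_gain` threshold and the fold scores are all hypothetical, and the percentile bootstrap is just one of several reasonable ways to put an uncertainty interval on the difference.

```python
import random
import statistics


def paired_bootstrap_ci(ml_scores, baseline_scores,
                        n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for the mean per-fold difference
    (ML score minus baseline score). Both lists must come from the
    same cross-validation folds, in the same order."""
    if len(ml_scores) != len(baseline_scores):
        raise ValueError("scores must come from the same folds")
    diffs = [m - b for m, b in zip(ml_scores, baseline_scores)]
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    # Resample the per-fold differences with replacement and record each mean.
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=len(diffs)))
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(diffs), lower, upper


def ml_adds_value(ml_scores, baseline_scores, min_gain=0.01):
    """Hypothetical decision rule: the ML model only 'adds value' if even
    the lower end of the interval exceeds a pre-specified minimum gain."""
    _, lower, _ = paired_bootstrap_ci(ml_scores, baseline_scores)
    return lower > min_gain


if __name__ == "__main__":
    # Made-up per-fold AUC values, for illustration only.
    ml_auc = [0.82, 0.79, 0.81, 0.80, 0.83]
    logistic_auc = [0.81, 0.80, 0.80, 0.81, 0.82]
    mean_diff, lower, upper = paired_bootstrap_ci(ml_auc, logistic_auc)
    print(f"mean gain {mean_diff:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")
    print("ML adds value:", ml_adds_value(ml_auc, logistic_auc))
```

With only a handful of folds the interval is crude, so a result like this supports, rather than settles, the choice of method. The threshold for a clinically meaningful gain has to be chosen beforehand by the researchers.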

Facial recognition

Companies have successfully trained ML algorithms to recognise faces and road objects using billions of images and videos. But when it comes to healthcare settings, algorithms are often trained on datasets numbering in the tens, hundreds or thousands. ‘This underscores the relative poverty of big data in healthcare and the importance of working towards achieving sample sizes that have been attained in other industries, as well as the importance of a concerted, international big data sharing effort for health data,’ the researchers write.

Black box

Another issue is that many ML and deep learning algorithms (which do not receive explicit instructions regarding the outcome) are still regarded as a ‘black box’. For example, at the start of the COVID-19 pandemic, scientists published an algorithm that could predict coronavirus infections from lung photos. Afterwards, it turned out that the algorithm had drawn conclusions based on the imprint of the letter ‘R’ (for ‘Right Lung’) in the photos, which was always found in a slightly different spot on the scans.

‘We have to get rid of the idea that ML can discover patterns in data that we cannot understand,’ said Volovici about the incident. ‘ML can very well discover patterns that we cannot see directly, but then you have to be able to explain how you came to that conclusion. To do that, the algorithm has to show what steps it took, which requires innovation.’

Limits

The researchers advise that ML algorithms should be evaluated against traditional statistical approaches (when applicable) before they are used in clinical research. And when deemed appropriate, they should complement clinician decision-making rather than replace it. ‘ML researchers should recognise the limits of their algorithms and models to prevent their overuse and misuse, which could otherwise sow distrust and cause patient harm,’ the researchers write.

The team is organising an international effort to provide guidance on the use of ML and traditional statistics, and to set up a large database of anonymised clinical data on which the power of ML algorithms can be harnessed.