Radiologists using AI-based clinical decision support may develop automation bias, the tendency of humans to favor suggestions from automated decision-making systems, which can impair the accuracy of their mammography readings. These findings, from researchers in Germany and the Netherlands, were published in the journal Radiology.
While a number of previous studies have shown that introducing computer-aided detection into the mammography workflow can affect radiologists' performance, the new study is the first to directly examine how AI systems influence the accuracy of radiologists' mammogram readings across all experience levels.
“We anticipated that inaccurate AI predictions would influence the decisions made by radiologists in our study, particularly those with less experience,” said Thomas Dratsch, M.D., Ph.D., from the Institute of Diagnostic and Interventional Radiology at University Hospital Cologne in Cologne, Germany, and the study’s lead author. “Nonetheless, it was surprising to find that even highly experienced radiologists were adversely impacted by the AI system’s judgments, albeit to a lesser extent than their less seasoned counterparts.”
In the experiment, the researchers had 27 radiologists read 50 mammograms and provide their Breast Imaging Reporting and Data System (BI-RADS) assessments, assisted by an AI system. BI-RADS is a standard system used by radiologists to describe and categorize breast imaging findings. While BI-RADS categorization is not a diagnosis, it helps doctors determine the next steps in care.
The researchers presented the mammograms in two randomized sets: a training set of 10 mammograms in which the AI suggested the correct BI-RADS category, and a second set of 40 mammograms in which the AI purportedly suggested an incorrect BI-RADS category for 12 of the images.
The findings showed that the radiologists were significantly worse at assigning the correct BI-RADS scores for the cases in which the purported AI suggested an incorrect BI-RADS category. For example, inexperienced radiologists assigned the correct BI-RADS score in almost 80% of cases in which the AI suggested the correct BI-RADS category. When the purported AI suggested the wrong category, their accuracy fell to less than 20%. Experienced radiologists—those with more than 15 years of experience on average—saw their accuracy fall from 82% to 45.5% when the purported AI suggested the incorrect category.
These results, the investigators noted, are a cautionary reminder that human-machine interactions must be carefully monitored to ensure the accurate performance of human readers who incorporate AI-based advice.
“Given the repetitive and highly standardized nature of mammography screening, automation bias may become a concern when an AI system is integrated into the workflow,” Dratsch noted. “Our findings emphasize the need for implementing appropriate safeguards when incorporating AI into the radiological process to mitigate the negative consequences of automation bias.”
Suggested steps for avoiding automation bias include presenting users with the confidence level of an AI-based decision system's output, for example by displaying the probability of each possible result. Another safeguard is teaching users how these systems derive their answers. Both measures could help users feel accountable for their own decisions and reduce this bias.
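As a rough illustration of the first safeguard, the sketch below shows how an AI suggestion might be surfaced with its per-category probabilities rather than as a bare label. This is not from the study; the category names, probability values, and helper code are hypothetical, assuming only a classifier that exposes a probability per BI-RADS category.

```python
# Minimal sketch (hypothetical): show an AI system's per-category
# probabilities alongside its suggested BI-RADS category, so the
# reader can judge how confident the model actually is.

from dataclasses import dataclass


@dataclass
class AISuggestion:
    # Maps a BI-RADS category label to the model's probability for it.
    probabilities: dict[str, float]

    @property
    def top_category(self) -> str:
        # The category the model would suggest if forced to pick one.
        return max(self.probabilities, key=self.probabilities.get)


def render_suggestion(s: AISuggestion) -> str:
    """Format the suggestion with its confidence instead of a bare label."""
    lines = [f"AI suggestion: BI-RADS {s.top_category} "
             f"(confidence {s.probabilities[s.top_category]:.0%})"]
    for cat, p in sorted(s.probabilities.items(), key=lambda kv: -kv[1]):
        lines.append(f"  BI-RADS {cat}: {p:.0%}")
    return "\n".join(lines)


# Hypothetical output for one mammogram: a low-confidence call the
# reader should scrutinize rather than accept automatically.
suggestion = AISuggestion(probabilities={"2": 0.38, "3": 0.34, "4": 0.28})
print(render_suggestion(suggestion))
```

A near-uniform distribution like the one in this example signals that the model itself is uncertain, which may prompt the reader to weigh their own interpretation of the image more heavily instead of deferring to the suggestion.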
The researchers' next step is a study that will use tools such as eye-tracking technology to better understand how radiologists using AI systems make their decisions.
“Moreover, we would like to explore the most effective methods of presenting AI output to radiologists in a way that encourages critical engagement while avoiding the pitfalls of automation bias,” Dratsch said.