A study by researchers at the University of Maryland School of Medicine (UMSOM) found that answers generated by ChatGPT for 25 questions related to breast cancer screening were correct 88% of the time. However, some of the answers were either inaccurate or fictitious, according to the study published this week in the journal Radiology.
To test ChatGPT, the UMSOM researchers developed a set of 25 questions seeking advice on getting breast cancer screening, then submitted the question set three times to see what responses were generated. The team submitted the set multiple times because the chatbot often varies its answers to the same or similar questions. The responses were then provided to three fellowship-trained radiologists to judge the quality of the answers. The radiologists deemed the answers to 22 of the 25 questions appropriate. ChatGPT correctly answered questions about the symptoms of breast cancer, who is at risk, and the cost, age, and frequency recommendations concerning mammograms.
However, one answer was based on outdated information, and the other two had responses that varied significantly each time the same question was asked, making the answers unreliable.
“We found ChatGPT answered questions correctly about 88% of the time, which is pretty amazing,” said study corresponding author Paul Yi, MD, Assistant Professor of Diagnostic Radiology and Nuclear Medicine at UMSOM and director of the UM Medical Intelligent Imaging Center (UM2ii). “It also has the added benefit of summarizing information into an easily digestible form for consumers to easily understand.”
While the first run provided good results, the research team indicated that the chatbot may not be ready to take on the king of online information—a simple Google search—noting the responses were not as comprehensive as one would get using the ubiquitous search engine.
“ChatGPT provided only one set of recommendations on breast cancer screening, issued from the American Cancer Society, but did not mention differing recommendations put out by the Centers for Disease Control and Prevention (CDC) or the US Preventive Services Task Force (USPSTF),” said study lead author Hana Haver, MD, a radiology resident at University of Maryland Medical Center.
In one of the inappropriate responses, ChatGPT provided outdated information concerning planning a mammogram around a COVID-19 vaccination, advising people to delay the mammogram for four to six weeks after receiving a vaccination. These guidelines were changed in February 2022 to indicate that no wait time was needed. Other inconsistencies were reported concerning the risk of developing breast cancer and where a person could get a mammogram.
Even more troubling is the chatbot’s tendency to create information out of thin air.
“We’ve seen in our experience that ChatGPT sometimes makes up fake journal articles or health consortiums to support its claims,” said Dr. Yi. “Consumers should be aware that these are new, unproven technologies, and should still rely on their doctor, rather than ChatGPT, for advice.”
He and his colleagues are now analyzing how ChatGPT fares with lung cancer screening recommendations and identifying ways to make its recommendations more accurate and complete. The team will also address how to make the responses more accessible and understandable to a lay audience.
“With the rapid evolution of ChatGPT and other large language models, we have a responsibility as a medical community to evaluate these technologies and protect our patients from potential harm that may come from incorrect screening recommendations or outdated preventive health strategies,” said Mark T. Gladwin, MD, Dean, University of Maryland School of Medicine, Vice President for Medical Affairs, University of Maryland, Baltimore, and the John Z. and Akiko K. Bowers Distinguished Professor.