Does AI change the way doctors think?
A clever research study sheds some light on how increasing use of AI in medicine can influence doctors' decisions, maybe for the worse.
Our book, “Random Acts of Medicine,” is out now! We think you’ll like it, but you don’t have to take our word for it—check out this review in the Wall Street Journal. You can get your copy here or wherever you buy your books!
While the ways in which artificial intelligence (AI) may impact every facet of our lives are a topic of abundant (and important) debate, considering the role of AI in medicine feels, for obvious reasons, particularly pressing.
Medicine has been taking advantage of AI technology in various forms for a long time. For example, if you’ve had an electrocardiogram (ECG or EKG, for short) in the past few decades, the tracings collected from the electrodes across your chest were likely analyzed by an AI algorithm to produce a preliminary diagnosis, which was printed, instantaneously, at the top of the ECG printout. Then your doctor (or a cardiologist) took a look at the ECG themself to make their own diagnosis. From there, the ECG became a data point in the doctor’s diagnostic and treatment process, helping determine what should happen next for you, the patient.
When we were learning to read ECGs in residency, we were taught to first cover up the computerized interpretation by folding back the top of the ECG printout. That way, we wouldn’t be unduly influenced by the computer’s diagnosis while we were interpreting the tracings ourselves. Indeed, simply seeing the computerized interpretation can influence resident doctors’ interpretation of ECGs—helping them get the right answer when the AI is correct, and leading them astray when it’s incorrect.
Anchoring Bias Barbie™
Let’s say you were interested in seeing the new Barbie movie. Perhaps some friends had seen it and told you they loved it, or maybe you saw some fans praising the movie on social media, or you read a review that gave it high marks. So you go and see it, and you agree: the movie was great! (NB: neither of us has seen Barbie…yet).
So then we ask: did the positive opinions that you were exposed to before seeing the movie influence the way you interpreted the movie? “Anchoring bias” says they might.
Anchoring bias is the name for the human tendency to place additional weight on the first piece of information we come across about something compared to information we collect later. Your mind will be anchored to that initial opinion like a ship to a point on the ocean floor. It won’t necessarily keep that opinion, but it uses that opinion as a starting point, and new information must be strong enough to move you from that starting point, just as sufficiently rough seas can drag an anchored ship to a new position. Even if our opinion gets pulled away from that initial anchoring point, we still end up closer to it than we would have had we first been exposed to different information.
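If it helps to see the idea in numbers, here is a tiny sketch of one common way to formalize anchoring, sometimes called “anchoring and adjustment.” This is our own toy illustration, not a model from the study we describe below: the final judgment starts at the anchor and moves only part of the way toward the new evidence.

```python
# Toy model of anchoring-and-adjustment (our own illustration, not from any study):
# a judgment starts at the anchor and adjusts only partway toward the new evidence,
# so different anchors produce different final judgments even when the evidence
# (your actual experience of the movie) is identical.

def adjusted_judgment(anchor: float, evidence: float, adjustment: float = 0.6) -> float:
    """Move a fraction ('adjustment') of the way from the anchor toward the evidence."""
    return anchor + adjustment * (evidence - anchor)

# Say your own impression of a movie would be a 7/10, but you heard a glowing
# review first (anchor = 9) versus a scathing one (anchor = 3).
print(adjusted_judgment(anchor=9, evidence=7))  # 7.8 (you land above your own 7)
print(adjusted_judgment(anchor=3, evidence=7))  # 5.4 (you land below your own 7)
```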
For Barbie, anchoring bias says that if we are first exposed to positive opinions of the film, we will rate the film more favorably after we see it than we would have had we not been previously exposed to positive opinions, because our minds have been anchored by that initial positive view. Similarly, we would tend to rate the movie more negatively than we otherwise would if we were initially exposed to negative opinions.
Applying anchoring bias to ECGs, we see why reading that AI diagnosis first can sway the resident doctors. They’re anchored to that computer-generated diagnosis; if that diagnosis is wrong, it becomes an uphill battle for the new information to pull them away from it.
We’d be foolish to think that over the course of med school and residency, doctors can learn to fully suppress anchoring bias. We can’t, because like other mental shortcuts, it’s hardwired into us. We can develop strategies to mitigate it, like hiding that initial AI diagnosis before reading the ECG ourselves, but even the most experienced doctors are still susceptible to it. (We explore other examples of anchoring bias in medical care in the chapter of our book titled “What do cardiac surgeons and used car salesmen have in common?”)
Physician anchoring bias—with a twist
Michael Bernstein, an assistant professor at Brown University, sent us a small but insightful study by him and his colleagues of how AI might influence doctors’ thinking. We thought it was interesting and hope you’ll feel the same.
The researchers started with x-rays of the chest, one of the first diagnostic images doctors order for patients with breathing problems, since they are a great way to get a sense of major problems with the lungs. If an x-ray is abnormal, like the one below (note the white splotches in both lungs, which normally appear black), a doctor might then send the patient for a CT scan, which provides a much more detailed view of the lungs than a simple chest x-ray and allows much higher diagnostic accuracy for things like cancer or infection.
Modern chest x-rays are two-dimensional digital images, just like any photo you might take with your phone. And just as AI tools can look at pictures on your phone to determine who is in them or help you edit them, AI can be applied to chest x-rays to assist in interpretation, much as it already is with ECGs (though the use of AI for chest x-rays is in its infancy compared with ECGs, where it has been routinely used for decades).
In the study, the researchers gave a group of 6 radiologists (doctors who interpret medical images) a set of 90 chest x-rays, some of which were abnormal, showing lung cancers that would require a follow-up CT scan. For each of the 90 x-rays, they asked the radiologists whether they would recommend the patient come in for a CT scan of their chest to provide more detail. The radiologists’ recommendations here would serve as a control for the rounds that followed.
After waiting at least a month (though an average of 4 months) between reviews to allow the radiologists to forget the images they had seen, the researchers gave them the same set of 90 x-rays again, but this time each x-ray was accompanied by an AI interpretation of the image: “normal” or “abnormal.” The radiologists were told that the AI interpretation would be recorded in the patient’s record in addition to their own interpretation. However, the researchers purposefully made 12 of the 90 AI interpretations incorrect (for reasons we’ll come back to) and again asked the radiologists for their recommendations on CT scan follow-up.
After waiting again for the radiologists to forget the images, they gave them the set of 90 x-rays once again. This third time, the radiologists were told that the AI interpretation would not be saved in the patient’s record—it was only available to them. This meant that if they disagreed with the AI interpretation, nobody else would know.
After waiting for the radiologists to forget the images again, for a final, fourth round, researchers gave them the 90 x-rays once more. This time the AI interpretation was once again saved in the patient’s record, and it included a box surrounding the areas of the image the computer had identified as abnormal, if any.
Keeping in mind that the researchers purposefully introduced 12 incorrect AI recommendations, here’s what the researchers found:
Radiologists performed worse when given the AI recommendations, being less likely to recommend a scan when it was actually needed (a “false negative” interpretation of the x-ray) and being more likely to recommend a scan when it was unnecessary (a “false positive”).
This makes sense knowing what we know about anchoring bias. When the researchers introduced 12 purposefully incorrect AI interpretations, doctors were anchored to them and didn’t always move away. Just like the residents reading ECGs, a computerized interpretation seemed to lead the radiologists astray when the AI was wrong.
When the AI interpretation was deleted, however, radiologists generated fewer false positives (i.e. they were less likely to recommend an unnecessary CT scan) than when it was saved in the medical record.
This finding suggested that it made a difference whether or not the AI interpretation would be saved in the record, and thus seen by others. When the AI “saw an abnormality” but the radiologist didn’t, the radiologists appeared more comfortable with their interpretation when nobody would be able to ask them why their opinion differed from the AI’s. In an age where doctors, radiologists in particular, can be held legally liable for failure to identify abnormalities, it seemed that not having to justify their disagreement (and being freer to rely on their gut) could improve their performance, making them less susceptible to anchoring bias.
When an abnormal AI interpretation was further accompanied by a box surrounding what the computer perceived as an “abnormality,” radiologists made fewer overall mistakes compared to when the AI interpretation did not outline the abnormal areas of the x-ray.
This reveals another way the radiologists might be able to mitigate the AI-induced anchoring bias. When the AI placed a box around the areas of the x-ray it thought were abnormal, radiologists were less likely to be led astray when the AI did so incorrectly. They might also have been more confident in interpreting an image as normal when the AI didn’t draw any box. This suggests that when they were able to see what the AI was theoretically “thinking,” or basing its decision on, they were more confident in moving away from the anchored diagnosis when it was wrong.
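To make the “false negative” and “false positive” language above concrete, here is a small worked example with made-up numbers (these are purely illustrative and are not the study’s actual counts):

```python
# Made-up counts (not from the study) to show what "false negative" and
# "false positive" mean when deciding whether to recommend a follow-up CT scan.

truly_needs_ct = 30  # x-rays that truly warrant a follow-up CT
truly_fine = 60      # x-rays that do not

false_negatives = 4  # needed a scan, but none was recommended
false_positives = 6  # scan recommended, but none was needed

false_negative_rate = false_negatives / truly_needs_ct  # 4/30, about 13%
false_positive_rate = false_positives / truly_fine      # 6/60, exactly 10%

print(f"False negative rate: {false_negative_rate:.0%}")
print(f"False positive rate: {false_positive_rate:.0%}")
```

In the study, the radiologists made more of both kinds of errors when the (sometimes deliberately wrong) AI interpretations were shown.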
This study is not without its limitations, namely that it was small and performed in an artificial environment unlike the real-world settings where AI tools are deployed. Those limitations aside, the study’s findings are intuitive and fit with existing research on doctors’ decision-making, which shows susceptibility to many cognitive biases, anchoring bias being only one among them.
We expect to see more studies like this, as well as more studies of real-world applications of AI, to help guide our use of this powerful technology to help people live healthier, longer lives. We also expect to keep exploring AI in medicine on this Substack; leave us a comment about any areas you think we should check out!