The New Era of AI "De-Skilling" is Here
The genie is out of the bottle, and also there's no bottle
ChatGPT can outreason academic internal medicine attendings on complex medical problems, and can solve a majority of the notoriously challenging NEJM clinical cases on its own. For more common clinical cases, it suggested the correct diagnosis 100% of the time, compared to 84% for physicians.
And those results came from older versions of ChatGPT. The newer versions are much smarter, although the pace of improvement has slowed.
While integrating AI into clinical practice still faces significant obstacles, its integration into one’s own medical education is easy—just ask any resident or fellow with a smartphone.
This society-wide experiment is only a couple of years old, but AI tools already permeate medical education. Clinical educators writing in the NEJM worry that
the use of AI in medical training could result in professionals who are highly efficient yet less capable of independent problem solving and critical evaluation than their pre-AI counterparts.
There are hints that this is already happening. In the few studies that have tested AI’s influence on medical learning:
Bad AI-generated information tends to bias clinicians away from their initial (better) plan.
Lower-performing radiologists may perform even more poorly as a result of using AI. (For stronger physicians, using AI further increased their performance.)
In education and business:
Among 1,000 high school students, AI math tutoring improved performance during practice but worsened it on tests.
Greater use of AI was correlated with reduced critical thinking, according to a Swiss business school study and Microsoft’s internal polling.
How are medical educators responding?
The Association of American Medical Colleges put forth a position statement full of predictable pabulum (“AI should be used in a way that promotes equity and inclusivity in medical education … Creating a safe AI environment in which educators can explore its use is critical.”)
Equity and inclusivity, sure … but how do we ensure future generations of doctors trained under AI are competent?
The NEJM authors propose a practical framework that acknowledges the new centrality of AI in medical education and seeks to harness it to enhance rather than replace learning.
In the article’s narrative, an attending catches a sneaky resident using AI across the room. It’s the new teachable moment: the attending asks the resident to walk through how she used the AI tool; together, they methodically explore how to do so more critically, using a structured framework that they helpfully share with us. The resident becomes enlightened, saying things like, “I should verify the AI outputs next time.” The attending then decides whether the resident will henceforth be allowed to use AI without supervision.
Yeah, maybe.
The NEJM authors also acknowledge that most educators are already out of their depth on AI, as they “find themselves supervising the use of a technology that learners may be more adept at using than the educators themselves are.”
This is the most salient point in a piece that otherwise misapprehends the scale and totality of the transformation that has already taken place.
Medical educators need to formulate some response to this new game-changing tech. But the proposed approach sounds analogous to asking teens to be less online by telling them how much they’re missing out on real life. To them, online life is life.
And today, AI-augmented education is education.
Unless a major disruption occurs, AI soon will be baked into virtually every informational tool used in education or clinical decision-making. It will power every search engine it does not replace directly. It will quickly provide an almost-always-accurate answer to almost every training-level clinical question. When properly prompted, it already does this.
Achieving a surface-level understanding of medicine—or feeling like one does—will seem almost effortless. Finding and reading primary sources, thinking critically about them, and relating them to clinical cases—that is, learning—will require what will seem to some like an exceptional effort.
What I’m saying is,
The Counterfactual
A certain percentage of medical trainees will offload their work and their thinking to AI. And they will become less-skilled doctors. There is likely no way to stop this from happening.
But a large majority actually want to learn, grow, and excel in their careers, and use their hard-earned and hard-learned competence and skills to help improve the health of their fellow human beings. The hyperconscientious overachievers who apply to medical schools year after year seem to perceive the risks to their development, their professional reputation, and patient safety that overreliance on ChatGPT can bring.
Here are some sample resident comments from Reddit in response to the question “How do you use AI/chatGPT in residency?”
Never ChatGPT. Would never use it or similar AI to help me direct actual patient care.
If I want to brush up on something it can fill the exact gap in my knowledge. But for real medicine stuff I never actually used it.
I probably send ~20+ in basket messages a day and I have not once used the suggested message pre-written by AI. It also makes stuff up, and it's so confident! AI is gaslighting us. Lol
Outsourcing these cognitive tasks to AI makes us worse at them, simple as that. My opinion is that during training we should be honing these skills as much as possible rather than becoming overly reliant on these external tools.
This is encouraging to read.
The Other De-Skilling?
As they learn, physicians-in-training are at risk from overreliance on AI. Soon, though, practicing physicians who do not use AI will be at a disadvantage to those who do.
Or at least I know I would be. As one of dozens of examples I could share, I knew that cocaine could cause diffuse lung injury and ARDS. I was pretty sure that intranasal cocaine could cause it, not just smoked cocaine. ChatGPT confirmed this for me. But did I know that street cocaine is often cut with levamisole, which can cause ANCA vasculitis? I did not. (And yes, I fact-checked it.)
I later recalled half-confidently that pleural fluid AFB culture is insensitive for tuberculosis. But did I know that ADA and IGRA are each highly sensitive? No. The LLM suggested it. I added the tests; the care improved.
Etc., etc., almost every time I use the tool.
The NEJM authors argue we should be careful to be centaurs (assigning tasks to AI, which I guess here is our horse part, while we stay in charge up front) and not cyborgs (merging with AI, letting it do our thinking for us).
What goes unsaid: both centaurs and cyborgs can beat humans’ asses.
At least when it comes to forging differential diagnoses, ChatGPT performs better than almost all individual physicians already, and will most likely continue to widen that gap.
I’m not happy about that, but I’ve tested it repeatedly (and I always lose), and it’s also been shown in more structured testing.
That means its educational potential is enormous. So why are internal medicine academics getting all judgey, instead of scrambling to formally integrate AI throughout medical education?
Because LLMs would outperform attendings so consistently that it would be devastating to the hierarchical system of traditional GME. Attendings’ intellectual authority would evaporate. It would be embarrassing.
I’m recalling the eminent professor from my residency who, it later came to light, always asked the chief residents for the answer to the cases before he embarked on his seemingly extemporaneous (and always triumphant) demonstration of clinical problem-solving. ChatGPT would have wiped the floor with him—or virtually anyone else.
How We Win
In case I sound like an LLM fan, let me re-emphasize: I am not happy about this. (And I never, ever let LLMs write anything for me, and never will.) Amidst all the hype, a fraction of which is justified, I try to keep something in mind.
The most important and meaningful clinical learning comes from integrating book knowledge into the fascinating and infinitely complex universe of real patients with real problems and real emotional lives to connect to. ChatGPT can’t climb out of its sterile, disembodied, statistically modeled existence to participate in that sacred art.
At its heart, medicine (both learning and practicing it) is a social activity. No matter how many board questions it gets right, no one wants to entrust all their learning, or their care, to a know-it-all chatbot.
That’s what I’m counting on, anyway.






Your observation about pleural fluid analysis for TB is interesting, because I didn’t read it thinking that the AI did your thinking for you. I saw that you looked something up and learned new information that you will use in the future, because we are also systems that can learn and adapt. This is no different than looking something up online or in a book - more efficient, sure, which is one of its strengths, but still a human act that expanded a human’s knowledge. In the future, you will know to order an ADA on that pleural fluid (and maybe get a pleural biopsy).
I am 51 years old - old enough to remember paper charts, but young enough to have come up with a variety of evolving digital tools. I am a bit of a Luddite when it comes to some tools in the ICU. (Arterial lines are not as useful as some people think; before you do your cleverly-acronymed POCUS study, have you considered taking a history and performing a physical exam?) But as I’ve used these systems a bit more, I see their utility and will work to find ways to improve care by making me better, not by replacing me.
The AI clinical notes are also interesting. I have found that the outpatient notes they generate are poorly written, do not explain clinical reasoning well, and have a prose style not to my liking. However, they do take notes very well. I have taken to using DAX as my note taker in clinic, rather than my note writer. It doesn’t speed me up dramatically, but it does mean that I can focus my attention entirely on the patient in the office and not on my notepad or a computer screen. After clinic, I can use the notes it took to write a proper note.
Decreasing the time it takes me to find needed data serves my patients and serves my competence. Taking notes for me in the office means that I can focus my attention more directly on the patient. I will not delegate the core clinical decision-making and personal relationships that make critical care what it is. But if the machine can speed up my lit search, I can live with that.
I finished med school in 2010...my first 2 years of residency were still on paper...oh how I miss paper charting; but even then, I knew a good deal of the technology available and how to use it. I even remember the specific moment when my MA showed me a function I had overlooked on the computer...showing people overlooked functions was something I did for my attendings and others, and still do to this day, so it hit me like a ton of bricks when it happened to me.
With the LLMs, I'm absolutely lost. I downloaded the ChatGPT app to my phone, but didn't really find enough use for it to replace just using Google search (though I admit, I do like - most of the time - Google's AI summary). After a month of using it no more than 2 or 3 times, I deleted the app.
For the last several years, I've been full-time locums and use multiple different EMRs. I've seen some with a disclaimer that AI was utilized in the development of the progress note (specifically in an office setting), but I have yet to knowingly see AI that I can easily access within any of the systems I've used (much less use it to make things run smoother).
What I am curious about: what apps/websites/etc. are others using for AI, or that integrate AI, and how are they using them - the way I use Google Search, or something else?
Thanks!