04/08/2025
AI now diagnoses at 92% accuracy, outperforming physicians at 74% in randomized trials. Yet adding GPT-4 to clinicians doesn’t yield consistent improvement in diagnosis or treatment. Why?
New Stanford AI Index 2025 data reveals a paradox: LLMs are impressive at clinical inference, but real-world workflows haven’t caught up. 
Some highlights:
🩺 Diagnosis & Treatment
 – GPT-4: 92% diagnostic accuracy vs. physicians at 74%
 – Treatment decisions: 84.7% vs. 78.2% (GPT-4 vs. MDs)
 – No clear synergy when humans + GPT-4 combine → signals deeper issues than “just integrate it”
📈 Benchmarks Hitting a Ceiling
 – MedQA at 96% accuracy (zero fine-tuning)
 – Static question-answer sets do not reflect the messiness of real clinical practice
 – Next frontier: benchmarks that reflect real-world complexity and nuance 
💰 Cost vs. Outcomes
 – OpenAI's “o1” sets new records on 19 clinical tasks but is ~1.5× more expensive than GPT-4
 – Raises questions on ROI for large-scale hospital deployment
🤖 Ambient AI Scribes
 – ~20 minutes less EHR work/day
 – Burnout down 26%, admin burden down 35%
 – Next step: from “listen + transcribe” to “listen + act”
🔬 Research & Trials
 – 1,200+ healthcare LLM papers in 2024, doubling yearly
 – 537 AI-specific clinical trials; U.S. & China lead globally
 – FDA: 223 AI med devices approved in 2023 vs. single digits a decade ago
The full 18-page chapter has deeper data; the team spent months on it.
Great working on this one with co-authors Stephen Ma, Jonathan H. Chen, and other researchers and faculty.
AI is surpassing physicians on standardized tasks—but how might that reshape the physician’s role? Are we ready—legally, ethically, and operationally—for a future where AI drives diagnosis and treatment plans, with physicians offering higher-level oversight and patient advocacy? 
If data-driven tasks become largely automated, which human skills gain new importance, and how should medical education adapt?