[Baidu 06] Can generative artificial intelligence pass the orthopedic specialty exam?

Key Points

  • Performance of Artificial Intelligence in Orthopedic Examinations: This article evaluates how a generative artificial intelligence model, ChatGPT, performs on questions from the Orthopaedic In-Training Examination (OITE) and the Self-Assessment Examination (SAE) developed by the American Academy of Orthopaedic Surgeons (AAOS). These exams serve as a proxy for the American Board of Orthopaedic Surgery (ABOS) examinations.
  • Methods: The study took 301 SAE questions, together with the associated AAOS literature, from the AAOS database and entered each one into ChatGPT's interface as a question followed by its multiple-choice options. The answer choice the model selected was recorded and compared with the answer provided by the OITE and SAE, noting whether it was right or wrong.
  • Results: ChatGPT correctly answered 183 of the 301 questions (60.8%). Across specialty areas, the model performed best in shoulder/elbow, basic science, sports medicine, and oncology, and worst in pediatrics and hand surgery. By question type, it performed best on diagnosis questions and worst on management questions.
  • Conclusion: ChatGPT can potentially provide accurate clinical conclusions for orthopedic educators and learners, but its reasoning must be checked carefully for accuracy and clinical validity. For now, its role in clinical education is limited, although it is evolving rapidly.
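The headline result is a simple proportion of correct answers. As a quick arithmetic check, the minimal Python sketch below reproduces it from the two counts reported above. The function name `accuracy_percent` is hypothetical and not taken from the study; per-specialty or per-question-type accuracy would be computed the same way, but those counts are not given in this summary.

```python
# Counts reported in the study summary: 183 correct answers out of 301 questions.
CORRECT = 183
TOTAL = 301


def accuracy_percent(n_correct: int, n_total: int) -> float:
    """Return the share of correctly answered questions as a percentage.

    Hypothetical helper for illustration; not part of the original study.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_correct <= n_total:
        raise ValueError("n_correct must be between 0 and n_total")
    return 100.0 * n_correct / n_total


if __name__ == "__main__":
    # Round to one decimal place, matching how the result is reported.
    print(f"{accuracy_percent(CORRECT, TOTAL):.1f}%")  # prints 60.8%
```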

Original Article Link

https://www.sciencedirect.com/science/article/pii/S0972978X23002593