Medical University of Graz (Medizinische Universität Graz), Austria - Research Portal

Selected publication:

Draschl, A; Hauer, G; Fischerauer, SF; Kogler, A; Leitner, L; Andreou, D; Leithner, A; Sadoghi, P.
Are ChatGPT's Free-Text Responses on Periprosthetic Joint Infections of the Hip and Knee Reliable and Useful?
J Clin Med. 2023; 12(20). doi: 10.3390/jcm12206655 [OPEN ACCESS]

Lead authors from Med Uni Graz
Sadoghi Patrick
Co-authors from Med Uni Graz
Andreou Dimosthenis
Fischerauer Stefan Franz
Hauer Georg
Kogler Angelika
Leithner Andreas
Leitner Lukas

Abstract:
BACKGROUND: This study aimed to evaluate ChatGPT's performance on questions about periprosthetic joint infections (PJI) of the hip and knee. METHODS: Twenty-seven questions from the 2018 International Consensus Meeting on Musculoskeletal Infection were selected for response generation. The free-text responses were evaluated by three orthopedic surgeons using a five-point Likert scale. Inter-rater reliability (IRR) was assessed via Fleiss' kappa (FK). RESULTS: Overall, near-perfect IRR was found for disagreement on the presence of factual errors (FK: 0.880, 95% CI [0.724, 1.035], p < 0.001) and agreement on information completeness (FK: 0.848, 95% CI [0.699, 0.996], p < 0.001). Substantial IRR was observed for disagreement on misleading information (FK: 0.743, 95% CI [0.601, 0.886], p < 0.001) and agreement on suitability for patients (FK: 0.627, 95% CI [0.478, 0.776], p < 0.001). Moderate IRR was observed for agreement on "up-to-dateness" (FK: 0.584, 95% CI [0.434, 0.734], p < 0.001) and suitability for orthopedic surgeons (FK: 0.505, 95% CI [0.383, 0.628], p < 0.001). Question- and subtopic-specific analysis revealed diverse IRR levels ranging from near-perfect to poor. CONCLUSIONS: ChatGPT's free-text responses to complex orthopedic questions were predominantly reliable and useful for orthopedic surgeons and patients. Given variations in performance by question and subtopic, consulting additional sources and exercising careful interpretation should be emphasized for reliable medical decision-making.
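
The abstract reports inter-rater reliability as Fleiss' kappa for three raters on a five-point Likert scale. As an illustration of how such a value is obtained, here is a minimal sketch of the standard Fleiss' kappa formula. The function names `fleiss_kappa` and `interpret` and the toy ratings are hypothetical and are not taken from the study. The interpretation bands are the commonly cited Landis and Koch thresholds, assumed here because the abstract does not state which scale the authors applied.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of items, each rated once by every rater.

    ratings    -- list of per-item lists, one category label per rater
    categories -- all possible labels (e.g. the Likert points 1..5)
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # counts[i][j] = number of raters who put item i into category j
    counts = [[Counter(item)[c] for c in categories] for item in ratings]

    # Per-item agreement P_i: share of rater pairs that agree on item i
    p_items = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement P_e from the overall share of each category
    total = n_items * n_raters
    p_e = sum((sum(row[j] for row in counts) / total) ** 2
              for j in range(len(categories)))

    if p_e == 1.0:
        # All ratings fall into one category: kappa is undefined
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


def interpret(kappa):
    """Landis and Koch style labels (an assumption, see the text above)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "near-perfect"


# Hypothetical toy data: 4 responses, 3 raters, Likert points 1..5
toy = [[5, 5, 5], [4, 4, 5], [2, 2, 2], [4, 5, 5]]
k = fleiss_kappa(toy, categories=[1, 2, 3, 4, 5])
print(round(k, 3), interpret(k))
```

Under these assumed thresholds, the values reported in the abstract map to the same labels the authors use, for example `interpret(0.880)` gives "near-perfect" and `interpret(0.584)` gives "moderate". The sketch does not compute confidence intervals or p-values.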

Keywords:
artificial intelligence
large language model
periprosthetic joint infection
hip prosthesis
knee prosthesis
© Med Uni Graz