Studies Find Biased AI Medical Tools Underserve Women, Minorities, and Non-Native Speakers

Ars Technica

Studies from MIT, LSE, and others show that popular LLM-based medical tools (GPT-4, Llama 3, Palmyra-Med, Google’s Gemma) often understate women’s symptoms and give less empathetic or lower levels of recommended care to Black and Asian patients. The models also penalize messy or nonstandard language, making non-native speakers more likely to be advised against seeking care. Researchers attribute the disparities to biased training data and call for representative medical datasets, transparency, and clinical benchmarking to reduce harms.
