Collaborative research in Asia has called for prudence in the use of popular chatbots based on large language models (LLMs) as part of public health research and response.
In a study whose findings were published in BMJ, researchers from the Chinese University of Hong Kong (CUHK), RMIT University in Vietnam, and the National University of Singapore (NUS) explored whether LLMs bridge or exacerbate the digital divide in accessing accurate health information.
WHY IT MATTERS
Using OpenAI's widely available GPT-3.5 chatbot, the research team asked about the symptoms of atrial fibrillation in Vietnamese. The chatbot responded with answers about Parkinson's disease instead.
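For readers curious how such a query is made, the sketch below sends a Vietnamese-language prompt about atrial fibrillation symptoms to a GPT-3.5 model via the OpenAI Python SDK. It is a hypothetical reconstruction of the kind of query the researchers describe; their exact prompts, model version, and settings are not detailed in the study summary.

```python
# Hypothetical reconstruction of the kind of low-resource-language query
# described in the study; the prompt wording and settings are assumptions.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Vietnamese prompt: "What are the symptoms of atrial fibrillation?"
prompt = "Các triệu chứng của rung nhĩ là gì?"

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```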
“Misinterpretations in symptom detection or disease guidance could have severe repercussions in managing outbreaks,” Kwok Kin-on, one of the researchers and an associate professor of public and primary health at CUHK Faculty of Medicine, warned.
The researchers traced the issue to the LLM's language bias: the model has been trained on far less data in low-resource languages (languages with limited digital resources available), such as Vietnamese. According to Dr Arthur Tang, a senior lecturer at RMIT University Vietnam, this results in lower-quality responses in languages the model has been less exposed to.
“This disparity in LLM accuracy can exacerbate the digital divide, particularly since low-resource languages are predominantly spoken in lower- to middle-income countries,” he added.
NUS associate professor Wilson Tam advised careful monitoring of AI chatbots' accuracy and reliability, “especially when prompts are entered and responses generated in low-resource languages.”

“While providing an equitable platform to access health information is beneficial, ensuring the accuracy of this information is essential to prevent the spread of misinformation,” he said.
The researchers proposed improving LLMs' translation capabilities for diverse languages, as well as creating and sharing open-source linguistic data and tools to promote AI language inclusivity, thereby addressing the current limitations of LLM-driven healthcare communication.
“It is vital to enhance LLMs to ensure they deliver accurate, culturally and linguistically relevant health information, especially in regions vulnerable to infectious disease outbreaks,” CU Medicine A/Prof Kwok stressed.
THE LARGER TREND
In another study, the same research team demonstrated the use of ChatGPT in developing a disease transmission model to inform infection control strategies. The LLM tool, they said, served as a co-pilot in quickly constructing initial transmission models and various model variants, “drastically” reducing the time needed to develop such complex models.
“Rapid response is crucial when facing potential outbreaks, and LLMs can significantly expedite the preliminary analysis and understanding of a novel pathogen’s transmission dynamics. By providing instantaneous modeling support, these systems allow for real-time scenario analysis, facilitating faster and more informed decision-making,” they noted.
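To illustrate the kind of “initial transmission model” an LLM co-pilot might help draft, here is a minimal sketch of a standard SIR (susceptible-infectious-recovered) compartmental model in Python. The population size and parameter values are illustrative assumptions, not figures from the study.

```python
# Minimal SIR compartmental model of the kind an LLM co-pilot might draft
# as a starting point; the parameters below are illustrative assumptions.
import numpy as np
from scipy.integrate import odeint

def sir(y, t, beta, gamma, n):
    """Right-hand side of the SIR ordinary differential equations."""
    s, i, r = y
    ds = -beta * s * i / n
    di = beta * s * i / n - gamma * i
    dr = gamma * i
    return ds, di, dr

n = 1_000_000            # total population (assumed)
beta, gamma = 0.3, 0.1   # transmission and recovery rates (assumed)
y0 = (n - 1, 1, 0)       # one initial infection
t = np.linspace(0, 160, 161)  # days

s, i, r = odeint(sir, y0, t, args=(beta, gamma, n)).T
print(f"Peak infections: {i.max():,.0f} on day {t[i.argmax()]:.0f}")
```

An analyst could then vary the transmission and recovery rates or add compartments to produce the model variants and rapid scenario analyses the researchers describe.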
Since healthcare's widespread uptake of LLMs last year, a growing body of research has emerged to test their accuracy and effectiveness across various applications.
Studies last year, for example, affirmed ChatGPT's accuracy in making clinical decisions, as well as in providing appropriate answers to questions about preventing cardiovascular disease.
While the tool was reported to have passed the competitive United States Medical Licensing Examination last year, ChatGPT was also found to have failed tests by the American College of Gastroenterology.
Besides language bias, a Yale study published this year uncovered an “alarming” finding of racial bias in ChatGPT. Researchers said two versions of the chatbot produced statistically significant differences in how they simplified radiology reports when given the inquirer's race.