Reliable Medical LLM and Vision-Language RAG through Multi-Agent Orchestration and Single-Step Preference Alignment

Pahwa, Khushbu

dc.contributor.advisor	Hu, Xia (Ben)	en_US
dc.creator	Pahwa, Khushbu	en_US
dc.date.accessioned	2025-01-17T17:41:56Z	en_US
dc.date.created	2024-12	en_US
dc.date.issued	2024-12-06	en_US
dc.date.submitted	December 2024	en_US
dc.date.updated	2025-01-17T17:41:56Z	en_US
dc.description.abstract	Medical RAG systems and long-context models like Med-PaLM face distinct yet interconnected challenges in processing complex medical information. RAG systems struggle with hallucinations caused by noisy retrievals and incomplete fact verification. Long-context models can process extended inputs, but they suffer from attention dilution and poor context retention. Current BioNLP RAG systems have particularly overlooked the critical balance between retrieved context and parametric knowledge, and their over-reliance on retrieved information often leads to hallucinations. Our HALO-MMedRAG framework addresses these challenges through a comprehensive multi-agent architecture. The system's effectiveness stems from the following components: a query-intent parser agent, a multi-query generation agent, a coarse retrieval agent, a fine-grained hallucination-aware retrieval agent that scores chunks with a hybrid metric based on perplexity and negative log-likelihood (NLL), a generation agent, a lightweight fact-verification agent, and an orchestrator agent that manages a chain-of-thought (CoT) reasoning debate among three agents to produce a final, factually grounded, hallucination-free response. Retrieval-augmented generation for multimodal medical LLMs has received little attention from the perspective of hallucination mitigation. Moreover, existing approaches cover few medical domains and are often limited to X-ray imaging. When used for multimodal retrieval-augmented generation, medical multimodal LLMs face critical challenges in maintaining factual accuracy while integrating complex visual and textual information. Our approach addresses these challenges through a unified Triple Preference Optimization framework that combines three-stage preference dataset curation, focused on cross-modal alignment and retrieval balance, with a dual-stage visual feedback agent.
Unlike existing solutions, our method employs a single-step optimization process that handles multiple aspects of alignment simultaneously while maintaining computational efficiency. Our approach curates preference datasets that capture different levels of alignment quality. It also uses a visual feedback agent that provides precise visual grounding as a visual prompt, helping the vision-language model refine its response. Together, these significantly reduce hallucinations and improve the accuracy of medical responses. Extensive evaluation across diverse medical domains, including radiology, ophthalmology, pathology, magnetic resonance imaging (MRI), and computed tomography (CT), demonstrates superior performance over existing multimodal medical RAG methods. This makes our solution, Align-MedRAG-VL, practical and reliable for real-world medical applications where hallucination mitigation is paramount.	en_US
dc.embargo.lift	2030-12-01	en_US
dc.embargo.terms	2030-12-01	en_US
dc.format.mimetype	application/pdf	en_US
dc.identifier.uri	https://hdl.handle.net/1911/118242	en_US
dc.language.iso	en	en_US
dc.subject	Medical RAG	en_US
dc.subject	Hallucination Mitigation	en_US
dc.subject	Multi-Agent Orchestration	en_US
dc.subject	Visual Feedback	en_US
dc.subject	LLMs	en_US
dc.subject	VLMs	en_US
dc.subject	Preference Alignment	en_US
dc.title	Reliable Medical LLM and Vision-Language RAG through Multi-Agent Orchestration and Single-Step Preference Alignment	en_US
dc.type	Thesis	en_US
dc.type.material	Text	en_US
thesis.degree.department	Computer Science	en_US
thesis.degree.discipline	Computer Science	en_US
thesis.degree.grantor	Rice University	en_US
thesis.degree.level	Masters	en_US
thesis.degree.name	Master of Science	en_US

Collections

Rice University Theses and Dissertations