Reliable Medical LLM and Vision-Language RAG through Multi-Agent Orchestration and Single-Step Preference Alignment

dc.contributor.advisor: Hu, Xia (Ben) [en_US]
dc.creator: Pahwa, Khushbu [en_US]
dc.date.accessioned: 2025-01-17T17:41:56Z [en_US]
dc.date.created: 2024-12 [en_US]
dc.date.issued: 2024-12-06 [en_US]
dc.date.submitted: December 2024 [en_US]
dc.date.updated: 2025-01-17T17:41:56Z [en_US]
dc.description.abstract: Medical retrieval-augmented generation (RAG) systems and long-context models such as Med-PaLM face distinct yet interconnected challenges in processing complex medical information. RAG systems hallucinate because of noisy retrievals and incomplete fact verification, while long-context models, despite their ability to process extended inputs, suffer from attention dilution and poor context retention. Current BioNLP RAG systems have largely overlooked the balance between retrieved context and parametric knowledge, and their over-reliance on retrieved information often leads to hallucinations. Our HALO-MMedRAG framework addresses these challenges with a multi-agent architecture made up of a query-intent parser agent, a multi-query generation agent, a coarse retrieval agent, a fine-grained hallucination-aware retrieval agent that scores chunks with a hybrid of perplexity and negative log-likelihood (NLL), a generation agent, a lightweight fact-verification agent, and an orchestrator agent that manages a chain-of-thought (CoT) reasoning debate among three agents to produce a final, hallucination-free response grounded in facts.

Retrieval-augmented generation for multimodal medical LLMs has received little attention from the perspective of hallucination mitigation, and existing approaches cover few medical domains, often only X-ray imaging. When medical multimodal LLMs are used for multimodal RAG, they struggle to maintain factual accuracy while integrating complex visual and textual information. Our approach addresses these challenges through a unified Triple Preference Optimization framework with three-stage preference dataset curation, focusing on cross-modal alignment, retrieval balance, and a dual-stage visual feedback agent.

Unlike existing solutions, our method uses a single-step optimization process that handles multiple aspects of alignment simultaneously while remaining computationally efficient. We curate preference datasets that capture different levels of alignment quality and combine them with a visual feedback agent that provides precise visual grounding, which is used to visually prompt the vision-language model (VLM) to improve its response. Together, these components reduce hallucinations and improve the accuracy of medical responses. Extensive evaluation across diverse medical domains, including radiology, ophthalmology, pathology, magnetic resonance imaging (MRI), and computed tomography (CT), shows superior performance compared with existing multimodal medical RAG methods, making our solution, Align-MedRAG-VL, practical and reliable for real-world medical applications where hallucination mitigation is paramount. [en_US]
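The abstract does not state the form of the Triple Preference Optimization objective. Purely as an illustration of what a "single-step" optimization over several preference datasets can look like, the sketch below assumes a DPO-style (Direct Preference Optimization) loss for each dataset and sums the weighted terms into one scalar. DPO is a swapped-in technique here, not confirmed by the source; the dataset names (`cross_modal`, `retrieval_balance`, `visual_feedback`), the weights, `beta`, and all function and class names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PreferencePair:
    """Sequence log-probabilities for one (chosen, rejected) response pair.

    policy_* come from the model being trained; ref_* come from a frozen
    reference model, as in DPO. Field names are hypothetical.
    """
    policy_chosen: float
    policy_rejected: float
    ref_chosen: float
    ref_rejected: float


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for large positive or negative x.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_pair_loss(pair: PreferencePair, beta: float) -> float:
    # DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).
    chosen_ratio = pair.policy_chosen - pair.ref_chosen
    rejected_ratio = pair.policy_rejected - pair.ref_rejected
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))


def triple_preference_loss(
    datasets: Dict[str, List[PreferencePair]],
    weights: Dict[str, float],
    beta: float = 0.1,
) -> float:
    """Weighted sum of mean DPO losses over several preference datasets.

    Hypothetical: the dataset keys and weights are illustrative only and
    are not taken from the thesis. Summing every term into one scalar lets
    a single optimizer step update the model on all objectives at once.
    """
    total = 0.0
    for name, pairs in datasets.items():
        if not pairs:
            continue  # skip empty datasets instead of dividing by zero
        mean_loss = sum(dpo_pair_loss(p, beta) for p in pairs) / len(pairs)
        total += weights.get(name, 1.0) * mean_loss
    return total


if __name__ == "__main__":
    better = PreferencePair(-10.0, -12.0, -11.0, -11.0)  # policy prefers chosen
    worse = PreferencePair(-12.0, -10.0, -11.0, -11.0)   # policy prefers rejected
    data = {
        "cross_modal": [better],
        "retrieval_balance": [worse],
        "visual_feedback": [better, worse],
    }
    weights = {"cross_modal": 1.0, "retrieval_balance": 0.5, "visual_feedback": 1.0}
    print(triple_preference_loss(data, weights, beta=0.1))
```

Because all three terms share one scalar loss, one backward pass per batch updates the model on every objective together, rather than running a separate training stage per objective; the relative weights control how strongly each alignment aspect is pushed.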
dc.embargo.lift: 2030-12-01 [en_US]
dc.embargo.terms: 2030-12-01 [en_US]
dc.format.mimetype: application/pdf [en_US]
dc.identifier.uri: https://hdl.handle.net/1911/118242 [en_US]
dc.language.iso: en [en_US]
dc.subject: Medical RAG [en_US]
dc.subject: Hallucination Mitigation [en_US]
dc.subject: Multi-Agent Orchestration [en_US]
dc.subject: Visual Feedback [en_US]
dc.subject: LLMs [en_US]
dc.subject: VLMs [en_US]
dc.subject: Preference Alignment [en_US]
dc.title: Reliable Medical LLM and Vision-Language RAG through Multi-Agent Orchestration and Single-Step Preference Alignment [en_US]
dc.type: Thesis [en_US]
dc.type.material: Text [en_US]
thesis.degree.department: Computer Science [en_US]
thesis.degree.discipline: Computer Science [en_US]
thesis.degree.grantor: Rice University [en_US]
thesis.degree.level: Masters [en_US]
thesis.degree.name: Master of Science [en_US]