Led research on Multimodal Large Language Models (MLLMs) for medical imaging, building foundational medical multimodal datasets (Quilt-1M, MedNarratives) comprising over 1 million image-text pairs.
Engineered multimodal AI models (QuiltNet, Quilt-LLaVA) for medical image analysis, achieving state-of-the-art performance.
Pioneered PathFinder, a multi-agent AI framework for clinical diagnosis that outperformed human experts in diagnostic accuracy.
Developed MedBlink, a robust benchmark for evaluating the performance of multimodal medical AI models.
Leading initiatives to improve physical (Newtonian) consistency in image and video generative models, building pipelines that generate large-scale video scene-graph datasets.