| Two papers accepted at CVPR 2026 (one in the main track and one in findings). Congratulations to Aishwarya for the amazing work. CCI presents striking visualization results for CLIP, while Lite-embed enables adapting CLIP to rare classes with just a few images, without modifying the model. The wonderful collaboration with Aishwarya and Srikrishna continues. |
| Our paper, CLARIS: Clear and Intelligible Speech from Whispered and Dysarthric Voices, has been accepted at CHI 2026. This breakthrough work shows that dysarthric speech can be converted into normal speech in real time. Congratulations to Neil, Yash, and Shirish, with a special mention to Yash on his first PhD paper. Speech samples are available here. |
| Our paper on Simplifying Knowledge Transfer in Pretrained Models accepted at TMLR. Congratulations to Siddharth. |
| Gave a talk titled “The Sound Dimension: Speech and Audio in Multimodal AI” at the CVIT Workshop 2025. Slides (PDF): here. |
| Our paper titled “Pseudo-labelling meets Label Smoothing for Noisy Partial Label Learning” received the Best Paper Award at the FGVC workshop at CVPR 2025. Congratulations to Darshana; the hard work has paid off! |
| Many thanks to Adobe Research for extending the research gift for 2025; we truly appreciate the continued support! |
| Congratulations to Kawshik on successfully defending his thesis and completing his Dual Degree. Though NLP isn’t my core area, I ended up working on it with him; it was challenging but rewarding. Big thanks to Makarand and Shubham for playing a crucial role in his thesis work. Best wishes to Kawshik for his next chapter at Google DeepMind. |
| Our paper “NAM-to-Speech Conversion with Multitask-Enhanced Autoregressive Models” has been accepted at Interspeech 2025! Speech samples can be heard here. |
| Two full papers accepted at CVPR 2025. The first, TIDE, improves model generalization by localizing class-specific concepts and supports test-time correction. The second, VELOCITI, benchmarks video-language models on compositional understanding via a strict video-language entailment task tailored to modern VLMs. Try it on HuggingFace. |
| Can LLMs untangle who’s who in complex stories? Our NAACL 2025 paper, IdentifyMe, puts them to the test with a new coreference benchmark! |
| Four papers accepted at ICASSP 2025. |