| Two papers accepted at CVPR 2026 (one in the main track and one in findings). Congratulations to Aishwarya for the amazing work. CCI presents striking visualization results for CLIP, while Lite-embed enables adapting CLIP to rare classes with just a few images, without modifying the model. The wonderful collaboration with Aishwarya and Srikrishna continues. |
| Our paper, CLARIS: Clear and Intelligible Speech from Whispered and Dysarthric Voices, has been accepted at CHI 2026. This breakthrough work shows that dysarthric speech can be converted into normal speech in real time. Congratulations to Neil, Yash, and Shirish, with a special mention to Yash on his first PhD paper. Speech samples are available here. |
| Our paper on Simplifying Knowledge Transfer in Pretrained Models accepted at TMLR. Congratulations to Siddharth. |
| Gave a talk titled “The Sound Dimension: Speech and Audio in Multimodal AI” at the CVIT Workshop 2025. Slides (PDF): here. |
| Our paper titled “Pseudo-labelling meets Label Smoothing for Noisy Partial Label Learning” received the Best Paper Award at the FGVC workshop at CVPR 2025. Congratulations to Darshana; the hard work has paid off! |
| Many thanks to Adobe Research for extending the research gift for 2025; we truly appreciate the continued support! |
| Congratulations to Kawshik on successfully defending his thesis and completing his Dual Degree. Though NLP isn’t my core area, I ended up working on it with him; it was challenging but rewarding. Big thanks to Makarand and Shubham for playing a crucial role in his thesis work. Best wishes to Kawshik for his next chapter at Google DeepMind. |
| Our paper “NAM-to-Speech Conversion with Multitask-Enhanced Autoregressive Models” has been accepted at Interspeech 2025! Speech samples can be heard here. |
| Two full papers accepted at CVPR 2025. The first, TIDE, improves model generalization by localizing class-specific concepts and supports test-time correction. The second, VELOCITI, benchmarks video-language models on compositional understanding via a strict video-language entailment task tailored to modern VLMs. Try it on HuggingFace. |
| Can LLMs untangle who’s who in complex stories? Our NAACL 2025 paper, IdentifyMe, puts them to the test with a new coreference benchmark! |
| Four papers accepted at ICASSP 2025. |