Vineet Gandhi


F24, CVIT,

KCIS Research Block,

IIIT Hyderabad, Gachibowli

Hyderabad, India, 500032

I am currently an associate professor at IIIT Hyderabad, where I am affiliated with the Center for Visual Information Technology (CVIT). I also advise Animaker.com, an animation startup, on their AI and ML efforts. I completed my PhD at INRIA Rhône-Alpes/University of Grenoble in applied mathematics and computer science (mathématique appliquée et informatique) under the guidance of Remi Ronfard, funded by the CIBLE scholarship from the Région Rhône-Alpes. Prior to this, I completed my Masters with an Erasmus Mundus scholarship under the CIMET consortium. I am extremely thankful to the European Union for this opportunity, which had a huge impact on both my professional and personal life. I spent a semester each in Spain, Norway, and France, then joined INRIA for my master thesis and continued there for my PhD. I was also lucky to travel and deeply explore Europe (from the south of Spain to the north of Norway), at times surviving purely on gestural expressions for communication. I obtained my Bachelor of Technology degree from the Indian Institute of Information Technology, Design and Manufacturing (IIITDM) Jabalpur, India (I belong to the first batch of the institute).

I like to focus on problems with tangible goals, and I try to build end-to-end solutions (with neat engineering). My current research interests are in applied machine learning for computer vision and multimedia. In recent years, I have been exploring specific problems in model generalization, text-to-speech, speech conversion from varying signals for accessibility, vision-language models, and automated cinematography/editing. Outside work, I like to spend time with my family, play cards, and read ancient literature.

News [ archives ]

Mar 2026 Two papers accepted at CVPR 2026 (one in the main track and one in findings). Congratulations to Aishwarya for the amazing work. CCI presents striking visualization results for CLIP, while Lite-embed enables adapting CLIP to rare classes with just a few images, without modifying the model. The wonderful collaboration with Aishwarya and Srikrishna continues.
Mar 2026 Our paper CLARIS: Clear and Intelligible Speech from Whispered and Dysarthric Voices was accepted at CHI 2026. This work shows that dysarthric speech can be converted into clear speech in real time. Congratulations to Neil, Yash, and Shirish, with a special mention to Yash on his first PhD paper. Speech samples are available here.
Sep 2025 Our paper on Simplifying Knowledge Transfer in Pretrained Models was accepted at TMLR. Congratulations to Siddharth.
Jul 2025 Gave a talk titled The Sound Dimension: Speech and Audio in Multimodal AI at the CVIT workshop 2025. Slides (PDF): here
Jun 2025 Our paper titled “Pseudo-labelling meets Label Smoothing for Noisy Partial Label Learning” received the best paper award at the FGVC workshop at CVPR 2025. Congratulations to Darshana; the hard work has paid off.