Publications

Selected publications in reverse chronological order. For the full list of publications, see Google Scholar.

2025

  1. NAM-to-Speech Conversion with Multitask-Enhanced Autoregressive Models
    Neil Shah, Shirish Karande, and Vineet Gandhi
    In Interspeech, Aug 2025
  2. TIDE: Training Locally Interpretable Domain Generalization Models Enables Test-time Correction
    Aishwarya Agarwal, Srikrishna Karanam, and Vineet Gandhi
    In Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2025
    Note: Presented as a highlight
  3. VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment
    Darshana Saravanan, Varun Gupta, Darshan Singh, Zeeshan Khan, Vineet Gandhi, and Makarand Tapaswi
    In Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2025
  4. Investigating Mechanisms for In-Context Vision Language Binding
    Darshana Saravanan, Makarand Tapaswi, and Vineet Gandhi
    In CVPR Workshop on Mechanistic Interpretability in Vision (MIV), Jun 2025
  5. Pseudo-labelling meets Label Smoothing for Noisy Partial Label Learning
    Darshana Saravanan, Naresh Manwani, and Vineet Gandhi
    In CVPR Workshop on Fine-Grained Visual Categorization (FGVC), Jun 2025
    Note: Best paper award
  6. IdentifyMe: A Challenging Mention Resolution Benchmark for LLMs
    Kawshik Manikantan, Makarand Tapaswi, Vineet Gandhi, and Shubham Toshniwal
    In Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL), May 2025
    Note: Short paper
  7. MRI2Speech: Speech Synthesis from Articulatory Movements Recorded by Real-time MRI
    Neil Shah, Ayan Kashyap, Shirish Karande, and Vineet Gandhi
    In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr 2025
  8. Advancing NAM-to-Speech Conversion with Novel Methods and the MultiNAM Dataset
    Neil Shah, Shirish Karande, and Vineet Gandhi
    In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr 2025
  9. Prompt-to-Correct: Automated Test-Time Pronunciation Correction with Voice Prompts
    Ayan Kashyap, Neil Shah, and Vineet Gandhi
    In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr 2025
  10. Minimalistic Video Saliency Prediction via Efficient Decoder & Spatio Temporal Action Cues
    Rohit Girmaji, Siddharth Jain, Bhav Beri, Sarthak Bansal, and Vineet Gandhi
    In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Apr 2025
  11. EditIQ: Automated Cinematic Editing of Static Wide-Angle Videos via Dialogue Interpretation and Saliency Cues
    Rohit Girmaji, Bhav Beri, Ramanathan Subramanian, and Vineet Gandhi
    In Intelligent User Interfaces (IUI), Mar 2025

2024

  1. StethoSpeech: Speech Generation Through a Clinical Stethoscope Attached to the Skin
    Neil Shah, Neha Sahipjohn, Vishal Tambrahalli, Ramanathan Subramanian, and Vineet Gandhi
    In UbiComp 2024, to appear in Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), Nov 2024
  2. Major Entity Identification: A Generalizable Alternative to Coreference Resolution
    Kawshik Manikantan, Shubham Toshniwal, Makarand Tapaswi, and Vineet Gandhi
    In Empirical Methods in Natural Language Processing (EMNLP), Nov 2024
  3. ParrotTTS: Text-to-speech synthesis exploiting disentangled self-supervised representations
    Neil Shah, Saiteja Kosgi, Vishal Tambrahalli, Neha Sahipjohn, Anil Kumar Nelakanti, and Vineet Gandhi
    In Findings of the Association for Computational Linguistics (EACL), Nov 2024

2023

  1. Test-Time Amendment with a Coarse Classifier for Fine-Grained Classification
    Kanishk Jain, Shyamgopal Karthik, and Vineet Gandhi
    In Conference on Neural Information Processing Systems (NeurIPS), Nov 2023
  2. Ground then Navigate: Language-guided Navigation in Dynamic Scenes
    Kanishk Jain, Varun Chhangani, Amogh Tiwari, Madhava Krishna, and Vineet Gandhi
    In International Conference on Robotics and Automation (ICRA), Nov 2023
  3. Bringing Generalization to Deep Multi-view Detection
    Jeet Vora, Swetanjal Dutta, Shyamgopal Karthik, and Vineet Gandhi
    In Winter Conference on Applications of Computer Vision Workshops (WACV-W), Nov 2023

2022

  1. Does Audio help in deep Audio-Visual Saliency prediction models?
    Ritvik Agrawal, Shreyank Jyoti, Rohit Girmaji, Sarath Sivaprasad, and Vineet Gandhi
    In International Conference on Multimodal Interaction (ICMI), Nov 2022
    Note: Best student paper award
  2. Empathic Machines: Using Intermediate Features as Levers to Emulate Emotions in Text-To-Speech Systems
    In North American Chapter of the Association for Computational Linguistics (NAACL), Jul 2022
  3. Comprehensive Multi-Modal Interactions for Referring Image Segmentation
    Kanishk Jain and Vineet Gandhi
    In Findings of the Association for Computational Linguistics (ACL), May 2022
  4. Reappraising Domain Generalization in Neural Networks
    Sarath Sivaprasad, Akshay Goindani, Vaibhav Garg, and Vineet Gandhi
    In arXiv:2110.07981, May 2022

2021

  1. Grounding Linguistic Commands to Navigable Regions
    Nivedita Rufus, Kanishk Jain, Unni Krishnan R Nair, Vineet Gandhi, and K Madhava Krishna
    In International Conference on Intelligent Robots and Systems (IROS), May 2021
  2. ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency Prediction
    Samyak Jain, Pradeep Yarlagadda, Shreyank Jyoti, Shyamgopal Karthik, Ramanathan Subramanian, and Vineet Gandhi
    In International Conference on Intelligent Robots and Systems (IROS), May 2021
  3. Emotional Prosody Control for Speech Generation
    Sarath Sivaprasad, Saiteja Kosgi, and Vineet Gandhi
    In Interspeech, May 2021
  4. The Curious Case of Convex Networks
    Sarath Sivaprasad, Naresh Manwani, and Vineet Gandhi
    In European Conference on Machine Learning (ECML), May 2021
  5. No Cost Likelihood Manipulation at Test Time for Making Better Mistakes in Deep Networks
    Shyamgopal Karthik, Ameya Prabhu, Puneet Dokania, and Vineet Gandhi
    In International Conference on Learning Representations (ICLR), May 2021

2020

  1. Simple Unsupervised Multi-Object Tracking
    Shyamgopal Karthik, Ameya Prabhu, and Vineet Gandhi
    In arXiv:2006.02609, May 2020
  2. LiDAR guided Small obstacle Segmentation
    Aasheesh Singh, Aditya Kamireddypalli, Vineet Gandhi, and K Madhava Krishna
    In International Conference on Intelligent Robots and Systems (IROS), May 2020
  3. saliency.jpg
  4. GAZED - Gaze-guided Cinematic Editing of Wide-Angle Monocular Video Recordings
    Bhanu K L Moorthy, Moneish Kumar, Ramanathan Subramanian, and Vineet Gandhi
    In Conference on Human Factors in Computing Systems (CHI), May 2020
  5. Exploring 3 R’s of Long-term Tracking: Re-detection, Recovery and Reliability
    Shyamgopal Karthik, Abhinav Moudgil, and Vineet Gandhi
    In Winter Conference on Applications of Computer Vision (WACV), May 2020

2019

  1. CineFilter: Unsupervised Filtering for Real Time Autonomous Camera Systems
    Sudheer Achary, Syed Ashar Javed, Nikita Shravan, Bhanu K L Moorthy, Vineet Gandhi, and Anoop Namboodiri
    In Workshop on Intelligent Cinematography and Editing (WICED), Eurographics 2020, May 2019
  2. Talk to the Vehicle: Language Conditioned Autonomous Navigation of Self Driving Cars
    Sriram N. N., Tirth Maniar, Jayaganesh Kalyanasundaram, Vineet Gandhi, Brojeshwar Bhowmick, and Madhava K. Krishna
    In International Conference on Intelligent Robots and Systems (IROS), May 2019
  3. Learning Unsupervised Visual Grounding Through Semantic Self-Supervision
    Syed Ashar Javed, Shreyas Saxena, and Vineet Gandhi
    In International Joint Conference on Artificial Intelligence (IJCAI), May 2019

2018

  1. Nose, Eyes and Ears: Head Pose Estimation By Locating Facial Keypoints
    Aryaman Gupta, Kalpit Thakkar, Vineet Gandhi, and P J Narayanan
    In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2018
  2. Long-Term Visual Object Tracking Benchmark
    Abhinav Moudgil and Vineet Gandhi
    In Asian Conference on Computer Vision (ACCV), May 2018
  3. MergeNet: A Deep Net Architecture for Small Obstacle Discovery
    Krishnam Gupta, Syed Ashar Javed, Vineet Gandhi, and Madhava K. Krishna
    In International Conference on Robotics and Automation (ICRA), May 2018
  4. Watch to Edit: Video Retargeting using Gaze
    Kranthi Kumar, Moneish Kumar, Vineet Gandhi, and Ramanathan Subramanian
    In Computer Graphics Forum (Eurographics edition), May 2018
  5. Document Quality Estimation using Spatial Frequency Response
    Pranjal Kumar Rai, Sajal Maheshwari, and Vineet Gandhi
    In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2018