Automatic Captioning of Open Visual World

Talk by Qin JIN

Dec 08, 2023 Friday


Automatically generating natural language descriptions of visual content is one of the representative tasks that bridge computer vision and natural language processing. In recent years, it has attracted extensive research attention in the vision and language communities. In this talk, I will first present our recent work on solving the "overly generic" issue of generated captions. Specifically, we propose an annotation-free, lightweight, and plug-and-play framework that generates more descriptive captions based on vision-language pre-trained models. Furthermore, we attempt to solve the overly generic problem at its root and propose a new learning objective, semipermeable maximum likelihood estimation, which optimizes the model to generate more detailed captions. Lastly, I will introduce our newly proposed task, embodied visual captioning in 3D environments, which aims to overcome a limitation of previous visual captioning systems: their reliance on given high-quality images. This task targets intelligent agents that can autonomously navigate in 3D environments and actively acquire visual information to generate captions.


Speaker Bio:

Dr. Qin Jin

Full Professor, School of Information, Renmin University of China

Dr. Qin Jin is a full professor in the School of Information at Renmin University of China, where she leads the AI·M3 lab. She received her B.E. and M.E. degrees from Tsinghua University and her Ph.D. degree from Carnegie Mellon University. Her main research interests include intelligent multimedia computing and human-computer interaction. Her work on video understanding and multimodal affective analysis has won various awards in international challenge evaluations, including TRECVID VTT, ActivityNet Dense Video Captioning, and the Audio-Visual Emotion Challenge (AVEC). She received the Best Grand Challenge Paper Award at ACM Multimedia 2017 and a best paper nomination at ICMR 2018. She served as Technical Program Chair of ACM Multimedia 2022. She currently serves as an Associate Editor of ACM Transactions on Multimedia Computing, Communications, and Applications.