Cross-Modality Alignment for Visual Content Understanding and Generation

Talk by Guanbin Li

Apr 26, 2024 Friday


Rapid progress in single-modality content understanding, such as vision and language, has raised the bar for cross-modal learning technologies, including cross-modal information retrieval, content generation, and intelligent human-computer interaction. Cross-modal representation and generation are the two core problems in cross-modal learning. Cross-modal representation aims to make features more expressive by learning to align semantics across modalities, while cross-modal generation relies on semantic consistency between modalities to convert data from one modality into another. Explicit cross-modal semantic alignment is the key to fine-grained, parsable cross-modal understanding. In this talk, I will introduce our research efforts on cross-modal semantic alignment from several perspectives, including information propagation in graph networks, distillation of large multimodal models, knowledge embedding, and structurally consistent representation. I will also present the application and validation of these techniques in cross-modal visual target localization, cross-modal medical information processing, and digital human video generation.





Rm W1-101, GZ Campus

Online Zoom

Join on Zoom with meeting ID 423 685 2791

Prof. Guanbin Li

Guanbin Li is an Associate Professor and PhD supervisor at the School of Computer Science, Sun Yat-sen University, and a recipient of the National Science Fund for Distinguished Young Scholars. His main research areas are cross-modal visual perception, understanding, and generation. To date, he has published over 140 papers in CCF Class A/Chinese Academy of Sciences Zone 1 journals, which have been cited more than 11,000 times according to Google Scholar. His awards include the Wu Wenjun AI Excellent Youth Award, an ACM China Rising Star nomination, the First Prize of Science and Technology from the China Society of Image and Graphics, a Best Paper nomination at ICCV 2019, and the Best Poster Award at ICMR 2021. Prof. Li serves as an Area Chair (AC) or Senior Program Committee member for top-tier conferences such as CVPR, ECCV, and AAAI. He is also an editorial board member of The Visual Computer and has won eight championships in international competitions held at venues including CVPR, NeurIPS, and ACM MM.