Cross-modal Pretraining for Open-set Detection and Segmentation

Talk by Xiaodan LIANG

Sep 15, 2023 Friday


Because new concepts are ubiquitous in real-world scenes, the ability to recognize them in open-world scenarios has significant research value. This talk presents a series of our recent works on open-world detection and segmentation via cross-modal fine-grained pretraining. First, I will present DetCLIPv2, an end-to-end open-vocabulary detection pre-training framework that effectively incorporates large-scale image-text pairs. However, a pre-defined category space is still required at inference time, and only objects belonging to that space are predicted. We therefore propose CapDet, which can either predict under a given category list or directly generate the categories of the predicted bounding boxes. In addition, semantic segmentation models trained with image-level text supervision have shown promising results in challenging open-world scenarios, but they still have difficulty learning fine-grained semantic alignment at the pixel level and predicting accurate object masks. To address this issue, we propose MixReorg, a novel and straightforward pre-training paradigm for semantic segmentation that enhances a model's ability to reorganize patches mixed across images, exploring both local visual relevance and global semantic coherence. Finally, I will discuss how these works might be deployed in cross-modal embodied AI.


September 15th, 2023, Friday

11:00 – 11:50


Rm134, E1


628 334 1826 (PW: 234567)

Bilibili Live:

ID: 30748067

Speaker Bio:

Prof. Xiaodan LIANG

Associate Professor, Sun Yat-sen University

Xiaodan Liang is currently an Associate Professor at Sun Yat-sen University and Co-director of the Human-Cyber-Physical intelligent integration Lab.

She was previously a Project Scientist at Carnegie Mellon University, working with Prof. Eric Xing. Her research focuses on interpretable and cognitive intelligence and its applications to large-scale visual recognition, cross-modal analysis and understanding, and digital human analysis. She has published over 100 papers in the field's most prestigious journals (e.g., TPAMI) and conferences (e.g., CVPR, ICCV, ECCV, NeurIPS), with 20,000+ Google Scholar citations. She has served as an Area Chair for ICCV 2019, WACV 2020, CVPR 2020, NeurIPS 2021-2023, and ICLR 2024, as Tutorial Chair (Organization committee) of CVPR 2021, and on the Ombud Committee of CVPR 2023. She also serves as an Associate Editor of Neural Network Journal (Impact Factor >8). She has received the ACM China Best Doctoral Dissertation Award (only 2 in China), the CCF Best Doctoral Dissertation Award, the Alibaba DAMO Academy Young Fellow award (Top 10 under 35 in China), and an ACL 2019 Best Demo Paper nomination. She was named to the Forbes (China) 30 Under 30 list of young innovators. She is a senior member of IEEE.