Planting a Seed of Vision in Large Language Model

Talk By Yixiao GE

Nov 03, 2023 Friday


With the goal of general artificial intelligence, we aim to develop an AI agent that not only has multimodal, multi-task capabilities but also exhibits emergent and self-evolving abilities in an open-world context. This is a long-term endeavor. As an initial step, we are studying a foundation model that supports flexible input/output formats, transitioning and reasoning seamlessly between multimodal signals while acquiring knowledge from an inherently multimodal world. Starting from the visual modality, the underlying premise for accomplishing this goal is to unify visual comprehension and generation tasks within an end-to-end framework. The talk will introduce our explorations this year toward this goal, from plugins to unification and from model-centric to data-centric approaches.


Speaker Bio:

Dr. Yixiao GE

Senior Researcher, Tencent ARC Lab and Tencent AI Lab

Dr. Yixiao Ge is a senior researcher affiliated with Tencent ARC Lab and Tencent AI Lab. She obtained her Ph.D. degree in 2021 from the Multimedia Lab (MMLab) at the Chinese University of Hong Kong, advised by Prof. Hongsheng Li and Prof. Xiaogang Wang. Her research interests include computer vision and deep learning, with a focus on (1) multimodal foundation models for both generation and comprehension, (2) open-world visual representation and comprehension, and (3) efficient AI. She has published over 30 papers and served as a reviewer for top-tier conferences and journals, including CVPR, ICCV, ECCV, NeurIPS, ICML, and ICLR. She won the 2022 SZCCF Science and Technology Award for pioneering practical technologies in both industry and academia.

