The Rise of Vision-Language Foundation Models: Methods, Evaluation and Applications

Talk by Tiancheng ZHAO

Oct 13, 2023 Friday


In recent years, the development and use of vision-language foundation models (VLMs) have grown significantly. Notable work includes CLIP and GPT-4. These models, which combine visual and textual information, have revolutionized domains such as computer vision, natural language processing, and multimodal learning. This presentation starts with some of the key methods in state-of-the-art vision-language models, including model architectures and training techniques. It then covers the challenges of VLM evaluation and presents solutions that provide fine-grained and interpretable model assessment, shedding light on new research directions. Lastly, the presentation concludes with real-world applications of VLMs that can revolutionize AI application development, embodied IoT agents, and many other areas.





RmE1-134, GZ Campus


628 334 1826 (PW: 234567)

Speaker Bio:

Tiancheng Zhao

Principal Researcher, Binjiang Institute, Zhejiang University

Tiancheng Zhao is a principal researcher at the Binjiang Institute of Zhejiang University and CTO of Linker Technology, a leading firm that builds multimodal intelligent agents. He received his Ph.D. in Computer Science from Carnegie Mellon University (CMU) in 2019, advised by Prof. Maxine Eskenazi. He received his B.S. in Electrical Engineering, summa cum laude, from the University of California, Los Angeles (UCLA) in 2014. He also received the Best & Brightest PhD Award at Microsoft Research in 2018. Dr. Zhao’s current research interests focus on large-scale pre-training in computer vision and natural language processing, including vision-language pre-training, foundation model evaluation, and human-computer interaction. He has published more than 40 papers and has served as area chair for top journals and conferences including ACL, EMNLP, NAACL, AAAI, and ECCV.