DETR and Beyond: Advancing Open-Set Object Detection with Language and Visual Prompts

Talk by Lei ZHANG

Mar 1, 2024 Friday


This talk presents the evolution of object detection from traditional closed-set frameworks to the open-set domain, using the DETR algorithm as the foundational technology. We first review the DETR model, setting the stage for our DETR-series research: DAB-DETR, DN-DETR, and DINO, which established DETR-like models as a mainstream detection framework. We then discuss our steps towards open-set object detection, introducing Grounding DINO, which uses language prompts to extend detection beyond a fixed label set, and T-Rex, which uses visual prompts to detect objects in unseen domains. Together, these works move toward object detection systems that are not limited by predefined labels but can instead grow and adapt through interaction with prompts. The talk explores these advances and their potential to reshape object detection.


Rm W1-101, GZ Campus

Online Zoom

Join Zoom with Meeting ID: 423 685 2791

Speaker Bio:


Chair Scientist of Computer Vision and Robotics,

International Digital Economy Academy (IDEA), Shenzhen

Adjunct Professor, HKUST(GZ)

Lei Zhang is the Chair Scientist of Computer Vision and Robotics at the International Digital Economy Academy (IDEA) and an Adjunct Professor at the Hong Kong University of Science and Technology (Guangzhou). Prior to this, he was a Principal Researcher and Research Manager at Microsoft, where he had worked since 2001 at MSRA (Beijing), MSR (Redmond), and computer vision-related product teams. His research interests are mainly in computer vision and machine learning, with a particular focus on generic visual recognition at large scale. Since joining IDEA Research in 2021, he has led a series of research works on object detection, among which the DINO algorithm established, for the first time, DETR-like algorithms as the state of the art (SOTA) in object detection. He has published more than 150 papers in top conferences and journals and holds more than 60 US-granted patents. He was named an IEEE Fellow for his contributions to large-scale visual recognition and multimedia information retrieval.