Tuesday, March 28, 2023
Abstract:
Many recent AI breakthroughs rely on large amounts of high-quality data. For example, ChatGPT adopts a model architecture similar to GPT-3's but improves on it by fine-tuning with high-quality data. Most ML methods are model-centric: the model is iteratively enhanced by designing more elegant architectures or objective functions to improve predictive performance. Despite the success of model-centric ML, the power of data has been largely overlooked in the AI and ML community.
Unstructured graph data poses additional challenges for data-centric ML. In this talk, I will introduce our recent progress in data-centric ML (DCML) on graphs. Specifically, I will focus on two challenges: (1) data quantity: how can we efficiently annotate large amounts of data for graph ML? (2) data quality: beyond designing more complex models, how can we make graph ML faster and easier from the data perspective? Finally, I will briefly introduce the deployment of our work and its impact on industry.
Speaker Bio:
Dr. Wentao ZHANG
Postdoc Research Fellow, Montreal Institute for Learning Algorithms (Mila)
Ph.D. degree in computer science, Peking University
Dr. Wentao Zhang is currently a postdoctoral research fellow at the Montreal Institute for Learning Algorithms (Mila). He received his Ph.D. degree in computer science from Peking University. He also has four years of industrial experience at Tencent and Apple.
Wentao’s research focuses on data-centric ML, graph ML, and ML systems. He has published 30+ papers, including 10+ first-author papers, in top venues for databases (SIGMOD, VLDB, ICDE), data mining (KDD, WWW), and machine learning (ICML, NeurIPS, ICLR). He has also contributed to or designed several system projects, including Angel, SGL, MindWare, and OpenBox. His research powers several billion-scale applications in industry, and some of it has received prestigious awards, including the Best Student Paper Award at WWW’22.