Building the Singing Voice Foundation Model (AIAA6101)

Talk by Wei XUE

Feb 01, 2024 Thursday

Abstract:

We built a large singing voice foundation model that achieves cross-gender, cross-language, cross-vocal-range, zero-resource, and rapid singing voice synthesis. Unlike traditional AI singers, which require hours of training data and are limited to a fixed repertoire, this model supports modifications to lyrics and melody. With only tens of seconds of data, it can sing any new song, achieving song synthesis rather than simple voice conversion. This talk will introduce a series of supporting technologies, including a timbre synthesizer based on NAS-FM, CoMoSpeech, and CoMoSVC.

Time:

Feb 01, 2024 Thursday

11:00-11:50

Location:

Rm W1-101, GZ Campus

Online Zoom

Join Zoom at https://hkust-gz-edu-cn.zoom.us/j/4236852791 or via Meeting ID 423 685 2791

Speaker Bio:

Dr. Wei Xue

Assistant Professor

Division of Emerging Interdisciplinary Areas (EMIA), Hong Kong University of Science and Technology

Wei Xue is currently an Assistant Professor at the Division of Emerging Interdisciplinary Areas (EMIA), Hong Kong University of Science and Technology (HKUST). He received his Bachelor's degree in automatic control from Huazhong University of Science and Technology in 2010 and his Ph.D. degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences, in 2015. From August 2015 to September 2018 he was first a Marie Curie Experienced Researcher and then a Research Associate in the Speech and Audio Processing Group, Department of Electrical & Electronic Engineering, Imperial College London, UK. He was a Senior Research Scientist at JD AI Research, Beijing, from November 2018 to December 2021, where he led R&D on front-end speech processing and acoustic modelling for robust speech recognition. From January 2022 to April 2023 he was an Assistant Professor at the Department of Computer Science, Hong Kong Baptist University. He was also a visiting scholar at Université de Toulon and KU Leuven. His research interests are in speech and music intelligence, including AI music generation, speech enhancement and separation, room acoustics, and speech and audio event recognition. He is a former Marie Curie Fellow and was selected for the Beijing Overseas Talent Aggregation Project. He currently leads the AI music research in the theme-based Art-Tech project, which received a total of HK$52.8 million from the Hong Kong RGC.