OCR Foundation Models: Current Status and Future Prospects (Some Thoughts on Large OCR Models)

Talk By Lianwen JIN

Mar 14, 2024 Thursday

Abstract:

The emergence of large language models (LLMs) has led to significant advances toward Artificial General Intelligence (AGI), particularly in natural language processing. In the past few years, multimodal large models have also attracted considerable attention and developed rapidly. However, large models specifically tailored to Optical Character Recognition (OCR) have received relatively little attention in the research community. In this talk, I will give a brief overview of recent technologies related to multimodal large models and large-scale pre-trained models for OCR. I will examine methods for building foundation models for OCR and introduce some representative approaches. I will also discuss development trends and future research directions for the OCR field in the era of large models.

Time:

Mar 14, 2024 Thursday

11:00-11:50

Location:

Rm W1-101, GZ Campus

Online Zoom

Join Zoom at https://hkust-gz-edu-cn.zoom.us/j/4236852791 or with Meeting ID 423 685 2791

Speaker Bio:

Lianwen JIN

Professor, South China University of Technology

Executive Member of the Chinese Society of Image and Graphics (CSIG)

Lianwen Jin is a professor at South China University of Technology. He holds several notable positions, including Executive Member of the Chinese Society of Image and Graphics (CSIG), Chairman of the Guangdong Provincial Society of Image and Graphics, and Director of the Special Committee on Document Image Analysis and Recognition of CSIG. His primary research areas include optical character recognition, document image understanding, and computer vision. Prof. Jin has published over 200 papers in major academic journals and international conferences. His papers have been cited over 14,000 times on Google Scholar, with an H-Index of 61. He has been named to the “Top 2% Scientists Worldwide” list by Stanford University for each of the past 4 years. He has received five science and technology awards at the provincial or ministerial level, from Guangdong Province or the Ministry of Education of China. He has guided students to first place in more than 20 academic competitions at international and domestic conferences, including CVPR, ICDAR, ICPR, ICFHR, and PRCV.

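Lianwen Jin is a Level-2 Professor at South China University of Technology. He also serves as Executive Member of the Chinese Society of Image and Graphics (CSIG), Chairman of the Guangdong Provincial Society of Image and Graphics, and Director of the CSIG Special Committee on Document Image Analysis and Recognition. His main research areas are text recognition, document image understanding, and computer vision. He has published over 200 papers in major academic journals and international conferences, more than 100 of them in SCI Q1 journals or CCF Class A venues. His papers have been cited over 14,000 times on Google Scholar, with an H-Index of 61. He has been named to the annual “World's Top 2% Scientists” list for the past 4 consecutive years. He has received 5 provincial or ministerial-level science and technology awards (2 first prizes and 3 second prizes) and 3 second prizes for scientific and technological progress from national industry societies. Students under his supervision have won first place more than 20 times in academic competitions at well-known international and domestic conferences, including CVPR, ICDAR, ICPR, ICFHR, and PRCV.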