Forum Scheme: Emerging AI Trends in Image and Video Industry

Moderator: Dr. Ning Xu, Fellow of Advanced R&D at Adeia Inc, USA


  • Dr. Petros T. Boufounos, Deputy Director at Mitsubishi Electric Research Laboratories (MERL), USA
  • Dr. Mostafa El-Khamy, Senior Principal Engineer at Samsung SOC Research and Development, USA
  • Dr. Kai-Lung Hua, CTO of Microsoft Taiwan, Taiwan
  • Mr. Hui-Lam Ong, Principal Solution Architect at NEC Laboratories Singapore, Singapore

Dr. Ning Xu currently serves as Fellow, Advanced R&D at Adeia Inc, pioneering innovations that enhance the way we live, work, and play. Before joining Adeia, Dr. Xu was the Chief Scientist of Video Algorithms at Kuaishou Technology, and before that, he held various positions at Amazon, Snap Research, Dolby Laboratories, and Samsung Research America. He earned his Ph.D. in Electrical Engineering from the University of Illinois at Urbana-Champaign (UIUC) in 2005, and his Master’s and Bachelor’s degrees from the University of Science and Technology of China (USTC). Dr. Xu has co-authored over 200 journal articles, conference papers, patents, and patent applications. His research interests encompass machine learning, computer vision, video technology, and other related areas. He is a Senior Member of IEEE.

Petros T. Boufounos is a Distinguished Research Scientist and a Deputy Director at Mitsubishi Electric Research Laboratories (MERL), also leading the Computational Sensing Team. Dr. Boufounos completed his undergraduate and graduate studies at MIT. He received the S.B. degree in Economics in 2000, the S.B. and M.Eng. degrees in Electrical Engineering and Computer Science (EECS) in 2002, and the Sc.D. degree in EECS in 2006. Between September 2006 and December 2008, he was a postdoctoral associate with the Digital Signal Processing Group at Rice University. Dr. Boufounos joined MERL in January 2009, where he has been heading the Computational Sensing Team since 2016. Dr. Boufounos’ immediate research focus includes signal acquisition and processing, computational sensing, inverse problems, quantization, and data representations. He is also interested in how signal acquisition interacts with other fields that use sensing extensively, such as machine learning, robotics, and dynamical system theory. He has over 40 patents granted and more than 10 pending, and more than 100 peer-reviewed journal and conference publications on these topics. Dr. Boufounos has served as an Area Editor and a Senior Area Editor for the IEEE Signal Processing Letters, and as a member of the SigPort editorial board and the IEEE Signal Processing Society Theory and Methods technical committee. He is currently an Associate Editor at IEEE Transactions on Computational Imaging and the general co-chair of the ICASSP 2023 organizing committee. Dr. Boufounos is an IEEE Fellow and an IEEE SPS Distinguished Lecturer for 2019-2020.

Talk: Learning-based models in Sensing Applications

Abstract: Learning and data-driven approaches are becoming increasingly important in sensing and imaging applications. Learning-based models promise to capture characteristics of signals that are difficult to capture analytically and better describe physical processes, compared to analytical models. They are also versatile, as they can be used to refine analytical models and improve their modeling accuracy, reduce the computational complexity of using the model, or describe physical systems and processes for which no good analytical models exist. On the other hand, analytical models do not require training, which can be expensive both in computation and data requirements. In addition, analytical models are more amenable to theoretical analysis and may offer theoretical performance guarantees. This talk will explore how learning-based models enable a number of imaging applications and how combining them with analytical models can significantly reduce the training burden, improve performance, and help with theoretical analysis. We will show a range of approaches, either purely data-driven or combining analytical and learned models to various degrees. We will discuss the tradeoffs in each approach and the benefits and drawbacks, as related to their application in radar imaging, infrastructure monitoring and imaging of dynamical systems, among others.

Mostafa El-Khamy (Senior Member, IEEE) received the B.S. and M.S. degrees in electrical engineering from Alexandria University, Egypt, the M.S. and Ph.D. degrees in electrical engineering from the California Institute of Technology (Caltech), USA, and the M.B.A. degree from the Edinburgh Business School, U.K. He is currently a Senior Principal Engineer with Samsung SOC Research and Development, CA, USA. He is also an Adjunct Professor at the Faculty of Engineering, Alexandria University. He was a Founding Faculty Member of Egypt-Japan University for Science and Technology (E-JUST) and was at Qualcomm Research and Development, San Diego. His research interests include the theory and practice of artificial intelligence in multimedia and communication systems. He was a recipient of the URSI Young Scientist Award, the Caltech Atwood Fellowship, the Alexandria University Scientific Incentive Award, the Samsung Best Paper Award, and the Samsung Distinguished Inventor Award.

Talk: Recent trends for On-device AI-Camera

Abstract: This talk outlines recent trends in on-device AI. We discuss recent developments in AI acceleration on mobile devices and how they enable the AI camera. We then discuss solutions that enable content-aware cameras and dive deeper into the AI methods behind the success of the AI camera, such as scene understanding and accurate scene segmentation.

Kai-Lung Hua received his B.S. degree in electrical engineering from National Tsing Hua University in 2000 and his M.S. degree in communication engineering from National Chiao Tung University in 2002, both in Hsinchu, Taiwan. He completed his Ph.D. at the School of Electrical and Computer Engineering at Purdue University in West Lafayette, IN, USA in 2010. Since 2010, Dr. Hua has been affiliated with the National Taiwan University of Science and Technology, where he has served as a professor in the CSIE department, director of the AI research center, vice dean for the EECS College, and dean of the industry-academia collaboration office. In 2022, he moved to Microsoft Taiwan, where he serves as Chief Technology Officer. In this role, he leads the effort to empower Taiwan’s industry by using cloud and AI capabilities to drive transformation. Dr. Hua is a member of Eta Kappa Nu and Phi Tau Phi, and a recipient of the MediaTek Doctoral Fellowship. His research interests include digital image and video processing, computer vision, and machine learning. His research awards include the 2022 K. T. Li Cornerstone Award from the Institute of Information & Computing Machinery, the 2020 Outstanding Research Award from the National Taiwan University of Science and Technology, the Top Performance Award from the 2017 ACM Multimedia Grand Challenges, the Top 10% Paper Award from the 2015 IEEE International Workshop on Multimedia Signal Processing, and the Second Prize Award from the 2014 ACM Multimedia Grand Challenge, among others.

Talk: Robust Face Anti-Spoofing in Unseen Domains via Geometry-Aware Networks

Abstract: Effective face anti-spoofing (FAS) is vital for a robust face recognition system. Despite the development of numerous texture-based countermeasures to thwart presentation attacks (PAs), their performance against unseen domains or novel spoofing methods remains unsatisfactory. Rather than attempting to comprehensively catalog all potential spoofing variations and render binary live/spoof determinations, we present a novel approach to the FAS challenge. Our approach focuses on discerning between normal and abnormal movements within live and spoof presentations. We introduce the Geometry-Aware Interaction Network (GAIN), which exploits dense facial landmarks using a spatio-temporal graph convolutional network (ST-GCN). This technique not only yields a more interpretable and modularized FAS model but also incorporates a cross-attention feature interaction mechanism. This mechanism integrates with existing methods, resulting in a notable performance enhancement. Empirical evaluations show that our approach achieves state-of-the-art performance in both standard intra-dataset assessments and cross-dataset evaluations. Notably, our model exhibits a significant performance margin over state-of-the-art methods in the cross-dataset cross-type protocol, as demonstrated on the CASIA-SURF 3DMask database. This result highlights our model’s robustness against domain shifts and previously unseen forms of spoofing.

Hui Lam Ong specializes in high-performance video analytics solutions. His decades of professional experience in cyber security and software architecture design enable him to understand the complexity of video analytics deployment challenges. His current work focuses on reducing the manpower and maintenance costs of large-scale video analytics solutions.

Talk: Bridging the Gaps for the Adoption of Large-Scale Video Analytics Solutions

Abstract: In recent years, city governments worldwide have recognized the benefits of implementing video analytics to enhance and maintain safety and security. Consequently, an increasing number of public areas, such as housing estates, multi-story car parks, and high-traffic transit locations like train stations and bus interchanges, are being equipped with surveillance cameras. The initial deployment process requires a significant financial investment to upgrade the necessary infrastructure, procure high-resolution surveillance cameras, and obtain modern hardware and software for data storage and analysis. Moreover, the employment of highly skilled IT professionals is essential for installing, managing, troubleshooting, and maintaining these systems. This presents a substantial resource challenge for city governments. It’s also crucial not to forget the public’s privacy concerns. The speaker for this session possesses extensive experience in addressing these challenges. He and his team will provide a comprehensive exploration of these industry-wide issues. Their mission to streamline the process of deploying video analytics solutions has led them to develop a semi-automated, AI-aided system. This advanced system aims to alleviate these challenges by ensuring the protection of individual privacy and simplifying the implementation of these solutions.