NEC Seminar (organized by NEC Corporation, Japan)

Seminar Title: Exploring NEC’s Recognition AIs: Biometrics, Fairness, and Behavior Analysis
Seminar Abstract: Unveiling the latest advancements in AI recognition technology, our seminar will delve into real-time multi-object tracking for biometric solutions, fair face attribute recognition through Imbalance-Aware Adaptive Margin, annotation-free image recognition AIs in retail, and harnessing the power of retrieval for human behavior understanding in video analysis. Join us to explore groundbreaking research and the practical applications of these transformative technologies in shaping the digital society of the future.

Title: Toward Real-time End-to-End Multi-Object Tracking Model for Biometric Solution
Speaker: Dr. Hiroshi Fukui, NEC Corporation, Japan
Abstract: This session presents a new real-time multi-object tracking (MOT) model, an important technology for computer vision applications such as autonomous driving, video analysis, and biometric systems. Recently, NEC has developed a gateless access control system using biometric recognition, capable of authenticating 100 persons in one minute without any devices or gates by tracking everyone who passes through. The MOT process is a core component of this system, as it must track every person to be authenticated without misses. Although deep-learning-based MOT models have been proposed and have improved accuracy, they slow down significantly when tracking more than 50 persons because they rely on multi-step modules, such as deep- and rule-based methods. In contrast, our proposed MOT model, called TicrossNet, has a simple network design composed of only a base detector and a cross-attention module. TicrossNet runs the entire MOT process on a GPU in an end-to-end manner and achieves 31.0 FPS even with more than 100 persons per frame. As a result, the gateless system achieves real-time authentication of a large number of persons.

Biosketch: Hiroshi Fukui is an assistant manager (senior researcher) at Biometrics Research Laboratories, NEC Corporation, Japan. He received B.E., M.E., and Ph.D. degrees from Chubu University in 2014, 2016, and 2019, respectively. His research interests include computer vision, machine learning, and video analysis. He is a member of IEEE and IPSJ.

Title: Toward Fair Face Analysis Systems: Metric Learning Loss with Adaptive Margins for Fair Multilabel Face Attribute Recognition
Speaker: Dr. Masashi Usami, NEC Corporation, Japan
Abstract: NEC researchers are studying and developing face recognition systems using AI models and expanding their potential for social applications, such as gateless access control systems using biometric recognition. We are also investigating various areas of face image analysis, for example, healthcare research. With the development of AI model research and the growing potential of face image analysis, the fairness of AI systems is becoming increasingly important. NEC has declared a corporate purpose of creating the social value of fairness to promote a more sustainable world, so it is imperative for us researchers to think deeply about the fairness of AI systems. This session mainly presents one of our studies on fair face attribute recognition. Datasets with imbalanced sample distributions can degrade the fairness of an AI model's classification ability. This imbalance problem becomes more complex in multi-label classification tasks due to the variety of imbalance levels. We propose a novel training method that improves fair classification in multi-label classification tasks such as face attribute recognition.

Biosketch: Masashi Usami is a researcher at Biometrics Research Laboratories, NEC Corporation, Japan. He previously specialized in experimental particle physics and received a Ph.D. degree from the Department of Physics, Graduate School of Science, The University of Tokyo. Currently, he is interested in the techniques and social applications of biometric authentication, focusing on the fairness of face recognition and face attribute recognition.

Title: Towards annotation-free image recognition AIs
Speaker: Dr. Tomokazu Kaneko, NEC Corporation, Japan
Abstract: The manual annotation process is a critical bottleneck in deploying image recognition AIs. Training an object classification model requires an image dataset for each recognition target, and the annotation cost grows as the number of target categories increases. This is especially true in retail stores, where hundreds of new products arrive daily and keeping the dataset up to date requires continuous annotation. We propose an efficient product registration system for image recognition AIs in retail stores. The user only needs to shoot a 10-20 second video of the product being held and rotated in hand. The proposed system focuses on the moving areas of the captured video to localize the target object's position and automatically crops the images to generate an image dataset. Rather than relying on methods such as few-shot learning, the system approaches the problem of extending recognition targets from the perspective of making the registration process more efficient.

Biosketch: Tomokazu Kaneko, Ph.D. is an assistant manager (senior researcher) at Visual Intelligence Research Laboratories, NEC Corporation, Japan. His research covers object recognition, retail product detection, domain adaptation, and instant object registration systems. He is also working on object understanding based on object-centric representation learning and world models.

Title: Harnessing Domain Knowledge of Intra-Class Variations to Mitigate Label Scarcity Bias (in Satellite Imagery)
Speaker: Ms. Tsenjung Tai, NEC Corporation, Japan
Abstract: Domain adaptation utilizes labeled data from one domain (the ‘source’) to enhance performance in another domain (the ‘target’) that is either scarcely labeled or unlabeled. However, when intra-class variability overshadows the distinctions between classes, severe misclassifications can arise due to the bias from label scarcity in the target domain. We introduce a feature conversion module that generates synthetic features from the few labeled target domain samples by adapting inter-class knowledge and intra-class variations from the source domain. The synthetic features thus approximate a broader spectrum of the target domain’s diversity. Our approach is assessed on satellite imagery classification tasks, where images from the same class can appear dramatically different depending on their capturing angles. The feature conversion module modifies labeled features as if they were extracted from images captured at various angles. Our classifier achieves enhanced accuracy when trained with only a few annotations in the target domain, particularly for images captured at angles that differ from those of the labeled training examples.

Biosketch: Tsenjung Tai earned her M.S. degree in Computer Science from the Hong Kong University of Science and Technology. At NEC’s Visual Intelligence Research Laboratories, she specializes in domain adaptation for satellite imagery recognition and change detection, addressing challenges in limited data learning. In 2022, she was honored with the “Innovator under 35, Japan” award in recognition of her team’s significant contribution to enabling rapid post-disaster responses.

Title: The Power of Retrieval for Video Analysis on Human Behavior Understanding
Speaker: Dr. Jianquan Liu, NEC Corporation, Japan
Abstract: In this talk, Dr. Liu will introduce an industrial-level framework that harnesses the power of retrieval techniques for video analysis on human behavior sensing and understanding. The talk will mainly demonstrate a series of selected research achievements within this framework that have contributed to both academia and industry. Our framework is composed of a series of cutting-edge technologies that sense data generated in the real world, transform them into readable, visible, and modellable digital forms, and finally analyze these digital data to understand human behavior. These cutting-edge technologies include human sensing with traditional cameras [MM’14, MM’16], 360 cameras [MM’19, WACV’20, ICIP’21], and microwave sensors [MM’20]; action recognition [MM’19, WACV’20, MM’20, ICIP’21]; object tracking [MIPR’19, BigMM’19]; human-object interaction [CBMI’19]; scene recognition [MM’19]; behavioral pattern analysis [MM’16, ICMR’18, MIPR’19]; retrieval [MM’14, MM’16, MM’17, ICMR’18, CBMI’19, ICASSP’21]; and visualization [SIGGRAPH’16, ICMR’18], toward a full understanding of human behavior. These works will be introduced as a general overview, with interactive technical demos and interesting insights into human behavior sensing and understanding through effective processing techniques and efficient algorithms. Finally, Dr. Liu will share some challenging issues and directions for the realization of the digital society of the future.

Biosketch: Jianquan Liu is currently the Director/Head of the Video Insights Discovery Research Group at the Visual Intelligence Research Laboratories of NEC Corporation, working on multimedia data processing. He is also an adjunct professor at the Graduate School of Science and Engineering, Hosei University, Japan. Prior to NEC, he was a development engineer at Tencent Inc. from 2005 to 2006 and a visiting researcher at the Chinese University of Hong Kong in 2010. His research interests include high-dimensional similarity search, multimedia databases, web data mining and information retrieval, cloud storage and computing, and social network analysis. He has published 70+ papers at major international/domestic conferences and journals, received 30+ international/domestic awards, and filed 70+ PCT patents. He has also successfully transformed these technological contributions into commercial products in the industry. He serves or has served as the Industry Co-chair of IEEE ICIP 2023 and ACM MM 2023; the General Co-chair of IEEE MIPR 2021; the PC Co-chair of IEEE IRI 2022, ICME 2020, AIVR 2019, BigMM 2019, ISM 2018, ICSC 2018, ISM 2017, ICSC 2017, IRC 2017, and BigMM 2016; the Workshop Co-chair of IEEE AKIE 2018 and ICSC 2016; and the Demo Co-chair of IEEE MIPR 2019 and MIPR 2018. He is a member of ACM, IEEE, IEICE, IPSJ, APSIPA, and the Database Society of Japan (DBSJ); a member of the expert committees for IEICE Mathematical Systems Science and its Applications (2017-) and IEICE Data Engineering (2015-2021); and an associate editor of IEEE TMM (2023-), ACM TOMM (2022-), EURASIP JIVP (2023-), IEEE MultiMedia Magazine (2019-2022), ITE Transactions on Media Technology and Applications (2021-), APSIPA Transactions on Signal and Information Processing (2022-), and the Journal of Information Processing (2017-2021). Dr. Liu received his M.E. and Ph.D. degrees from the University of Tsukuba, Japan.

NTT Seminar (organized by NTT Corporation, Japan)

Seminar Title: NTT’s Media Processing AI and Its Industrial Applications
Seminar Abstract: This industry seminar introduces some of NTT’s artificial intelligence technologies for image processing. Specifically, we will describe a multi-modal processing AI with an all-in-one architecture that emulates human-like capabilities. We will also describe high-speed and efficient computing enabled by a novel event-driven inference paradigm. Furthermore, we plan to explain other image media processing techniques in NTT R&D, such as point cloud processing. In each presentation, we will introduce the motivation behind these technologies and their industrial applications. We are excited to share the details of the presentations soon. Stay tuned for more updates!

Title: MediaGnosis: The Next-Generation Media Processing Artificial Intelligence
Speaker: Dr. Ryo Masumura, NTT Corporation, Japan
Abstract: MediaGnosis provides an all-in-one cross-media processing module for visual, audio, and text media. One of its most notable distinctions is a human-like cross-media processing architecture. We will describe the motivation behind this architecture, some of the many novel algorithms for media processing in MediaGnosis, and how the processing of each medium is integrated. We will also describe our unique promotional efforts to make these research and development achievements widely known.

Biosketch: Ryo Masumura received B.E., M.E., and Ph.D. degrees in engineering from Tohoku University, Sendai, Japan, in 2009, 2011, and 2016, respectively. Since joining Nippon Telegraph and Telephone Corporation (NTT) in 2011, he has been engaged in research on speech recognition, spoken language processing, and natural language processing. He received the Student Award and the Awaya Kiyoshi Science Promotion Award from the Acoustical Society of Japan (ASJ) in 2011 and 2013, respectively; the Sendai Section Student Awards Best Paper Prize from the Institute of Electrical and Electronics Engineers (IEEE) in 2011; the Yamashita SIG Research Award and the SIG-NL Excellent Paper Award from the Information Processing Society of Japan (IPSJ) in 2014 and 2018; the Young Researcher Award and the Paper Award from the Association for Natural Language Processing (NLP) in 2015 and 2020; and the ISS Young Researcher’s Award in Speech Field and the ISS Excellent Paper Award from the Institute of Electronics, Information and Communication Engineers (IEICE) in 2015 and 2018. He is a member of the ASJ, the IPSJ, the NLP, the IEEE, and the International Speech Communication Association (ISCA).

Title: geoNebula: Elemental Technologies for Supporting the Integration of Real Space and Cyberspace
Speaker: Dr. Satoshi Suzuki, NTT Corporation, Japan
Abstract: To create a human-centered society in which everyone can lead a comfortable, vibrant, and high-quality life, we are studying a system that integrates real space and cyberspace. In this talk, a point cloud processing technology called geoNebula is introduced. geoNebula analyzes data measured in real space, recognizes the space and the objects within it, and compresses the data into a compact form suitable for constructing cyberspace. We will describe our novel algorithms for point cloud processing and the practical applications of geoNebula for constructing a cyberspace that precisely reproduces real space.

Biosketch: Satoshi Suzuki received B.E., M.E., and Ph.D. degrees from the University of Electro-Communications in 2015, 2017, and 2022, respectively. He joined Nippon Telegraph and Telephone (NTT) in 2017. He is currently a researcher at NTT Computer and Data Science Laboratories. His current research interests include neural networks, computer vision, surveillance systems, and machine learning. He received the IEEE CIS Japan Chapter Young Researcher Award in 2015. He is a member of the Information Processing Society of Japan (IPSJ).

Title: Conversational system that talks about the scenery seen from vehicles
Speaker: Dr. Hiroaki Sugiyama, NTT Corporation, Japan
Abstract: We are working on a passenger agent that can talk with people about the scenery seen from a moving vehicle. We believe that such casual dialogue with passengers enriches the driving experience. A passenger agent of this kind should continuously understand input scenery images and talk about them. Recent advances in chat-oriented dialogue systems based on huge-scale transformers promise natural dialogue; however, most focus on text-only dialogue. While vision-based dialogue systems that answer questions about the content of given images have been proposed, few studies have tackled casual dialogue systems that talk about scenery. In this talk, we introduce our dialogue system, which uses the changing scenery seen from a vehicle as a topic of conversation.

Biosketch: Hiroaki Sugiyama is a Senior Researcher in the Interaction Research Group, Innovative Communication Laboratory, NTT Communication Science Laboratories. He received B.E. and M.E. degrees in information science and technology from the University of Tokyo in 2007 and 2009, and a Ph.D. in engineering from the Nara Institute of Science and Technology in 2016. He joined Nippon Telegraph and Telephone Corporation (NTT) in 2009. He has been engaged in research on chat-oriented dialogue systems for natural human interaction. He is a member of the Institute of Electrical and Electronics Engineers (IEEE), the Information Processing Society of Japan (IPSJ), the Japanese Society for Artificial Intelligence (JSAI), and the Association for Natural Language Processing.

Title: Distributed AI Video Analytics
Speaker: Ms. Monikka Roslianna Busto, NTT Corporation, Japan
Abstract: Deepack is a real-time video analytics framework developed for the NTT Group’s surveillance applications; it allows analysis of multiple cameras at lower cost by sharing GPU resources among a network of cameras. In a use case like smart city surveillance, thousands of cameras generate large workloads that must be filtered down to only meaningful events to optimize resource consumption. In addition, widespread surveillance in public areas raises major security concerns. In this session, we discuss how distributed computing between the edge and the cloud makes real-time analytics more feasible in terms of cost, efficiency, and privacy in Deepack’s use cases. We also discuss how techniques called model cascading and model splitting enable dynamic decisions for distributing the workload, lowering the computational cost while preserving the privacy of the data exchanged between the edge and the cloud.

Biosketch: Monikka Roslianna Busto is a Researcher at NTT Software Innovation Center. She graduated from the Electrical and Electronics Engineering Institute, College of Engineering, the University of the Philippines in 2017, and received a master’s degree from the Department of Information and Communications Engineering, Tokyo Institute of Technology in 2021. She joined Nippon Telegraph and Telephone Corporation in the same year. Her research interests include computer vision, collaborative intelligence for edge computing, remote sensing image analysis, and multi-modal AI.