Title: Near-eye light field AR/VR display
Speaker: Homer Chen, PetaRay Inc, Taiwan
Abstract: Human eyes have evolved over millions of years to have consistent vergence and accommodation. However, most AR/VR displays available today can easily cause vergence-accommodation conflict (VAC), which is the root cause of visual fatigue for viewers. At PetaRay, we aim to revolutionize the fundamentals of AR/VR glasses from “showing images to each eye” to “projecting light field to retina.” In this talk, I will show how a light field display with a continuous focal plane can provide the most natural visual experience for users.
Biosketch: Homer H. Chen received the Ph.D. degree in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign. Dr. Chen’s professional career has spanned industry and academia. Since August 2003, he has been with the College of Electrical Engineering and Computer Science, National Taiwan University, where he is a Distinguished Professor. Prior to that, he held various R&D management and engineering positions with U.S. companies over a period of 17 years, including AT&T Bell Labs, Rockwell Science Center, iVast, and Digital Island (acquired by Cable & Wireless). He was a U.S. delegate for ISO and ITU standards committees and contributed to the development of many new interactive multimedia technologies that are now part of the MPEG-4 and JPEG-2000 standards. His recent research is related to AR/VR, multimedia signal processing, computational photography and display, and music data mining. Dr. Chen is an IEEE Life Fellow. Currently, he serves on the IEEE James H. Mulligan, Jr. Education Medal Committee. Previously, he served on the Senior Editorial Board of the IEEE Journal on Selected Topics in Signal Processing from 2020 to 2022, the Awards Committee from 2019 to 2021, the Conferences Board from 2020 to 2021, the Fourier Award Committee from 2015 to 2017, and the Fellow Reference Committee from 2015 to 2017, all of the IEEE Signal Processing Society. He was a General Co-chair of the 2019 IEEE International Conference on Image Processing and a Distinguished Lecturer of the IEEE Circuits and Systems Society from 2012 to 2013. He was an Associate Editor of the IEEE Transactions on Circuits and Systems for Video Technology from 2004 to 2010, the IEEE Transactions on Image Processing from 1992 to 1994, and Pattern Recognition from 1989 to 1999.
He served as a Guest Editor for the IEEE Transactions on Circuits and Systems for Video Technology in 1999, the IEEE Transactions on Multimedia in 2011, the IEEE Journal of Selected Topics in Signal Processing in 2014, and Springer Multimedia Tools and Applications in 2015.
Title: Need for Low Latency: Media over QUIC
Speaker: Ali C. Begen, Comcast NBCUniversal, USA
Abstract: This session gives an overview of the development of a low-latency solution for media ingest and distribution, work undertaken by the IETF’s new Media over QUIC (moq) working group and many industry-leading companies. It summarizes the motivation, goals, current work and potential improvements.
With the advent of QUIC, initially developed by Google and then standardized by the IETF (RFC 9000), the latest version of HTTP, H3 (RFC 9114), was built upon this new low-latency transport protocol to eliminate the known issues in the previous versions of HTTP (e.g., head-of-line (HoL) blocking due to the underlying TCP protocol) and to benefit from other features such as improved congestion control, prioritized delivery and multiplexing. Although H3 outperforms its predecessors in most cases, existing adaptive streaming methods that have been highly tuned for HTTP/1.1 and HTTP/2 running on top of TCP do not give remarkably better results with H3 running over QUIC. If timely delivery is critical, QUIC may perform better than TCP in congested environments. However, we still need a custom application-layer protocol to reap all the benefits QUIC provides at the transport layer. The new IETF working group is chartered to study how to use QUIC for large-scale media transmission in one-to-one, one-to-many and many-to-one applications that might require interactivity (hence, low latency). The working group is still exploring the problem space and potential solutions, and a few different proposals are already being considered. MoQ is envisioned to be a common media protocol stack that will support (i) live streaming of events, news and sports with interactivity features, and (ii) scaling real-time collaboration applications to large audiences.
Rationale: Streaming has become increasingly important, effectively replacing most older media delivery models. Given that this is a popular research and industry topic, the audience would benefit from understanding the industry needs, constraints, pain points, and unsolved problems.
Biosketch: Ali C. Begen is currently a computer science professor at Ozyegin University and a technical consultant in Comcast’s Advanced Technology and Standards Group. Previously, he was a research and development engineer at Cisco. Begen received his PhD in electrical and computer engineering from Georgia Tech in 2006. To date, he has received several academic and industry awards (including an Emmy® Award for Technology and Engineering) and has been granted 30+ US patents. In 2020 and 2021, he was listed among the world’s most influential scientists in the subfield of networking and telecommunications. More details are at https://ali.begen.net.
Title: JPEG AI: the new image compression standard entirely based on neural networks
Speaker: Elena Alshina, Huawei Technologies Dusseldorf GmbH, Germany
Abstract: Reflecting the significant progress of AI-based algorithms for image compression, JPEG launched the JPEG AI standardization project. This is the first-ever international standard entirely based on AI technologies. JPEG AI is a multi-task codec that targets not only superior image reconstruction for humans but is also capable of solving computer vision and image enhancement tasks from the same latent representation. The first version of JPEG AI will be finalized at the beginning of 2024. It is expected that JPEG AI will have two profiles: a base profile with a target complexity of 20 kMAC/pxl (which is acceptable for mobile devices) providing a 10-15% compression gain over VVC Intra coding, and a high profile which is ~10 times more complex but provides a 30% compression gain over the VVC anchor. The talk to be presented in the Industry Expert Session will focus on an overview of the key design elements of the JPEG AI standard, major challenges of AI-based codec standardization (such as device interoperability), and the deployment perspective. A JPEG AI demo on mobile devices is also planned.
Biosketch: Dr. Elena Alshina graduated from Moscow State University (majoring in Physics) and received her PhD in mathematical modeling from the Russian Academy of Science in 1998. For a series of publications on computational math, together with Alexander Alshin, she was awarded the Gold Medal of the Russian Academy of Science. In 2006 she joined Samsung and started working on video codec standard development, actively participating in HEVC and VVC standard development and authoring 1000+ proposals. In 2018 she joined Huawei Technologies as Chief Video Scientist, and later also became Lab Director for the Audiovisual Technology Lab and the Media Codec Lab. Since 2020 she has been co-chair and editor of the JPEG AI standard. The JPEG AI CfP response submitted by a team led by her demonstrated the highest objective performance, outperforming the VVC anchor by 32%.
Title: Adaptive Camera Adjustment with AI
Speaker: Hui Lam Ong, NEC Laboratories Singapore, Singapore
Abstract: In recent years, city governments across the globe have recognized the benefits of implementing video analytics to enhance and maintain safety and security. As a result, an increasing number of public areas, including housing estates, multi-storey car parks, and well-trafficked transit areas like train stations and bus interchanges, are being safeguarded with surveillance cameras. The initial deployment process involves a considerable financial investment to upgrade the necessary infrastructure, procure high-resolution surveillance cameras, and modern hardware and software for data storage and analysis. Furthermore, the employment of highly skilled IT professionals is mandatory to install, manage, troubleshoot, and maintain these systems, presenting another resource challenge for city governments. The speaker for this session is equipped with extensive experience dealing with these challenges. He, along with his team, will provide a comprehensive exploration of these industry-wide issues. Their mission to streamline the process of video analytics solutions deployment has led them to develop a semi-automated, AI-aided system. This advanced system aims to alleviate these challenges and make the implementation of these solutions more accessible and manageable.
Biosketch: Hui Lam Ong specializes in high-performance video analytics solutions. His decades of professional experience in cyber security and software architecture design enable him to understand the complexity of video analytics deployment challenges. His current focus is on reducing the manpower and maintenance costs of large-scale video analytics solutions.
Title: Low Power Image Processing on Constrained Devices Using Tiny ML and Neuromorphic Approach
Speaker: Arpan Pal, TCS Research, Tata Consultancy Services, India
Abstract: More and more IoT-based intelligent systems demand embedding analytics and AI on the edge device. This is driven by the need for low latency, by network unavailability or unreliability, and by the need for inherent privacy and security. However, edge devices are usually constrained in terms of compute, memory and power, which poses challenges for such deployments.
In this presentation, we will first introduce a Tiny Edge Wizard that takes large deep neural network models and attempts to reduce their size and improve their latency automatically, using an integrated approach that combines reduction through pruning based on the Lottery Ticket Hypothesis (LTH) [1] with synthesis through Neural Architecture Search (NAS) [2]. The automation reduces development time, trims over-parameterised models, and minimises the human-expert time required. We will use example use cases from medical image processing and manufacturing shop-floor inspection to demonstrate the efficacy of our proposed system.
Next, we will introduce Neuromorphic Computing and Spiking Neural Networks (SNN) as a means towards ultra-low-power intelligent processing at the edge. We cover the design and implementation of both efficient spike encoders and spiking neural models on neuromorphic chipsets. We will present application use cases around gesture recognition for human-robot interaction [3] and lossless image compression onboard nano satellites [4], along with some interesting results.
Finally, we will conclude with the technology trends we see in this area, which also relate to sustainable analytics aimed at countering the ever-increasing energy consumption of computation.
[1] Ishan Sahu, Arijit Ukil, Sundeep Khandelwal, and Arpan Pal, “LTH-ECG: Lottery Ticket Hypothesis-based Deep Learning Model Compression for Atrial Fibrillation Detection from Single Lead ECG On Wearable and Implantable Devices,” 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), 2022
[2] Shalini Mukhopadhyay, Swarnava Dey, Avik Ghose, Pragya Singh, and Pallab Dasgupta, “Generating Tiny Deep Neural Networks for ECG Classification on Micro-controllers,” IEEE International Conference on Pervasive Computing and Communications (PerCom), 2023
[3] Arun M. George, Dighanchal Banerjee, Sounak Dey, Arijit Mukherjee and P. Balamurali, “A Reservoir-based Convolutional Spiking Neural Network for Gesture Recognition from DVS Input,” International Joint Conference on Neural Networks (IJCNN), 2020
[4] Sayan Kahali, Sounak Dey, Chetan S. Kadway, Arijit Mukherjee, Arpan Pal and Manan Suri, “Low-Power Lossless Image Compression on Small Satellite Edge Using Spiking Neural Network,” International Joint Conference on Neural Networks (IJCNN), 2023
Biosketch: Arpan Pal has more than 30 years of experience in the areas of Intelligent Sensing, Signal Processing & AI, Edge Computing and Affective Computing. Currently, as Distinguished Chief Scientist and Research Area Head, Embedded Devices and Intelligent Systems, TCS Research, he is working in the areas of Connected Health, Smart Manufacturing, Smart Retail and Remote Sensing. He is on the editorial board of notable journals like ACM Transactions on Embedded Systems and Springer Nature Journal on Computer Science, and is on the TPC of notable conferences like IEEE Sensors, ICASSP and EUSIPCO. He has filed 180+ patents (of which 95+ have been granted in different geographies) and has published 160+ papers and book chapters in reputed conferences and journals. He has also written three complete books on IoT, Digital Twins in Manufacturing and Application of AI in Cardiac Screening.
He is on the governing, review or advisory boards of several Indian Government organizations like CSIR and MeitY, educational institutions like IIT and IIIT, and Technology Innovation Hubs. He is a two-time winner of the Tata Group’s top innovation award in Tata Innovista under the Piloted Technology category. Prior to joining Tata Consultancy Services (TCS), Arpan worked for DRDO, India as a Scientist for Missile Seeker Systems and at Rebeca Technologies as their Head of Real-time Systems. He holds B.Tech and M.Tech degrees from IIT Kharagpur, India, and a PhD from Aalborg University, Denmark.
Title: arTXTract – Extracting Text from Challenging Paper Documents in a FinST
Speaker: Oliver Giudice, Banca D’Italia, Italy
Abstract: Today we talk about Natural Language Processing (NLP) and Large Language Models: useful tools that allow the extraction of value from large amounts of data with fascinating results. To date, however, before advanced NLP systems can be used, the data must be available in a digitized form and without imperfections. This assumption often does not hold: in many institutions like mine, most of the information assets of the past are paper-based, and it is therefore necessary to create increasingly advanced OCR tools to extract the textual information components from documents.
The extraction is not simple: the documents have different resolutions and scanning methods, and often contain imperfections, images, tables, signatures and stamps. All of these problems make it difficult, if not impossible, to extract information with OCR tools, whether they are free, open or commercial. In fact, not even the best commercial tools are able to solve all digitization problems and accurately extract all the desired information (e.g., tables should be treated in a specific way). Last but not least, even if the document is already digital, it is often not easy to extract information due to the diversity of formats, encodings, etc.
In this talk, arTXTract will be presented: a prototype solution created in-house for handling documents from financial institutions. The pipeline, designed to process a large number of documents successfully, will also be presented; it can process different kinds of data and surpasses the results obtainable with commercial software.
A question therefore arises: is it possible to generalize the process? Or does each institution or company have to equip itself with its own text extraction systems to digitize its archives? At the end of the talk, we will try to give an answer to this problem which, unfortunately, is yet to be fully solved.
Biosketch: Oliver Giudice received his degree in Computer Engineering (summa cum laude) in 2011 at the University of Catania and his Ph.D. in Maths and Computer Science in 2017, defending a thesis entitled “Digital Forensics Ballistics: Reconstructing the source of an evidence exploiting multimedia data”. From 2011 to 2014 he was involved in various research projects at the University of Catania in collaboration with the Civil and Environmental Engineering department and the National Sanitary System. He was leader of the R&D team of the University of Catania for project Farm.PRO (PO/FESR Misura 220.127.116.11) from 2012 to 2014. In 2014 he started his job as a researcher at the IT Department of Banca D’Italia, dealing with text classification and crypto-currencies analysis. For several years since 2011, he has collaborated with the IPLab (http://iplab.dmi.unict.it), working on Multimedia Forensics topics and being involved in various forensics cases as a Digital Forensics Expert. Since 2016 he has been a co-founder of “iCTLab s.r.l.”, a spin-off of the University of Catania that works in the fields of Digital Forensics, Privacy and Security consulting and software development. His research interests include machine learning, computer vision, image coding, urban security, crypto-currencies and multimedia forensics.
Title: Enhancing Content Experiences with Contextual Data
Speaker: Viswanathan (Vishy) Swaminathan, Adobe Research, Adobe, USA
Abstract: It took about 20 years for video over the Internet to be delightful for end users. This was built on a large body of research from both academia and the industry. Over the last few years, AI has provided generational transformations both in content understanding and in machine learning algorithms that learn and adapt using large amounts of contextual data. How do we stand on the shoulders of these giants (transformations) to make next-gen content experiences compelling? I will start with a glimpse of available data from user sessions for on-demand and live videos. I will elaborate on how this fine-grained contextual data can be combined with deep learning powered multimedia understanding to enhance content experiences. The first generation of our video research at Adobe focused on simple insights from content while the second generation focused on insights from video consumption data. Now, the explosion in compute and data enables the third generation of research that derives holistic insights simultaneously from both content and the behavioral data to close the feedback loop to improve content consumption experiences. With a few examples, I will show how to leverage these technological transformations to enhance end-user content experiences ranging from traditional video to immersive mixed-reality experiences. Some demos of past and current projects will be shown with a call to leverage contextual data to improve and personalize end-user media experiences.
Biosketch: Vishy (Viswanathan) Swaminathan is a Sr. Principal Scientist in Adobe Research leading the Enterprises, Platforms, Insights, and Content (EPIC) Research org, working at the intersection of insights from behavioral data and multimedia content. His areas of research include next-generation video and immersive experiences, video streaming, and, in general, data-driven content and marketing technologies. His research work has substantially influenced various technologies in Adobe’s video delivery, advertisement, and recommendations products, including the guts of HTTP Dynamic Streaming, which won the ‘Best Streaming Innovation of 2011’ Streaming Media Readers’ Choice Award. He has received several awards, including best paper awards, the 2017 Distinguished Alumnus award from the Utah State University ECE Department, and 3 ISO certificates of appreciation for contributions to MPEG standards, including editing the Emmy-winning MPEG DASH standard. Prior to joining Adobe, Vishy was a Senior Researcher at Sun Labs. He received his MS and Ph.D. in Electrical Engineering from Utah State University. He received his B.E. degree in ECE from the College of Engineering, Guindy, Anna University, Chennai, India. Vishy has authored several papers, articles, RFCs, and book chapters, has 100+ issued patents, and volunteers in organizing IEEE and ACM conferences.