ICIP 2023 Special Sessions

Deep Structural Analysis and Compositional Approaches for Image Processing

Recent years have witnessed rapid progress in image and video processing techniques, which have been widely adopted in applications such as medical data analysis, augmented reality, and autonomous driving. This success is largely due to the development of powerful deep learning algorithms that can efficiently process images and videos in a data-driven fashion. These algorithms can automatically identify visual cues, extract semantic features, and apply nonlinear operations. To extend these techniques to more complex scenarios, it remains important to explore advanced mechanisms for effective image processing. Recently, there has been increasing interest in modeling the intrinsic structures of visual data using deep neural networks, which can further uncover underlying structural patterns and provide contextual relations in both seen and unseen scenes. Deep structural analysis of images involves region decomposition, interaction understanding, relation propagation, and neural compositional mining, aiming to bring rich structural information into the image processing procedure and capture visual structural layouts. For video processing, it also investigates spatiotemporal dynamics using graph representations and perceives temporal variations in a hierarchical way. Another line of research models concept compositionality and learns probabilistic models of the data distribution using generative approaches, which can produce novel samples that depict visual layout structures. Advances in deep structural analysis could improve the interpretability of existing image processing tools and push state-of-the-art techniques towards greater robustness. In this special session, we aim to bring together the latest advances in deep structural analysis and compositional approaches for computational image processing. Topics include, but are not limited to:
Generative and compositional methods for controllable visual synthesis/editing
Hierarchical approaches for visual data processing
Structured object/action/scene processing
Probabilistic graphical models for relation understanding
Self-supervised approaches for image and video processing
Unsupervised disentanglement learning
Combining handcrafted features and deep features for robust image processing
Robust structure mining for medical processing
Graph neural network-based image processing techniques, e.g., graph convolutional networks and graph attention networks
Spatiotemporal prediction and time-series analysis

Linchao Zhu, Zhejiang University, China
Yi Yang, Zhejiang University
Mike Z. Shou, NUS, Singapore
Chen Sun, Brown University, USA

Few-Shot Learning for Computer Vision Tasks (FSL4CVT)

Advances in deep learning have led to the development of many models for specific computer vision tasks. However, most successful deep neural network architectures require training on massive datasets annotated by experts. Because annotation in supervised learning must be precise, labeling a large training dataset is a time-consuming, labor-intensive, and expensive process. Moreover, the availability of large training datasets depends on the application domain, and some domains lack enough data to annotate in the first place. To reduce reliance on this annotation process and cope with the scarcity of annotatable data, some studies have addressed this challenge by training models using only a small amount of annotated data. This special session aims to present recent advances in machine learning in the particular context of limited annotated data for specific computer vision tasks. The session will focus on a very active area of interest to many academic and industrial researchers and provide an opportunity to discuss their latest contributions to this rapidly growing field. It will address computer vision tasks in the few-shot regime.

Anissa Mokraoui, L2TI Laboratory, Université Sorbonne Paris Nord
Ismail BEN AYED, Department of System Engineering, ETS Montreal, University of Quebec, Canada

Graph signal processing and machine learning for interpretable and robust image processing

There is a growing demand for analyzing irregularly structured data, such as signals obtained by sensor networks (IoT), multimodal data capturing 3D space, and biometric data acquired by vital-sign sensors. Images and videos, which have been a main target of ICIP, can also be regarded as such data, since they contain irregular structures such as edges, contours, and textures. Graph signal processing (GSP) and graph neural networks (GNNs) have been hot topics in signal processing and machine learning, and the two have been developed in a closely intertwined manner. GSP is closely tied to mathematical modeling, whereas GNNs are generally regarded as a data-driven form of analysis. Their intersection, i.e., integrating mathematical modeling into deep learning methods and exploiting deep learning techniques as part of mathematical modeling, has also become an active and growing research topic.

This proposed special session aims to give the audience a broad overview of the latest technologies related to GSP and GNNs. We specifically focus on interpretable and robust image processing using graph-based techniques. It will be a useful and informative resource for participants in ICIP 2023.

In this special session, we gather active researchers from a wide spectrum of research disciplines (signal processing, computational vision, point cloud processing, machine learning, etc.) to study interpretable and robust image processing using graph-based techniques. Related special sessions were organized at ICIP 2016 (“Graph-based Multidimensional Signal Processing” and “Light Field Image Processing”), ICIP 2017 (“Light Field Imaging and Display”), ICIP 2019 (“Graph spectral processing of 3D point cloud data”), and ICIP 2021 (“Explainable Deep Neural Networks for Image/Video Processing”). To the best of our knowledge, however, there has not been a special session devoted exclusively to interpretable (i.e., explainable) image processing with graph signal processing and machine learning, despite their growing importance. We therefore believe that this special session is timely and will attract a large ICIP audience from both academia and industry.

Yuichi Tanaka, Osaka University, Japan
Yukihiro Bandoh, NTT Corporation

Autonomous Vehicle Vision

Due to the recent boom in artificial intelligence technologies, there are growing expectations that fully autonomous driving may become a reality in the near future and bring fundamental changes to our society. Fully autonomous vehicles offer great potential to improve road efficiency, reduce traffic accidents, increase productivity, and minimize our environmental impact.

Ensuring the safety of fully autonomous vehicles requires a multidisciplinary approach across all levels of the functional hierarchy. This includes hardware fault tolerance, modern machine/deep learning, cooperation with humans driving conventional vehicles, validation of systems operating in highly unstructured environments, and appropriate regulatory approaches.

As a key component of autonomous driving, autonomous vehicle vision systems are typically built on cutting-edge computer vision, machine/deep learning, image/signal processing, and advanced sensing technologies. With recent advances in deep learning, these systems have achieved very compelling results. However, many challenges remain. For instance, perception modules perform poorly in adverse weather and/or illumination conditions and in complex urban environments, so developing robust, all-weather visual environment perception algorithms is a research area that requires more attention. In addition, most perception methods are computationally intensive and cannot run in real time on embedded, resource-limited hardware. Fully exploiting parallel-computing architectures, such as embedded GPUs, for real-time perception, prediction, and planning is therefore also an active research topic in the autonomous driving field. Furthermore, although existing supervised learning approaches have achieved compelling results, their performance depends heavily on the quality and amount of labeled training data, and labeling such data is a time-consuming and labor-intensive process. Un-/self-supervised learning approaches and domain adaptation techniques are therefore becoming increasingly crucial for real-world autonomous driving applications.

Autonomous Vehicle Vision (AVVision, webpage: avvision.xyz) serves as a premier platform and foundation for the technology of tomorrow. The AVVision community organized Special Sessions at ICAS 2021, ICIP 2021, and ICIP 2022, and Virtual Workshops at WACV 2021, ICCV 2021, and ECCV 2022. An AVVision special session can attract wide attention from the autonomous driving community.

Rui Fan, Tongji University
Wenshuo Wang, McGill University, Montreal, Canada

Energy-Efficient Image and Video Communications

Several recent studies have revealed that image and video processing technology contributes substantially to global energy consumption, in particular video streaming and neural-network-based image and video processing. The worldwide energy consumption of such applications has reached a level at which the corresponding greenhouse gas emissions contribute significantly to global warming. Research targeting the sustainable use of image and video communications is therefore of high importance for the future of our planet. This special session focuses on novel and effective methods to reduce the power and energy consumption of any algorithm in image and video communication pipelines. Example applications include (1) neural-network-based compression, (2) the encoding and transcoding of videos using large-scale server farms and memory devices, (3) the power and energy consumption of end-user devices such as TVs, cameras, or laptop PCs, and (4) the end-to-end evaluation of video communication networks.

Topics of interest include (but are not limited to):
Standardization for low-power video streaming;
Metrics for quantification of energy efficiency;
Energy-efficient encoding and decoding solutions;
Efficient learning-based image and video compression;
Pre- and post-processing for energy-efficient video communications (e.g., film grain analysis, removal, and synthesis or display-constrained contrast enhancement);
Power-optimized hardware implementations.

Christian Herglotz, Friedrich-Alexander University, Erlangen-Nürnberg, Germany
Olivier Le Meur, InterDigital, France
Daniel Palomino, Federal University of Pelotas (UFPel), Brazil
Alexandre Mercat, Tampere University (TAU), Tampere, Finland

Emerging trends in learning-based image and video compression

Conventional image and video coding schemes, such as JPEG, JPEG 2000, HEVC, and VVC, rely on handcrafted transform, prediction, and entropy coding tools to compress images and video efficiently. This paradigm has achieved impressive coding performance over the years, but it is now reaching a plateau where further coding gains are very hard to obtain and require significant increases in complexity. In the past few years, learning-based coding has become a successful alternative for further improving compression performance by replacing (parts of) the coding pipeline with deep neural networks. Many learning-based codecs employ variational autoencoders with an entropy bottleneck to learn compact latent representations of images, which are quantized and entropy coded into a bitstream. This approach was later extended with hyperprior models for the latent space, autoregressive prediction in the pixel or latent space, various techniques for differentiable quantization, and more.
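To make the entropy-bottleneck idea concrete, here is a minimal Python sketch under simplifying assumptions. The latent is a plain list of floats. Each element is modeled by a zero-mean Gaussian whose scale, in a hyperprior model, would be predicted by a second network; here the scales are simply passed in. Training-time quantization is approximated by additive uniform noise. The function names `quantize` and `estimated_bits` are hypothetical, and the code does not reproduce any particular codec.

```python
import math
import random


def gaussian_cdf(x: float, scale: float) -> float:
    """CDF of a zero-mean Gaussian with standard deviation `scale`."""
    return 0.5 * (1.0 + math.erf(x / (scale * math.sqrt(2.0))))


def quantize(y: list[float], training: bool, rng: random.Random) -> list[float]:
    """Hypothetical quantizer for a latent vector.

    Training: add uniform noise in [-0.5, 0.5], a common differentiable
    stand-in for rounding. Inference: round to the nearest integer
    (Python's round() uses round-half-to-even on exact ties).
    """
    if training:
        return [v + rng.uniform(-0.5, 0.5) for v in y]
    return [float(round(v)) for v in y]


def estimated_bits(y_hat: list[float], scales: list[float]) -> float:
    """Estimate the rate of a quantized latent in bits.

    Each symbol's probability mass is the Gaussian density integrated over
    its quantization bin [v - 0.5, v + 0.5]; the cost is -log2 of that mass.
    The per-element `scales` play the role of hyperprior predictions
    (an assumption of this sketch).
    """
    total = 0.0
    for v, s in zip(y_hat, scales):
        p = gaussian_cdf(v + 0.5, s) - gaussian_cdf(v - 0.5, s)
        total += -math.log2(max(p, 1e-12))  # clamp to avoid log(0)
    return total


if __name__ == "__main__":
    latent = [0.2, -1.7, 3.4, 0.0]
    y_hat = quantize(latent, training=False, rng=random.Random(0))
    print(y_hat)
    print(estimated_bits(y_hat, [1.0] * len(y_hat)))
    print(estimated_bits(y_hat, [4.0] * len(y_hat)))
```

At inference, `quantize` rounds the latent to integers. `estimated_bits` then charges few bits to a zero symbol when the predicted scale is small and more bits to large-magnitude symbols. In a learned codec, this rate estimate is traded off against reconstruction distortion during training.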
Despite significant advances in learning-based image and video coding, many challenges remain. Examples include delivering high-quality or near-lossless reconstructions at high bitrates, introducing accurate and flexible rate control schemes, measuring perceptual quality, reducing computational complexity, and adapting to new coding scenarios such as coding for machines and immersive video formats. This special session aims to bring together researchers from industry and academia in the signal processing and computer vision communities to propose and discuss emerging solutions for learning-based image and video compression. Topics of interest include (but are not limited to):

Novel neural architectures and models for learning-based image and video coding (e.g., diffusion models such as Stable Diffusion, normalizing flows, transformers)
Generative compression
Rate control schemes for learningbased coding
Perceptual quality evaluation and loss functions
Low-complexity coding solutions
Multitask coding for humans and machines
Standardization of learning-based coding schemes

Giuseppe Valenzise, Université Paris-Saclay, CNRS, France
Stéphane Lathuilière, Telecom Paris, France