ICIP 2023 Special Sessions
Deep Structural Analysis and Compositional Approaches for Image Processing
Recent years have witnessed rapid progress in advanced image and video processing techniques, which have been widely adopted in applications such as medical data analysis, augmented reality, and autonomous driving. This success is largely due to the development of powerful deep learning algorithms that can efficiently process images and videos in a data-driven fashion. These algorithms can automatically identify visual cues, extract semantic features, and apply nonlinear operations. To adapt these techniques to more complex scenarios, it remains important to exploit advanced mechanisms for effective image processing. Recently, there has been increasing interest in modeling the intrinsic structures of visual data with deep neural networks, which can uncover underlying structural patterns and provide contextual relations in both seen and unseen scenes. Deep structural analysis of images involves region decomposition, interaction understanding, relation propagation, and neural compositional mining, aiming to bring rich structural information into the image processing pipeline and to capture visual structural layouts. For video processing, it investigates spatio-temporal dynamics using graph representations and perceives temporal variations hierarchically. Another line of this research models concept compositionality and learns probabilistic models of the data distribution using generative approaches, which can produce novel samples that depict visual layout structures. The development of deep structural analysis would bring better interpretability to existing image processing tools and advance state-of-the-art techniques toward model robustness. In this special session, we aim to bring together the latest advances in deep structural analysis and compositional approaches for computational image processing. Topics of interest include, but are not limited to:
– Generative and compositional methods for controllable visual synthesis/editing
– Hierarchical approaches for visual data processing
– Structured object/action/scene processing
– Probabilistic graphical models for relation understanding
– Self-supervised approaches for image and video processing
– Unsupervised disentanglement learning
– Combining handcrafted features and deep features for robust image processing
– Robust structure mining for medical image processing
– Graph neural network-based image processing techniques, e.g., graph convolutional networks and graph attention networks
– Spatio-temporal prediction and time-series analysis
Organizers:
Linchao Zhu, Zhejiang University, China
Yi Yang, Zhejiang University, China
Mike Z. Shou, National University of Singapore, Singapore
Chen Sun, Brown University, USA
Few-Shot Learning for Computer Vision Tasks (FSL4CVT)
Advances in deep learning have led to the development of numerous models for specific computer vision tasks. However, most successful deep neural network architectures require training on massive datasets annotated by experts. Data annotation is a demanding yet essential step in supervised learning: annotating a large training dataset is a time-consuming, labor-intensive, and expensive process. Moreover, the availability of large training datasets depends on the application domain, and some domains simply lack sufficient data to annotate. To reduce reliance on the annotation process and cope with data scarcity, several studies have attempted to train models using only a small amount of annotated data. This special session aims to present recent advances in machine learning in the particular context of limited annotated data for computer vision tasks. It will focus on a very active area of interest to many academic and industrial researchers and provide an opportunity to discuss the latest and most innovative contributions to this rapidly growing field, addressing computer vision tasks in the few-shot regime.
Organizers:
Anissa Mokraoui, L2TI Laboratory, Université Sorbonne Paris Nord, France
Ismail Ben Ayed, Department of Systems Engineering, ETS Montreal, University of Quebec, Canada
Graph Signal Processing and Machine Learning for Interpretable and Robust Image Processing
There is a growing demand for analyzing irregularly structured data, such as signals obtained from sensor networks (IoT), multi-modal data capturing 3D space, and biometric data acquired by vital sensors. Images and videos, long a main target of ICIP, can also be regarded as such data since they contain irregular structures like edges, contours, and textures. Graph signal processing (GSP) and graph neural networks (GNNs) have been hot topics in signal processing and machine learning, and the two fields have developed in close connection: GSP is rooted in mathematical modeling, while GNNs offer a data-driven counterpart. Their intersection, i.e., integrating mathematical models into deep learning methods and exploiting deep learning techniques as part of mathematical modeling, is likewise an active and growing research topic.
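As a brief, illustrative aside (a minimal sketch, not drawn from the session description): in the GSP view, an image patch is a signal on a grid graph, and smoothing amounts to low-pass filtering in the graph spectral domain. The 4-connected grid construction, the heat-kernel filter, and the parameter tau below are all illustrative choices.

import numpy as np

def grid_laplacian(h, w):
    """Combinatorial Laplacian L = D - A of an h x w 4-connected grid graph."""
    n = h * w
    A = np.zeros((n, n))
    for r in range(h):
        for c in range(w):
            i = r * w + c
            if c + 1 < w:   # edge to the right neighbor
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < h:   # edge to the bottom neighbor
                A[i, i + w] = A[i + w, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def spectral_lowpass(x, L, tau=1.0):
    """Filter graph signal x with the heat kernel h(lam) = exp(-tau * lam)."""
    lam, U = np.linalg.eigh(L)                    # graph Fourier basis
    return U @ (np.exp(-tau * lam) * (U.T @ x))   # GFT, filter, inverse GFT

# Example: denoise an 8x8 ramp patch viewed as a signal on the grid graph.
rng = np.random.default_rng(0)
patch = np.tile(np.linspace(0.0, 1.0, 8), (8, 1))
noisy = patch + 0.1 * rng.standard_normal(patch.shape)
denoised = spectral_lowpass(noisy.ravel(), grid_laplacian(8, 8)).reshape(8, 8)
print(f"MAE noisy: {np.abs(noisy - patch).mean():.3f}, "
      f"filtered: {np.abs(denoised - patch).mean():.3f}")  # filtered is typically lower

The same Laplacian eigenbasis underlies spectral GNN layers, which is one reason GSP offers a handle on the interpretability of graph-based learning.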
This special session aims to give the audience a broad overview of the latest technologies related to GSP and GNNs, with a specific focus on interpretable and robust image processing using graph-based techniques. It will be a useful and informative resource for participants at ICIP 2023.
In this special session, we gather active researchers from a wide spectrum of research disciplines (signal processing, computer vision, point cloud processing, machine learning, etc.) to study interpretable and robust image processing using graph-based techniques. While related special sessions have been organized at ICIP 2016 (“Graph-based Multidimensional Signal Processing” and “Light Field Image Processing”), ICIP 2017 (“Light Field Imaging and Display”), ICIP 2019 (“Graph spectral processing of 3D point cloud data”), and ICIP 2021 (“Explainable Deep Neural Networks for Image/Video Processing”), to the best of our knowledge there has not been a special session devoted exclusively to interpretable (i.e., explainable) image processing with graph signal processing and machine learning, despite its growing importance. We therefore believe that this special session is very timely and will attract a large ICIP audience from both academia and industry.
Organizers:
Yuichi Tanaka, Osaka University, Japan
Yukihiro Bandoh, NTT Corporation, Japan
Wei Hu, Peking University, China
Autonomous Vehicle Vision
Due to the recent boom in artificial intelligence technologies, fully autonomous driving is widely expected to become a reality in the near future and to bring fundamental changes to our society. Fully autonomous vehicles offer great potential to improve road efficiency, reduce traffic accidents, increase productivity, and minimize our environmental impact in the process.
Ensuring the safety of fully autonomous vehicles requires a multi-disciplinary approach across all levels of the functional hierarchy, from hardware fault tolerance, to modern machine/deep learning, to cooperating with humans driving conventional vehicles, to validating systems for operation in highly unstructured environments, to appropriate regulatory approaches.
As a key component of autonomous driving, autonomous vehicle vision systems are typically built on cutting-edge computer vision, machine/deep learning, image/signal processing, and advanced sensing technologies. With recent advances in deep learning, these systems have achieved very compelling results. However, many challenges remain. For instance, perception modules do not perform well in poor weather and/or illumination conditions or in complex urban environments; developing robust, all-weather visual environment perception algorithms is a popular research area that requires more attention. In addition, most perception methods are computationally intensive and cannot run in real time on embedded, resource-limited hardware, so fully exploiting parallel computing architectures, such as embedded GPUs, for real-time perception, prediction, and planning is another active research topic in the autonomous driving field. Furthermore, existing supervised learning approaches have achieved compelling results, but their performance depends heavily on the quality and quantity of labeled training data, and labeling such data is a time-consuming and labor-intensive process. Unsupervised/self-supervised learning approaches and domain adaptation techniques are therefore becoming increasingly crucial for real-world autonomous driving applications.
Autonomous Vehicle Vision (AVVision, webpage: avvision.xyz) serves as a premier platform for the technology of tomorrow. The AVVision community has organized special sessions at ICAS 2021, ICIP 2021, and ICIP 2022, and virtual workshops at WACV 2021, ICCV 2021, and ECCV 2022. An AVVision special session is therefore expected to attract wide attention from the autonomous driving community.
Organizers:
Rui Fan, Tongji University, China
Wenshuo Wang, McGill University, Montreal, Canada
Energy-Efficient Image and Video Communications
Several recent studies have revealed that image and video processing technology, in particular video streaming and neural-network-based image and video processing, contributes substantially to global energy consumption. The worldwide energy consumption of such applications has reached a level at which the corresponding greenhouse gas emissions contribute measurably to global warming. Research targeting the sustainable use of image and video communications is therefore of high importance for the future of our planet. This special session focuses on novel and effective methods to reduce the power and energy consumption of any algorithm in image and video communication pipelines. Example applications are (1) neural-network-based compression, (2) the encoding and transcoding of videos on large-scale server farms and memory devices, (3) the power and energy consumption of end-user devices such as TVs, cameras, or laptop PCs, and (4) the end-to-end evaluation of video communication networks.
Topics of interest include (but are not limited to):
– Standardization for low-power video streaming;
– Metrics for quantification of energy efficiency;
– Energy-efficient encoding and decoding solutions;
– Efficient learning-based image and video compression;
– Pre- and post-processing for energy-efficient video communications (e.g., film grain analysis, removal, and synthesis, or display-constrained contrast enhancement);
– Power-optimized hardware implementations.
Organizers:
Christian Herglotz, Friedrich-Alexander University Erlangen-Nürnberg, Germany
Olivier Le Meur, InterDigital, France
Daniel Palomino, Federal University of Pelotas (UFPel), Brazil
Alexandre Mercat, Tampere University (TAU), Tampere, Finland
Emerging Trends in Learning-Based Image and Video Compression
Conventional image and video coding schemes, such as JPEG, JPEG 2000, HEVC, and VVC, rely on hand-crafted transform, prediction, and entropy coding tools to compress video efficiently. This paradigm has achieved impressive coding performance over the years, but it is now reaching a plateau where further coding gains are very hard to obtain and require significant complexity increases. In the past few years, learning-based coding has become a successful alternative that further improves compression performance by replacing (parts of) the video coding pipeline with deep neural networks. Many learning-based codecs employ variational auto-encoders with an entropy bottleneck to learn compact representations of images, which are quantized and entropy coded into a bitstream. This approach has later been extended with hyper-prior models for the latent space, auto-regressive prediction in the pixel or latent space, various techniques for differentiable quantization, and more.
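To make the entropy-bottleneck idea above concrete, the following minimal PyTorch sketch illustrates the spirit of the variational approach just described: rounding is approximated during training by additive uniform noise, and the rate term is the negative log-likelihood of the latents under a factorized zero-mean Gaussian prior. The layer sizes, the choice of prior, and the lambda trade-off are illustrative assumptions, not the design of any specific codec.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EntropyBottleneckAE(nn.Module):
    """Toy learned image codec: conv auto-encoder trained with a rate penalty."""
    def __init__(self, ch=64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, ch, 5, stride=2, padding=2), nn.GELU(),
            nn.Conv2d(ch, ch, 5, stride=2, padding=2),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 5, stride=2, padding=2, output_padding=1), nn.GELU(),
            nn.ConvTranspose2d(ch, 3, 5, stride=2, padding=2, output_padding=1),
        )
        # Learned per-channel scale of a factorized zero-mean Gaussian prior.
        self.log_scale = nn.Parameter(torch.zeros(ch))

    def rate(self, y):
        # Bits under the prior: -log2 p(y) for a zero-mean Gaussian per channel.
        scale = self.log_scale.exp().view(1, -1, 1, 1)
        nll = 0.5 * (y / scale) ** 2 + self.log_scale.view(1, -1, 1, 1) \
              + 0.5 * torch.log(torch.tensor(2 * torch.pi))
        return nll.sum() / torch.log(torch.tensor(2.0))  # nats -> bits

    def forward(self, x):
        y = self.enc(x)
        # Training-time proxy for rounding: additive uniform noise in [-0.5, 0.5).
        y_hat = y + torch.rand_like(y) - 0.5 if self.training else torch.round(y)
        return self.dec(y_hat), self.rate(y_hat)

# One rate-distortion training step on random data (stand-in for real images).
model = EntropyBottleneckAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
x = torch.rand(4, 3, 64, 64)
x_rec, bits = model(x)
num_pixels = x.shape[0] * x.shape[2] * x.shape[3]
loss = F.mse_loss(x_rec, x) + 0.01 * bits / num_pixels  # D + lambda * R (bits/pixel)
loss.backward()
opt.step()

In a practical codec, the fixed prior would be replaced by a learned hyper-prior and the quantized latents would be entropy coded (e.g., with range coding); the sketch only shows the rate-distortion training objective.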
Despite the significant advances in learning-based image and video coding, many challenges remain, e.g., delivering high-quality or near-lossless reconstructions at high bitrates, introducing accurate and flexible rate control schemes, measuring perceptual quality, reducing computational complexity, and adapting to new coding scenarios such as coding for machines and immersive video formats. This special session aims to bring together industry and academics from the signal processing and computer vision communities to propose and discuss emerging solutions for learning-based image and video compression. Topics of interest in this special session include (but are not limited to):
– Novel neural architectures and models for learning-based image and video coding (e.g., diffusion models, normalizing flows, transformers)
– Generative compression
– Rate control schemes for learning-based coding
– Perceptual quality evaluation and loss functions
– Low-complexity coding solutions
– Multi–task coding for humans and machines
– Standardization of learning-based coding schemes
Organizers:
Giuseppe Valenzise, Université Paris-Saclay, CNRS, France
Stéphane Lathuilière, Télécom Paris, France