Naoto YOKOYA

Journal publications

H. Wang, W. He, Z. Li, and N. Yokoya, “Cross-scenario damaged building extraction network: Methodology, application, and efficiency using single-temporal HRRS imagery,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 228, pp. 228–248, 2025.
Abstract: The extraction of damaged buildings is of significant importance in various fields, such as disaster assessment and resource allocation. Although multi-temporal methods exhibit remarkable advantages in detecting damaged buildings, single-temporal extraction remains crucial in real-world emergency responses due to its immediate usability. However, single-temporal cross-scenario extraction from high-resolution remote sensing (HRRS) imagery encounters the following challenges: (i) morphological heterogeneity of building damage, caused by the interplay of unknown disaster types with unpredictable geographic contexts, and (ii) scarcity of fine-grained annotated datasets for unseen disaster scenarios, which limits the accuracy of rapid damage mapping. Confronted with these challenges, our main idea is to decompose the complex features of damaged buildings into five attribute-features, which can be trained using historical disaster data to enable the independent learning of both building styles and damage features. Consequently, we propose a novel Correlation Feature Decomposition Network (CFDNet) along with a coarse-to-fine training strategy for cross-scenario damaged building extraction. In detail, at the coarse training stage, CFDNet is trained to decompose the damaged building segmentation task into the extraction of multiple attribute-features. At the fine training stage, specific attribute-features, such as the building feature and the damage feature, are trained using auxiliary datasets. We have evaluated CFDNet on several datasets that cover different types of disasters and have demonstrated its superiority and robustness compared with state-of-the-art methods. Finally, we apply the proposed model to damaged building extraction in areas historically affected by major disasters, namely, the Turkey–Syria earthquakes on 6 February 2023, Cyclone Mocha in the Bay of Bengal on 23 May 2023, and Hurricane Ian in Florida, USA, in September 2022. Results from these practical applications also highlight the significant advantages of the proposed CFDNet.

A. Xiao, W. Xuan, J. Wang, J. Huang, D. Tao, S. Lu, and N. Yokoya, “Foundation Models for Remote Sensing and Earth Observation: A survey,” IEEE Geoscience and Remote Sensing Magazine, pp. 2–29, 2025.
Abstract: Remote sensing (RS) is a crucial technology for observing, monitoring, and interpreting our planet, with broad applications across geoscience, economics, humanitarian fields, etc. While artificial intelligence (AI), particularly deep learning, has achieved significant advances in RS, unique challenges persist in developing more intelligent RS systems, including the complexity of Earth’s environments, diverse sensor modalities, distinctive feature patterns, varying spatial and spectral resolutions, and temporal dynamics. Meanwhile, recent breakthroughs in large foundation models (FMs) have expanded AI’s potential across many domains due to their exceptional generalizability and zero-shot transfer capabilities. However, their success has largely been confined to natural data like images and video, with degraded performance and even failures for RS data of various nonoptical modalities. This has inspired growing interest in developing RSFMs to address the complex demands of Earth observation (EO) tasks, spanning the surface, atmosphere, and oceans. This survey systematically reviews the emerging field of RSFMs. It begins with an outline of their motivation and background, followed by an introduction of their foundational concepts. It then categorizes and reviews existing RSFM studies including their datasets and technical contributions across visual FMs (VFMs), visual–language models (VLMs), large language models (LLMs), and beyond. In addition, we benchmark these models against publicly available datasets, discuss existing challenges, and propose future research directions in this rapidly evolving field.

J. Li, H. Huang, Y. Xia, X. Xu, Y. Wu, Q. Guo, Y. Liu, Y. Zhong, J. Min, S. Son, H. Kim, J. Yoo, G. Vivone, C. Li, W. He, R. Dian, H. Liu, H. Wang, K. Wei, A. K. Qin, N. Yokoya, S. Li, J. Chanussot, and D. Hong, “Multimodal Semantic Segmentation in Yangtze River Economic Belt: Outcome of the 2024 IEEE WHISPERS MMSeg-YREB Challenge,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 18, pp. 16534–16548, 2025.
Abstract: With the growing availability of remote sensing (RS) data from diverse platforms, multimodal RS techniques have emerged as a transformative solution for large-scale semantic segmentation. In response, we developed MMSeg-YREB, a specialized framework that integrates complementary RS modalities, such as multispectral and synthetic aperture radar data from Sentinel-1/2 sources, to enhance the accuracy and robustness of land use and land cover mapping across urban and regional landscapes within the Yangtze River Economic Belt (YREB). By leveraging extensive geographic coverage and heterogeneous data sources, MMSeg-YREB supports a wide range of applications, from precise urban planning to comprehensive environmental monitoring. Utilizing state-of-the-art artificial intelligence methodologies, this framework aims to develop highly generalizable and scalable semantic segmentation models, driving methodological advancements and accelerating the adoption of Earth observation technologies across diverse regions. As part of this initiative, the multimodal semantic segmentation challenge, i.e., MMSeg-YREB, is organized in conjunction with the 14th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing 2024. To foster further research and innovation, all datasets and code will be publicly released online for the sake of reproducibility, contributing to the broader Earth observation and RS communities.

H. Huang, Y. Yu, T. Chen, N. Yokoya, J. Li, and A. Plaza, “SAR and Social-Media-Based Change Detection With Dual-Threshold Fusion for Flood Inundation Mapping,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 18, pp. 15278–15290, 2025.
Abstract: As one of the most destructive natural disasters, floods are increasingly frequent and severe due to urban development and population growth. The threshold-based method is widely acknowledged as an effective approach for detecting flood extent in synthetic aperture radar (SAR) imagery. However, determining an accurate threshold value poses a significant challenge. During periods of flooding, social media (SM) data posted by users provide a wealth of real-time information for flood inundation mapping (FIM) purposes. This study presents a new semiautomatic threshold determination method called SAR and social-media-based dual threshold (SSM-DT) for FIM. By determining the threshold semiautomatically, SSM-DT aims to avoid the inaccuracies of traditional threshold selection and thereby improve accuracy. The integration of SM data enhances real-time flood situation monitoring, enriching FIM comprehensiveness. First, the SAR images are processed and analyzed using change detection techniques to identify potential flood inundation areas. Simultaneously, a deep learning model is utilized to classify and filter SM data, enabling the retrieval of real-time flood-related information. Finally, the flood information obtained from both SAR images and SM data is fused to generate a more accurate and comprehensive flood inundation map, leveraging the complementary nature of these two data sources. A case study focused on the extensive flooding caused by Hurricane Harvey in Houston in 2017 is discussed. The results demonstrate that the proposed method can provide a near real-time depiction of flood extent, which is crucial for mitigating economic losses and minimizing casualties.

D. Ibañez, J. Xia, N. Yokoya, F. Pla, and R. Fernandez-Beltran, “Inter-Sensor High-Resolution and Multi-Temporal Image Fusion for Unsupervised Domain Adaptation in Remote Sensing,” IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1–23, 2025.
Abstract: Motivated by the increasing demand for robust segmentation in unlabeled remote sensing data, we propose domain adaptation multimodal and multi-temporal transformer (DAM-Former), a novel unsupervised domain adaptation (UDA) model that fuses high-resolution (HR) multimodal imagery with multi-temporal multispectral data. Current UDA approaches in remote sensing rarely exploit the complementary strengths of spatial and temporal features. To address this gap, our framework integrates two interconnected branches: a transformer-based network for HR multimodal data and a lightweight convolutional network with temporal attention for multi-temporal imagery. To improve segmentation accuracy and lower noise, the extracted features are robustly combined through a deep temporal fusion (DTF) module and a new mixed loss (ML) with an ensemble pseudo-label (EP) strategy. Extensive experiments and an ablation study on the FLAIR-2 dataset demonstrate that DAM-Former outperforms state-of-the-art methods, marking the first in-depth study of temporal information fusion in UDA segmentation for remote sensing data. Code available at https://github.com/ibanezfd/DAM_Former.
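The abstract mentions an ensemble pseudo-label (EP) strategy without giving its rule. One common way to build ensemble pseudo-labels, shown here only as a hypothetical sketch, is to average per-pixel class probabilities from several models and keep a label only where the averaged confidence passes a threshold. The function name, the 0.7 threshold, and the `IGNORE` value are illustrative assumptions, not DAM-Former's actual implementation.

```python
from typing import List

IGNORE = -1  # hypothetical label for pixels excluded from the pseudo-label set


def ensemble_pseudo_labels(
    prob_maps: List[List[List[float]]], threshold: float = 0.7
) -> List[int]:
    """Average per-pixel class probabilities across models and keep only
    confident argmax predictions (a generic sketch, not the paper's EP rule).

    prob_maps[m][p][c] is model m's probability for class c at pixel p.
    """
    n_models = len(prob_maps)
    labels: List[int] = []
    for p in range(len(prob_maps[0])):
        n_classes = len(prob_maps[0][p])
        # Mean probability of each class over the ensemble members.
        mean = [
            sum(prob_maps[m][p][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        best = max(range(n_classes), key=lambda c: mean[c])
        # Low-confidence pixels are left out instead of receiving a noisy label.
        labels.append(best if mean[best] >= threshold else IGNORE)
    return labels


model_a = [[0.9, 0.1], [0.6, 0.4]]
model_b = [[0.8, 0.2], [0.3, 0.7]]
print(ensemble_pseudo_labels([model_a, model_b]))  # [0, -1]
```

Discarding low-confidence pixels trades label coverage for label quality, which is the usual motivation for thresholding pseudo-labels in unsupervised domain adaptation.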

J. Xia, C. Broni-Bediako, and N. Yokoya, “Generating national very high-resolution land cover product of France without any labels: A comparative study,” Remote Sensing Applications: Society and Environment, vol. 38, 101542, 2025.
Abstract: Generating a national very high-resolution (VHR) land cover product is crucial for various applications, including environmental monitoring and urban planning. However, creating such a product often requires a large amount of labeled data over a target area, which can be expensive and challenging to obtain. To tackle these challenges, this work presents a comparative analysis of three label-free techniques, namely source-domain pretraining, pseudo-labels, and unsupervised domain adaptation (UDA), for developing the French national VHR land cover product. All three techniques leverage the recent OpenEarthMap dataset and employ an advanced segmentation model, a fully Transformer-based network (FT-UNetFormer). The methods were evaluated against the reference data of the French FLAIR dataset. Results indicated an overall product accuracy ranging from 82.1% to 85.5%, with a mean intersection over union (mIoU) between 57% and 59%. Notably, the highest accuracy was achieved for buildings, while the lowest was obtained for bareland. Among the three methods, source-domain pretraining proved adequate but yielded lower accuracy. UDA achieved very high accuracy, but at a considerable computational cost. The pseudo-label method offered a viable trade-off between accuracy and computational efficiency. Finally, we will release the products derived from the three label-free techniques. Their open availability can contribute significantly to informed decision-making and sustainable development across various sectors.
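The overall accuracy and mIoU figures above follow the standard definitions computed from a confusion matrix. As an illustration only (this is not the authors' evaluation code), the sketch below computes both; classes absent from both reference and prediction are skipped in the mean, which is one common convention and an assumption here.

```python
from typing import List, Tuple


def accuracy_and_miou(conf: List[List[int]]) -> Tuple[float, float]:
    """Overall accuracy and mean IoU from a confusion matrix where
    conf[i][j] counts pixels of reference class i predicted as class j."""
    n = len(conf)
    total = sum(sum(row) for row in conf)
    correct = sum(conf[i][i] for i in range(n))
    ious = []
    for k in range(n):
        tp = conf[k][k]
        fn = sum(conf[k]) - tp                         # class k missed
        fp = sum(conf[i][k] for i in range(n)) - tp    # other classes labeled k
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both reference and prediction
            ious.append(tp / denom)
    return correct / total, sum(ious) / len(ious)


oa, miou = accuracy_and_miou([[50, 10], [5, 35]])
print(round(oa, 4), round(miou, 4))  # 0.85 0.7346
```

Because IoU penalizes both false positives and false negatives per class, mIoU is typically much lower than overall accuracy, consistent with the gap between the 82.1–85.5% accuracy and 57–59% mIoU reported above.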

D. Wang, M. Hu, Y. Jin, Y. Miao, J. Yang, Y. Xu, X. Qin, J. Ma, L. Sun, C. Li, C. Fu, H. Chen, C. Han, N. Yokoya, J. Zhang, M. Xu, L. Liu, L. Zhang, C. Wu, B. Du, D. Tao, and L. Zhang, “HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 8, pp. 6427–6444, 2025.
Abstract: Accurate hyperspectral image (HSI) interpretation is critical for providing valuable insights into various earth observation-related applications such as urban planning, precision agriculture, and environmental monitoring. However, existing HSI processing methods are predominantly task-specific and scene-dependent, which severely limits their ability to transfer knowledge across tasks and scenes, thereby reducing the practicality in real-world applications. To address these challenges, we present HyperSIGMA, a vision transformer-based foundation model that unifies HSI interpretation across tasks and scenes, scalable to over one billion parameters. To overcome the spectral and spatial redundancy inherent in HSIs, we introduce a novel sparse sampling attention (SSA) mechanism, which effectively promotes the learning of diverse contextual features and serves as the basic block of HyperSIGMA. HyperSIGMA integrates spatial and spectral features using a specially designed spectral enhancement module. In addition, we construct a large-scale hyperspectral dataset, HyperGlobal-450K, for pre-training, which contains about 450K hyperspectral images, significantly surpassing existing datasets in scale. Extensive experiments on various high-level and low-level HSI tasks demonstrate HyperSIGMA's versatility and superior representational capability compared to current state-of-the-art methods. Moreover, HyperSIGMA shows significant advantages in scalability, robustness, cross-modal transferring capability, real-world applicability, and computational efficiency.

L. Ding, D. Hong, M. Zhao, H. Chen, C. Li, J. Deng, N. Yokoya, L. Bruzzone, and J. Chanussot, “A survey of sample-efficient deep learning for change detection in remote sensing: tasks, strategies, and challenges,” IEEE Geoscience and Remote Sensing Magazine, 2025.

D. Ibañez, R. Fernandez-Beltran, F. Pla, N. Yokoya, and J. Xia, “Multi-modal consistent loss diffusion model for Sentinel-3 single image super resolution,” Neural Computing and Applications, vol. 37, pp. 7121–7143, 2025.
Abstract: In the context of Earth observation, the trade-off between spatial, spectral, and temporal resolution often limits the versatility of remote sensing images in many important applications. In response, this paper introduces a novel deep learning diffusion model, specifically tailored to improve the spatial resolution of the optical products acquired by the Sentinel-3 (S3) satellite. Our framework employs a diffusion probabilistic model, benefiting from the higher spatial resolution of the Sentinel-2 satellite during training via a new multi-modal loss formulation. This ensures consistency with the original S3 images while enhancing the spatial details. Two distinct conditional low-resolution encoders were experimented with, providing insights into their respective contributions to the diffusion process. The efficacy of the proposed model is demonstrated through extensive ablation studies and comparisons with state-of-the-art methods, using both synthetic and real S3 products. The findings indicate that our model successfully improves spatial resolution while maintaining the integrity of the spectral information, contributing to the field of remote sensing single-image super-resolution.

C. Broni-Bediako, J. Xia, J. Song, H. Chen, M. Siam, and N. Yokoya, “Generalized few-shot semantic segmentation in remote sensing: Challenge and benchmark,” IEEE Geoscience and Remote Sensing Letters, 2024.
Abstract: Learning with limited labelled data is a challenging problem in various applications, including remote sensing. Few-shot semantic segmentation is one approach that can encourage deep learning models to learn from few labelled examples for novel classes not seen during the training. The generalized few-shot segmentation setting has an additional challenge which encourages models not only to adapt to the novel classes but also to maintain strong performance on the training base classes. While previous datasets and benchmarks discussed the few-shot segmentation setting in remote sensing, we are the first to propose a generalized few-shot segmentation benchmark for remote sensing. The generalized setting is more realistic and challenging, which necessitates exploring it within the remote sensing context. We release the dataset augmenting OpenEarthMap with additional classes labelled for the generalized few-shot evaluation setting. The dataset is released during the OpenEarthMap land cover mapping generalized few-shot challenge in the L3D-IVU workshop in conjunction with CVPR 2024. In this work, we summarize the dataset and challenge details in addition to providing the benchmark results on the two phases of the challenge for the validation and test sets.

X. Zhang, N. Yokoya, X. Gu, Q. Tian, and L. Bruzzone, “Local-to-global cross-modal attention-aware fusion for HSI-X semantic segmentation,” IEEE Transactions on Geoscience and Remote Sensing, 2024.
Abstract: Hyperspectral image (HSI) classification has recently reached its performance bottleneck. Multimodal data fusion is emerging as a promising approach to overcome this bottleneck by providing rich complementary information from the supplementary modality (X-modality). However, achieving comprehensive cross-modal interaction and fusion that can be generalized across different sensing modalities is challenging due to the disparity in imaging sensors, resolution, and content of different modalities. In this study, we propose a Local-to-Global Cross-modal Attention-aware Fusion (LoGoCAF) framework for HSI-X segmentation. LoGoCAF adopts a two-branch semantic segmentation architecture to learn information from HSI and X modalities. The pipeline of LoGoCAF consists of a local-to-global encoder and a lightweight all multilayer perceptron (ALL-MLP) decoder. In the encoder, convolutions are used to encode local and high-resolution fine details in shallow layers, while transformers are used to integrate global and low-resolution coarse features in deeper layers. The ALL-MLP decoder aggregates information from the encoder for feature fusion and prediction. In particular, two cross-modality modules, the feature enhancement module (FEM) and the feature interaction and fusion module (FIFM), are introduced in each encoder stage. The FEM is used to enhance complementary information by combining information from the other modality across direction-aware, position-sensitive, and channel-wise dimensions. With the enhanced features, the FIFM is designed to promote cross-modality information interaction and fusion for the final semantic prediction. Extensive experiments demonstrate that our LoGoCAF achieves superior performance and generalizes well on various multimodal datasets. The code will be made publicly available.

C.I. Cira, M.Á. Manso-Callejo, N. Yokoya, T. Sălăgean, and A.C. Badea, “Impact of tile size and tile overlap on the prediction performance of convolutional neural networks trained for road classification,” Remote Sensing, vol. 16, no. 15, 2024.
Abstract: Popular geo-computer vision works use aerial imagery split into tiles with sizes ranging from 64 × 64 to 1024 × 1024 pixels without any overlap, although the learning process of deep learning models can be affected by the reduced semantic context or the lack of information near the image boundaries. In this work, the impact of three tile sizes (256 × 256, 512 × 512, and 1024 × 1024 pixels) and two overlap levels (no overlap and 12.5% overlap) on the performance of road classification models was statistically evaluated. For this, two convolutional neural networks used in various tasks of geospatial object extraction were trained (using the same hyperparameters) on a large dataset (containing aerial image data covering 8650 km² of the Spanish territory that was labelled with binary road information) under twelve different scenarios, with each scenario featuring a different combination of tile size and overlap. To assess their generalisation capacity, the performance of all resulting models was evaluated on data from novel areas covering approximately 825 km². The performance metrics obtained were analysed using appropriate descriptive and inferential statistical techniques to evaluate the impact of distinct levels of the fixed factors (tile size, tile overlap, and neural network architecture) on them. Statistical tests were applied to study the main and interaction effects of the fixed factors on the performance. A significance level of 0.05 was applied to all the null hypothesis tests. The results were highly significant for the main effects (p-values lower than 0.001), while the two-way and three-way interaction effects among them had different levels of significance. The results indicate that training road classification models on images with a larger tile size (more semantic context) and a higher amount of tile overlap (additional border context and continuity) significantly impacts their performance. The best model was trained on a dataset featuring tiles with a size of 1024 × 1024 pixels and a 12.5% overlap, and achieved a loss value of 0.0984, an F1 score of 0.8728, and an ROC-AUC score of 0.9766, together with an error rate of 3.5% on the test set.
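To make the tile-size and overlap settings concrete, the sketch below computes tile start offsets along one image axis: a 12.5% overlap on 1024-pixel tiles corresponds to a stride of 896 pixels. Adding a final tile flush with the image border, and returning a single origin when the tile is larger than the image (so the caller pads), are illustrative assumptions; the paper's exact tiling code is not described in the abstract.

```python
from typing import List


def tile_origins(length: int, tile: int, overlap: float) -> List[int]:
    """Start offsets along one image axis for square tiles of size `tile`
    whose neighbours share `overlap` (a fraction of the tile size)."""
    if tile >= length:
        return [0]  # a single tile (padded by the caller) covers the axis
    stride = max(1, int(round(tile * (1.0 - overlap))))
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] + tile < length:
        # Assumption: add one last tile flush with the border so no pixels are dropped.
        starts.append(length - tile)
    return starts


# 2-D grid of (row, col) tile origins for a hypothetical 4096 x 4096 image.
rows = tile_origins(4096, 1024, 0.125)
grid = [(y, x) for y in rows for x in tile_origins(4096, 1024, 0.125)]
print(rows)       # [0, 896, 1792, 2688, 3072]
print(len(grid))  # 25
```

With no overlap the same image yields 16 tiles instead of 25, which illustrates the extra training data and border context that overlapping tiles provide at the cost of redundancy.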

W. He, Z. Wu, N. Yokoya, and X. Yuan, “An interpretable and flexible fusion prior to boost hyperspectral imaging reconstruction,” Information Fusion, vol. 111, 102528, 2024.
Abstract: Hyperspectral image (HSI) reconstruction from the compressed measurement captured by the coded aperture snapshot spectral imager system remains an active research topic. Recently, deep-learning-based methods for HSI reconstruction have become mainstream due to their high performance and efficient test-time inference. However, these learning methods do not fully exploit the abundant spectral information with proper physical spectral priors, resulting in complex architectures and unsatisfactory reconstruction performance. In this paper, we argue that the spectral low-rank property can still help these learning methods and propose a hyperspectral fusion theory, which shows that full HSIs are mathematically equivalent to a closed-form combination of subspace images, mask, and measurements. Based on this fusion theory, we propose the subspace distillation prior (SP), which works efficiently with existing learning models to better exploit the spectral low-rank property. Specifically, SP can directly improve the test-time inference of existing models (SP1, Section 4.1). Furthermore, SP can be combined with existing networks to form a new framework that regularizes these models to learn the subspace images and helps reconstruct the full HSIs from subspace images, mask, and measurements (SP2, Section 4.2). We choose six representative existing models for the HSI reconstruction experiments and find that SP1 and SP2 achieve improvements of 0.08–0.76 dB and 0.36–1.76 dB, respectively, on the simulated datasets, demonstrating the advantage of the proposed hyperspectral fusion theory. The source code is available at https://github.com/prowDIY.

H. Chen, J. Song, C. Han, J. Xia, and N. Yokoya, “ChangeMamba: Remote sensing change detection with spatio-temporal state space model,” IEEE Transactions on Geoscience and Remote Sensing, 2024.
Abstract: Convolutional neural networks (CNN) and Transformers have made impressive progress in the field of remote sensing change detection (CD). However, both architectures have inherent shortcomings. Recently, the Mamba architecture, based on state space models, has shown remarkable performance in a series of natural language processing tasks, which can effectively compensate for the shortcomings of the above two architectures. In this paper, we explore for the first time the potential of the Mamba architecture for remote sensing CD tasks. We tailor the corresponding frameworks, called MambaBCD, MambaSCD, and MambaBDA, for binary change detection (BCD), semantic change detection (SCD), and building damage assessment (BDA), respectively. All three frameworks adopt the cutting-edge Visual Mamba architecture as the encoder, which allows full learning of global spatial contextual information from the input images. For the change decoder, which is available in all three architectures, we propose three spatio-temporal relationship modeling mechanisms, which can be naturally combined with the Mamba architecture and fully utilize its attribute to achieve spatio-temporal interaction of multi-temporal features, thereby obtaining accurate change information. On five benchmark datasets, our proposed frameworks outperform current CNN- and Transformer-based approaches without using any complex training strategies or tricks, fully demonstrating the potential of the Mamba architecture in CD tasks. Specifically, we obtained 83.11%, 88.39% and 94.19% F1 scores on the three BCD datasets SYSU, LEVIR-CD+, and WHU-CD; on the SCD dataset SECOND, we obtained 24.11% SeK; and on the BDA dataset xBD, we obtained 81.41% overall F1 score. Further experiments show that our architecture is quite robust to degraded data. The source code will be available in this https URL.

H. Chen, C. Lan, J. Song, C. Broni-Bediako, J. Xia, and N. Yokoya, “Land-cover change detection using paired OpenStreetMap data and optical high-resolution imagery via object-guided Transformer,” IEEE Transactions on Geoscience and Remote Sensing, 2024.
Abstract: Optical high-resolution imagery and OpenStreetMap (OSM) data are two important data sources for land-cover change detection. Previous studies using these two data sources have focused on utilizing the information in OSM data to aid change detection on multi-temporal optical high-resolution images. This paper pioneers the direct detection of land-cover changes utilizing paired OSM data and optical imagery, thereby broadening the horizons of change detection tasks to encompass more dynamic earth observations. To this end, we propose an object-guided Transformer (ObjFormer) architecture by naturally combining the prevalent object-based image analysis (OBIA) technique with the advanced vision Transformer architecture. The introduction of OBIA can significantly reduce the computational overhead and memory burden in the self-attention module without adding any extra network parameters or layers. Specifically, the proposed ObjFormer has a hierarchical pseudo-siamese encoder consisting of object-guided self-attention modules that extract representative features with different levels from OSM data and optical images; a decoder consisting of object-guided cross-attention modules can progressively recover the land-cover changes from the extracted heterogeneous features. In addition to the basic supervised binary change detection task, this paper raises a new semi-supervised semantic change detection task that does not require any manually annotated land-cover labels to train semantic change detectors. Two lightweight semantic decoders are added to ObjFormer to accomplish this task efficiently. A converse cross-entropy (CCE) loss is designed to fully utilize the negative samples, thereby contributing to a large performance improvement in this task. The first large-scale benchmark dataset containing 1,287 map-image pairs (1024 × 1024 pixels for each sample) covering 40 regions on six continents is constructed to conduct detailed experiments, including benchmark comparison, ablation studies, hyperparameter discussions, experiments delving into object-guided self-attention and CCE loss, and the model's robustness to registration errors. These results show the effectiveness and superiority of the proposed methods in this new kind of change detection task. Additionally, study sites covering two important cities in Japan are selected to verify the generalizability of the proposed framework and show its potential in practical applications, such as large-scale land-cover mapping, semantic change analysis, and geographic information data updating.
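The abstract states that introducing OBIA reduces the cost of self-attention without extra parameters. A minimal, hypothetical sketch of that idea (not ObjFormer's actual module) is to mean-pool pixel features inside each object, run scaled dot-product attention among objects, and broadcast the result back to the pixels, so the attention cost scales with the number of objects squared rather than the number of pixels squared. Using identity query/key/value projections is a simplification made here for brevity.

```python
import math
from typing import List

Vector = List[float]


def object_guided_attention(pixels: List[Vector], object_ids: List[int]) -> List[Vector]:
    """Pool pixel features per object, attend among objects, scatter back.

    Assumes object ids are 0..n_obj-1 and every id occurs at least once.
    """
    n_obj = max(object_ids) + 1
    dim = len(pixels[0])
    # 1) Mean-pool pixel features inside each object (segment).
    sums = [[0.0] * dim for _ in range(n_obj)]
    counts = [0] * n_obj
    for feat, oid in zip(pixels, object_ids):
        counts[oid] += 1
        for d in range(dim):
            sums[oid][d] += feat[d]
    objs = [[s / counts[o] for s in sums[o]] for o in range(n_obj)]
    # 2) Scaled dot-product self-attention among objects (identity Q/K/V):
    #    n_obj**2 score computations instead of len(pixels)**2.
    scale = 1.0 / math.sqrt(dim)
    out: List[Vector] = []
    for q in objs:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in objs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * v[d] for w, v in zip(weights, objs)) / z for d in range(dim)])
    # 3) Broadcast each object's output back to its pixels.
    return [out[oid] for oid in object_ids]


result = object_guided_attention([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]], [0, 0, 1, 1])
```

Pixels belonging to the same object receive identical outputs, which reflects the object-level granularity that makes the attention cheaper.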

W. Gan, N. Mo, H. Xu, and N. Yokoya, “A comprehensive framework for 3D occupancy estimation in autonomous driving,” IEEE Transactions on Intelligent Vehicles, 2024.
Abstract: The task of estimating 3D occupancy from surrounding-view images is an exciting development in the field of autonomous driving, following the success of Bird's-Eye View (BEV) perception. This task provides crucial 3D attributes of the driving environment, enhancing the overall understanding and perception of the surrounding space. However, a baseline that defines the task, including network design, optimization, and evaluation, is still lacking. In this work, we present a simple attempt at 3D occupancy estimation: a CNN-based framework designed to reveal several key factors for 3D occupancy estimation. In addition, we explore the relationship between 3D occupancy estimation and other related tasks, such as monocular depth estimation, stereo matching, and BEV perception (3D object detection and map segmentation), which could advance the study of 3D occupancy estimation. For evaluation, we propose a simple sampling strategy to define the occupancy evaluation metric, which is flexible for current public datasets. Moreover, we establish a new benchmark in terms of the depth estimation metric, where we compare our proposed method with monocular depth estimation methods on the DDAD and nuScenes datasets. The relevant code will be available in this https URL.

X. Ding, J. Kang, Y. Bai, A. Zhang, J. Liu, and N. Yokoya, “Towards robustness and efficiency of coherence-guided complex convolutional sparse coding for interferometric phase restoration,” IEEE Transactions on Computational Imaging, 2024.
Abstract: Recently, complex convolutional sparse coding (ComCSC) has demonstrated its effectiveness in interferometric phase restoration, owing to its prominent performance in noise mitigation and detailed phase preservation. By incorporating the estimated coherence into ComCSC as prior knowledge for re-weighting individual complex residues, coherence-guided complex convolutional sparse coding (CoComCSC) further improves the quality of restored phases, especially over heterogeneous land-covers with rapidly varying coherence. However, due to the exploited L2 norm of the data fidelity term, the original CoComCSC is not robust to outliers when relatively low coherence values are sparsely distributed over high ones. We propose CoComCSC-L1 and CoComCSC-Huber to improve the robustness of CoComCSC based on the L1 and Huber norms. Moreover, we propose an efficient solver to decrease the computational cost of solving the linear system subproblem within ComCSC-based optimization problems. By comparing the proposed methods to other state-of-the-art methods using both simulated and real data, the proposed methods demonstrate their effectiveness. Additionally, the proposed solver has the potential to improve optimization speed by approximately 10% compared to the state-of-the-art solver.
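The robustness argument above rests on how different penalties treat a large residual. The sketch below is a simplified illustration, assuming a data-fidelity term of the form sum_i c_i * rho(|r_i|), where c_i is the estimated coherence weighting each complex residual r_i and rho is the L2, L1, or Huber penalty; the convolutional sparse coding model and the paper's solver are not reproduced, and the Huber threshold `delta = 1.0` is an arbitrary choice.

```python
from typing import Callable, List


def l2(r: float) -> float:
    return 0.5 * r * r


def l1(r: float) -> float:
    return abs(r)


def huber(r: float, delta: float = 1.0) -> float:
    # Quadratic near zero, linear beyond delta: bounded growth for outliers.
    a = abs(r)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def weighted_fidelity(
    residuals: List[complex], coherence: List[float], penalty: Callable[[float], float]
) -> float:
    """Coherence-weighted data-fidelity cost over complex residual magnitudes."""
    return sum(c * penalty(abs(r)) for r, c in zip(residuals, coherence))


# Two small residuals and one outlier (|3+4j| = 5) with high coherence weight.
residuals = [0.1 + 0j, 0.2j, 3 + 4j]
coherence = [0.9, 0.8, 0.9]
for name, rho in [("L2", l2), ("L1", l1), ("Huber", huber)]:
    print(name, round(weighted_fidelity(residuals, coherence, rho), 4))
# L2 11.2705
# L1 4.75
# Huber 4.0705
```

Under the L2 penalty the single outlier dominates the total cost, whereas L1 and Huber grow only linearly in it; Huber additionally stays smooth (quadratic) for small residuals.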

T. Xu, T.-Z. Huang, L.-J. Deng, J.-L. Xiao, C. Broni-Bediako, J. Xia, and N. Yokoya, “A coupled tensor double-factor method for hyperspectral and multispectral image fusion,” IEEE Transactions on Geoscience and Remote Sensing, 2024.
Abstract: Hyperspectral image (HSI) and multispectral image (MSI) fusion, denoted as HSI-MSI fusion, involves merging a pair of HSI and MSI to generate a high spatial resolution HSI (HR-HSI). The primary challenge in HSI-MSI fusion is to find the best way to extract 1-D spectral features and 2-D spatial features from the HSI and MSI and to combine them harmoniously. Recently, coupled tensor decomposition (CTD)-based methods have shown promising performance in the fusion task. However, the tensor decompositions (TDs) used by these CTD-based methods have difficulty extracting complex features and capturing 2-D spatial features, resulting in suboptimal fusion results. To address these issues, we introduce a novel method called coupled tensor double-factor (CTDF) decomposition. Specifically, we propose a tensor double-factor (TDF) decomposition, representing a third-order HR-HSI as a fourth-order spatial factor and a third-order spectral factor, connected through a tensor contraction. Compared with other TDs, the TDF has better feature extraction capability because it has a factor of higher order than the HR-HSI, whereas other TDs only have factors of the same order as the HR-HSI. Moreover, the TDF can extract 2-D spatial features using the fourth-order spatial factor. We apply the TDF to the HSI-MSI fusion problem and formulate the CTDF model. Furthermore, we design an algorithm based on proximal alternating minimization (PAM) to solve this model and analyze its computational complexity and convergence. Simulated and real experiments validate the effectiveness and efficiency of the proposed CTDF method. The code is available at https://github.com/tingxu113/CTDF.
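The abstract describes the HR-HSI as a tensor contraction of a fourth-order spatial factor and a third-order spectral factor. The NumPy sketch below shows one plausible index layout, assuming a spatial factor of shape (W, H, R1, R2) and a spectral factor of shape (R1, R2, S) contracted over the two shared rank modes. This layout, the rank names R1 and R2, and all sizes are illustrative assumptions; the layout used in the CTDF paper and code may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

W, H, S = 8, 6, 31  # spatial width, spatial height, spectral bands (illustrative sizes)
R1, R2 = 3, 4       # hypothetical rank modes shared by the two factors

# Fourth-order spatial factor and third-order spectral factor (assumed layout).
spatial = rng.standard_normal((W, H, R1, R2))
spectral = rng.standard_normal((R1, R2, S))

# Contract over the shared rank modes (r, q) to obtain a third-order image.
hr_hsi = np.einsum("whrq,rqs->whs", spatial, spectral)
print(hr_hsi.shape)  # (8, 6, 31)

# The same contraction written as a product of matrix unfoldings, for comparison.
hr_hsi_mat = (spatial.reshape(W * H, R1 * R2) @ spectral.reshape(R1 * R2, S)).reshape(W, H, S)
assert np.allclose(hr_hsi, hr_hsi_mat)
```

In this layout each pixel carries an R1 × R2 slice rather than a single coefficient vector, so the spatial factor has two spatial modes and two rank modes, and the rank modes are summed out against the spectral factor.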

D. Hong, B. Zhang, X. Li, Y. Li, C. Li, J. Yao, N. Yokoya, H. Li, P. Ghamisi, X. Jia, A. Plaza, P. Gamba, J. A. Benediktsson, and J. Chanussot, “SpectralGPT: Spectral remote sensing foundation model,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
Abstract: Foundation models have recently garnered significant attention due to their potential to revolutionize visual representation learning in a self-supervised manner. While most foundation models are tailored to process RGB images for various visual tasks, there is a noticeable gap in research on spectral data, which offers valuable information for scene understanding, especially in remote sensing (RS) applications. To fill this gap, we created, for the first time, a universal RS foundation model, named SpectralGPT, which is purpose-built to handle spectral RS images using a novel 3D generative pretrained transformer (GPT). Compared with existing foundation models, SpectralGPT 1) accommodates input images with varying sizes, resolutions, time series, and regions in a progressive training fashion, enabling full utilization of extensive RS big data; 2) leverages 3D token generation for spatial-spectral coupling; 3) captures spectrally sequential patterns via multi-target reconstruction; and 4) is trained on one million spectral RS images, yielding models with over 600 million parameters. Our evaluation shows significant performance improvements with pretrained SpectralGPT models across four downstream tasks (single-label and multi-label scene classification, semantic segmentation, and change detection), signifying substantial potential for advancing spectral RS big data applications in geoscience.

N. Yokoya, J. Xia, and C. Broni-Bediako, “Submeter-level land cover mapping of Japan,” International Journal of Applied Earth Observation and Geoinformation, vol. 127, p. 103660, 2024.
Abstract: Deep learning has shown promising performance in submeter-level mapping tasks; however, its annotation cost remains a challenge, especially at a large scale. In this paper, we introduce the first submeter-level land cover mapping of Japan, with eight classes. We present a human-in-the-loop framework that achieves national-scale mapping with a small amount of additional labeled data together with OpenEarthMap, a recently introduced benchmark dataset for global submeter-level land cover mapping. Using aerial imagery provided by the Geospatial Information Authority of Japan, we create land cover classification maps for the entire country of Japan and evaluate their accuracy. By adding a small amount of labeled data in areas where a U-Net model trained on OpenEarthMap clearly failed and retraining the model, we achieved an overall accuracy of 80%, an improvement of nearly 16 percentage points. Our framework, with its low-cost and high-accuracy mapping results, demonstrates the potential to contribute to the automatic updating of land cover maps using submeter-level optical remote sensing data. The mapping results will be made publicly available.

S.A. Ahmadi, A. Mohammadzadeh, N. Yokoya, and A. Ghorbanian, “BD-SKUNet: Selective-kernel UNets for building damage assessment in high-resolution satellite images,” Remote Sensing, vol. 16, no. 1, p. 182, 2024.
Abstract: When natural disasters occur, timely and accurate building damage assessment maps are vital for disaster management responders to organize their resources efficiently. Pairs of pre- and post-disaster remote sensing imagery have been recognized as invaluable data sources that provide useful information for building damage identification. Recently, deep learning-based semantic segmentation models have been widely and successfully applied to remote sensing imagery for building damage assessment tasks. In this study, a two-stage, dual-branch UNet architecture with shared weights between the two branches is proposed to address inaccuracies in building footprint localization and per-building damage level classification. A newly introduced selective kernel module improves the performance of the model by enhancing the extracted features and applying adaptive receptive field variations. The xBD dataset is used to train, validate, and test the proposed model based on widely used evaluation metrics such as F1-score and Intersection over Union (IoU). Overall, the experiments and comparisons demonstrate the superior performance of the proposed model. In addition, the results are further confirmed by evaluating the geographical transferability of the proposed model on a completely unseen dataset from a new region (the 2003 Bam earthquake).
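F1-score and IoU, the two metrics named above, are closely related. The sketch below computes both for a single class from boolean prediction and ground-truth masks and checks the identity F1 = 2·IoU / (1 + IoU). The function name and the synthetic masks are hypothetical; how per-class scores are aggregated for xBD is not shown here.

```python
import numpy as np


def f1_and_iou(pred: np.ndarray, truth: np.ndarray) -> tuple[float, float]:
    """F1-score and IoU for one class from boolean prediction/ground-truth masks."""
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return float(f1), float(iou)


rng = np.random.default_rng(0)
truth = rng.random((64, 64)) > 0.5   # synthetic ground-truth mask
pred = truth.copy()
pred[:8] = ~pred[:8]                 # flip the first 8 rows to introduce errors
f1, iou = f1_and_iou(pred, truth)

# For a single class the two scores are tied by F1 = 2 * IoU / (1 + IoU).
assert abs(f1 - 2 * iou / (1 + iou)) < 1e-12
print(round(f1, 3), round(iou, 3))
```

Because the mapping is monotonic, both metrics rank models the same way on a single class; they differ only in scale.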

R. Iizuka, J. Xia, and N. Yokoya, “Frequency-based optimal style mix for domain generalization in semantic segmentation of remote sensing images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–14, 2024.
Abstract: Supervised learning methods assume that training and test data are sampled from the same distribution. However, this assumption is not always satisfied in land cover semantic segmentation in practice, when models trained in a particular source domain are applied to other regions. This is because domain shifts caused by variations in location, time, and sensor alter the distribution of images in the target domain from that of the source domain, resulting in significant degradation of model performance. To mitigate this limitation, domain generalization (DG) has gained attention as a way of generalizing from source domain features to unseen target domains. One approach is style randomization (SR), which enables models to learn domain-invariant features by randomizing the styles of images in the source domain. Despite its potential, existing methods face several challenges, such as inflexible frequency decomposition, high computational and data preparation demands, slow randomization, and inconsistent learning. To address these limitations, we propose a frequency-based optimal style mix (FOSMix), which consists of three components: 1) full mix (FM) enhances the data space by maximally mixing the style of reference images into the source domain; 2) optimal mix (OM) keeps the frequencies essential for segmentation and randomizes the others to promote generalization; and 3) consistency regularization ensures that the model can stably learn different images with the same semantics. Extensive experiments that test the model's generalization under domain shifts caused by variations in regions and resolutions demonstrate that the proposed method achieves superior segmentation in remote sensing. The source code is available at https://github.com/Reo-I/FOSMix.
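Frequency-based style randomization is commonly built on mixing the Fourier amplitude spectrum of a reference image into a source image while keeping the source phase, which carries most of the spatial structure. The NumPy sketch below shows that generic idea with a single mixing weight `lam`; the function name and `lam` are hypothetical, and this is not the FOSMix implementation, whose full mix and optimal mix components are described in the paper and the linked repository.

```python
import numpy as np


def amplitude_mix(source: np.ndarray, reference: np.ndarray, lam: float) -> np.ndarray:
    """Mix the Fourier amplitude of `reference` into `source`, keeping the source phase.

    Both inputs are (H, W, C) float arrays; the 2-D FFT is taken per channel.
    lam = 0 returns the source unchanged; lam = 1 uses the reference amplitude.
    """
    src_f = np.fft.fft2(source, axes=(0, 1))
    ref_f = np.fft.fft2(reference, axes=(0, 1))
    amp = (1.0 - lam) * np.abs(src_f) + lam * np.abs(ref_f)  # "style" (amplitude)
    phase = np.angle(src_f)                                    # "content" (phase)
    mixed = amp * np.exp(1j * phase)
    return np.real(np.fft.ifft2(mixed, axes=(0, 1)))


rng = np.random.default_rng(0)
src = rng.random((32, 32, 3))
ref = rng.random((32, 32, 3))
out = amplitude_mix(src, ref, lam=0.5)
print(out.shape)  # (32, 32, 3)
```

Restricting the mix to selected frequency bands, rather than using one global weight, is the kind of refinement the abstract's optimal mix addresses.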

H. Chen, J. Song, C. Wu, B. Du, and N. Yokoya, “Exchange means change: an unsupervised single-temporal change detection framework based on intra- and inter-image patch exchange,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 206, pp. 87–105, 2023.
Abstract: Change detection is a critical task in studying the dynamics of ecosystems and human activities using multi-temporal remote sensing images. While deep learning has shown promising results in change detection tasks, it requires a large number of labeled and paired multi-temporal images to achieve high performance. Pairing and annotating large-scale multi-temporal remote sensing images is both expensive and time-consuming. To make deep learning-based change detection techniques more practical and cost-effective, we propose an unsupervised single-temporal change detection framework based on intra- and inter-image patch exchange (I3PE). The I3PE framework allows for training deep change detectors on unpaired and unlabeled single-temporal remote sensing images that are readily available in real-world applications. The I3PE framework comprises four steps: 1) an intra-image patch exchange method, based on an object-based image analysis (OBIA) method and an adaptive clustering algorithm, generates pseudo-bi-temporal image pairs and corresponding change labels from single-temporal images by exchanging patches within the image; 2) an inter-image patch exchange method generates more types of land-cover changes by exchanging patches between images; 3) a simulation pipeline consisting of several image enhancement methods simulates the radiometric differences between pre- and post-event images caused by different imaging conditions in real situations; and 4) self-supervised learning based on pseudo-labels further improves the performance of the change detectors in both unsupervised and semi-supervised cases. Extensive experiments on two large-scale datasets covering Hong Kong, Shanghai, Hangzhou, and Chengdu, China, demonstrate that I3PE outperforms representative unsupervised approaches, improving the F1 value by 10.65% and 6.99% over the state-of-the-art method. Moreover, I3PE improves the F1 value of the change detector by 4.37% and 2.61% in semi-supervised settings. Additional experiments on a 144 km² study area in Wuhan, China, confirm the effectiveness of I3PE for practical land-cover change analysis tasks.
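The key idea of intra-image patch exchange is that swapping regions within a single image yields a pseudo post-event image, and the swapped regions serve as the change label. The NumPy sketch below illustrates this with a fixed square grid for simplicity, whereas I3PE selects patches using object-based image analysis and adaptive clustering; the function name and the grid-based patching are hypothetical simplifications.

```python
import numpy as np


def grid_patch_exchange(image: np.ndarray, patch: int, n_swaps: int, seed: int = 0):
    """Create a pseudo bi-temporal pair by swapping square patches within one image.

    image: (H, W, C) array whose H and W are multiples of `patch`.
    Returns (pre, post, change), where change is a boolean (H, W) mask that
    marks every swapped cell as changed.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    cells = [(i, j) for i in range(0, h, patch) for j in range(0, w, patch)]
    pre = image
    post = image.copy()
    change = np.zeros((h, w), dtype=bool)
    # Pick 2 * n_swaps distinct cells and swap them pairwise.
    picks = rng.choice(len(cells), size=2 * n_swaps, replace=False)
    for a, b in zip(picks[0::2], picks[1::2]):
        (ia, ja), (ib, jb) = cells[a], cells[b]
        block_a = image[ia:ia + patch, ja:ja + patch].copy()
        block_b = image[ib:ib + patch, jb:jb + patch].copy()
        post[ia:ia + patch, ja:ja + patch] = block_b
        post[ib:ib + patch, jb:jb + patch] = block_a
        change[ia:ia + patch, ja:ja + patch] = True
        change[ib:ib + patch, jb:jb + patch] = True
    return pre, post, change


img = np.random.default_rng(1).random((64, 64, 3))
pre, post, change = grid_patch_exchange(img, patch=16, n_swaps=2)
print(change.mean())  # 0.25: four of the sixteen 16x16 cells were swapped
```

Swapping within the same image keeps the radiometry of pre and post identical, which is why the abstract adds a separate simulation pipeline for radiometric differences.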

Y. Bai, J. Kang, X. Ding, A. Zhang, Z. Zhang, and N. Yokoya, “LaMIE: Large-dimensional multipass InSAR phase estimation for distributed scatterers,” IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–15, 2023.
Abstract: State-of-the-art (SOTA) phase linking (PL) methods for distributed scatterer (DS) interferometry (DSI) retrieve consistent phase histories from the sample coherence matrix or from a version whose magnitudes are calibrated. To unify them, we first propose a framework consisting of sample coherence matrix estimation and Kullback–Leibler (KL) divergence minimization. Within this framework, we observe that current SOTA PL methods mainly focus on calibrating the magnitudes of the sample coherence matrix while ignoring the errors that arise from its use in the complex domain, especially when the PL problem is large-dimensional. In this article, “large-dimensional” refers to the case where the temporal dimension N of the coherence matrices and the number P of statistically homogeneous pixels (SHPs) are of the same order. To solve this issue, we further propose a PL method, termed LaMIE, which aims at precise phase history retrieval from large-dimensional coherence matrices for DSI. It includes two steps: 1) sample coherence matrix shrinkage to calibrate the matrix in the complex and real domains and 2) phase history retrieval via the flat coherence metric. Both simulated and real data experiments validate the effectiveness of the proposed method in comparison with other PL methods. With LaMIE, the density of selected points with stable phases can be significantly increased, and displacement velocities can be obtained for more regions than with SOTA methods.
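As background for the coherence-matrix terminology, the NumPy sketch below estimates the N × N sample coherence matrix from P statistically homogeneous pixels and then applies a simple linear shrinkage toward the identity. The function names, the shrinkage weight `alpha`, and the identity target are illustrative assumptions; LaMIE's own shrinkage, which calibrates the matrix in both the complex and real domains, is more elaborate.

```python
import numpy as np


def sample_coherence(stack: np.ndarray) -> np.ndarray:
    """Sample coherence matrix from an (N, P) complex stack.

    N is the number of acquisitions and P the number of homogeneous pixels.
    Entry (i, j) is sum_p y_i y_j^* / sqrt(sum_p |y_i|^2 * sum_p |y_j|^2).
    """
    norms = np.sqrt(np.sum(np.abs(stack) ** 2, axis=1, keepdims=True))
    y = stack / norms
    return y @ y.conj().T


def shrink_to_identity(coh: np.ndarray, alpha: float) -> np.ndarray:
    """Linear shrinkage toward the identity (hypothetical, for illustration only)."""
    return (1.0 - alpha) * coh + alpha * np.eye(coh.shape[0])


rng = np.random.default_rng(0)
N, P = 20, 25  # N and P of the same order: the "large-dimensional" regime
stack = rng.standard_normal((N, P)) + 1j * rng.standard_normal((N, P))
coh = sample_coherence(stack)
coh_s = shrink_to_identity(coh, alpha=0.3)
print(np.linalg.eigvalsh(coh).min(), np.linalg.eigvalsh(coh_s).min())
```

When N and P are comparable, the eigenvalues of a sample matrix spread widely and small ones are poorly estimated; shrinkage pulls them toward a fixed target, which is a standard remedy for such poorly conditioned estimates.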

M. Dalponte, Y.T. Solano-Correa, D. Marinelli, S. Liu, N. Yokoya, and D. Gianelle, “Detection of forest windthrows with bitemporal COSMO-SkyMed and Sentinel-1 SAR data,” Remote Sensing of Environment, vol. 297, p. 113787, 2023.
Abstract: Wind is a primary source of disturbance in forests, necessitating an assessment of the resulting damage to ensure appropriate forest management. Remote sensing, encompassing both active and passive techniques, offers a valuable and efficient approach for this purpose, enabling coverage of large areas while being cost-effective. Passive remote sensing data can be affected by clouds, unlike active systems such as Synthetic Aperture Radar (SAR), which are relatively less affected. Therefore, this study explores the use of bitemporal SAR data for windthrow detection in mountainous regions. Specifically, we investigated how the detection outcomes vary with three factors: i) the SAR wavelength (X-band or C-band), ii) the acquisition period of the pre- and post-event images (summer, autumn, or winter), and iii) the forest type (evergreen vs. deciduous). Our analysis considers two SAR satellite constellations: COSMO-SkyMed (X-band, with pixel spacings of 2.5 m and 10 m) and Sentinel-1 (C-band, with a pixel spacing of 10 m). We focused on three study sites in the Trentino-South Tyrol region of Italy, which experienced significant forest damage during the Vaia storm from 27 to 30 October 2018. To accomplish our objectives, we employed a detail-preserving, scale-driven approach for change detection in bitemporal SAR data. The results demonstrate that: i) the algorithm performs notably better with X-band data, achieving the highest kappa of 0.473 and a balanced accuracy of 76.1%; ii) pixel spacing influences accuracy, with COSMO-SkyMed data achieving kappa values of 0.473 and 0.394 at pixel spacings of 2.5 m and 10 m, respectively; iii) the post-event acquisition season significantly affects the algorithm's performance, with summer imagery yielding better results than winter imagery; and iv) the forest type (evergreen vs. deciduous) has a noticeable impact on the results, particularly for autumn/winter data.
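The results above are reported as Cohen's kappa and balanced accuracy. For reference, the Python sketch below computes both from a binary confusion matrix (windthrow vs. no windthrow). The function name is hypothetical, and the counts in the usage line are made up for illustration; they are not from the study.

```python
def kappa_and_balanced_accuracy(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Cohen's kappa and balanced accuracy for a binary change map."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the marginals: predicted-positive x actual-positive
    # plus predicted-negative x actual-negative, normalized by n^2.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1.0 - expected)
    sensitivity = tp / (tp + fn)  # recall on the damaged class
    specificity = tn / (tn + fp)  # recall on the undamaged class
    balanced = 0.5 * (sensitivity + specificity)
    return kappa, balanced


# Hypothetical counts, for illustration only.
k, ba = kappa_and_balanced_accuracy(tp=40, fp=10, fn=20, tn=130)
print(round(k, 3), round(ba, 3))
```

Both measures correct for class imbalance, which matters for windthrow maps where damaged pixels are usually a small fraction of the scene: overall accuracy alone can look high even when most damage is missed.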

C. Broni-Bediako, J. Xia, and N. Yokoya, “Real-time semantic segmentation: A brief survey and comparative study in remote sensing,” IEEE Geoscience and Remote Sensing Magazine, pp. 2–33, early access, 2023.
Abstract: Real-time semantic segmentation of remote sensing imagery is a challenging task that requires a tradeoff between effectiveness and efficiency. It has many applications, including tracking forest fires, detecting changes in land use and land cover, and monitoring crop health. With the success of efficient deep learning methods [i.e., efficient deep neural networks (DNNs)] for real-time semantic segmentation in computer vision, researchers have adopted these efficient DNNs in remote sensing image analysis. This article begins with a summary of the fundamental compression methods for designing efficient DNNs and provides a brief but comprehensive survey, outlining the recent developments in real-time semantic segmentation of remote sensing imagery. We examine several seminal efficient deep learning methods, placing them in a taxonomy based on the network architecture design approach. Furthermore, we evaluate the quality and efficiency of some existing efficient DNNs on a publicly available remote sensing semantic segmentation benchmark dataset, OpenEarthMap. The experimental results of an extensive comparative study demonstrate that most of the existing efficient DNNs have good segmentation quality, but they suffer from low inference speed (i.e., high latency), which may limit their deployment in real-time applications of remote sensing image segmentation. We provide some insights into the current trend and future research directions for real-time semantic segmentation of remote sensing imagery.

D. Hong, J. Yao, C. Li, D. Meng, N. Yokoya, and J. Chanussot, “Decoupled-and-coupled networks: Self-supervised hyperspectral image super-resolution with subpixel fusion,” IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–12, 2023.
Abstract: Enormous efforts have recently been made to super-resolve hyperspectral (HS) images with the aid of high spatial resolution multispectral (MS) images. Most prior works perform the fusion task by means of various pixel-level priors. Yet the intrinsic effects of the large distribution gap between HS–MS data, caused by differences in spatial and spectral resolution, have been less investigated. The gap might be caused by unknown sensor-specific properties or by highly mixed spectral information within one pixel (due to low spatial resolution). To this end, we propose a subpixel-level HS super-resolution (HS-SR) framework by devising a novel decoupled-and-coupled network (DC-Net) that progressively fuses HS–MS information from the pixel level to the subpixel level and from the image level to the feature level. As the name suggests, DC-Net first decouples the input into common (or cross-sensor) and sensor-specific components to eliminate the gap between HS–MS images before further fusion, and then thoroughly blends them with a model-guided coupled spectral unmixing (CSU) net. More significantly, we append a self-supervised learning module after the CSU net that guarantees material consistency, enhancing the detailed appearance of the restored HS product. Extensive experimental results show the superiority of our method both visually and quantitatively, with a significant improvement over the state of the art (SOTA).

W. Gan, H. Xu, Y. Huang, S. Chen, and N. Yokoya, “V4D: Voxel for 4D novel view synthesis,” IEEE Transactions on Visualization and Computer Graphics, 2023.
Abstract: Neural radiance fields have made a remarkable breakthrough in novel view synthesis for static 3D scenes. However, in the 4D setting (e.g., dynamic scenes), the performance of existing methods is still limited by the capacity of the neural network, typically a multilayer perceptron (MLP). In this paper, we use 3D voxels to model the 4D neural radiance field, abbreviated as V4D, where the 3D voxels take two formats. The first regularly models the 3D space and then uses the sampled local 3D feature together with the time index to model the density field and the texture field with a tiny MLP. The second takes the form of look-up tables (LUTs) for pixel-level refinement, where the pseudo-surface produced by volume rendering is used as guidance to learn a 2D pixel-level refinement mapping. The proposed LUT-based refinement module achieves a performance gain at little computational cost and could serve as a plug-and-play module in the novel view synthesis task. Moreover, we propose a more effective conditional positional encoding for 4D data that achieves a performance gain with negligible computational burden. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance at a low computational cost. The relevant code is available at https://github.com/GANWANSHUI/V4D.

T. Xu, T.-Z. Huang, L.-J. Deng, H.-X. Dou, and N. Yokoya, “TR-STF: a fast and accurate tensor ring decomposition algorithm via defined scaled tri-factorization,” Computational and Applied Mathematics, vol. 42, p. 234, 2023.
Abstract: This paper proposes an algorithm based on defined scaled tri-factorization (STF) for fast and accurate tensor ring (TR) decomposition. First, based on the fast tri-factorization approach, we define STF and design a corresponding algorithm that can more accurately represent various matrices while maintaining a similar level of computational time. Second, we apply sequential STFs to TR decomposition with theoretical proof and propose a stable (i.e., non-iterative) algorithm named TR-STF. It is a computationally more efficient algorithm than existing TR decomposition algorithms, which is beneficial when dealing with big data. Experiments on multiple randomly simulated data, highly oscillatory functions, and real-world data sets verify the effectiveness and high efficiency of the proposed TR-STF. For example, on the Pavia University data set, TR-STF is nearly 9240 and 39 times faster, respectively, and more accurate than algorithms based on alternating least squares and singular value decomposition. As an extension, we apply sequential STFs to tensor train (TT) decomposition and propose

H. Chen, N. Yokoya, and M. Chini, “Fourier domain structural relationship analysis for unsupervised multimodal change detection,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 198, pp. 99–114, 2023.
Abstract: Change detection on multimodal remote sensing images has become an increasingly interesting and challenging topic in the remote sensing community, and it can play an essential role in time-sensitive applications such as disaster response. However, modal heterogeneity makes it difficult to compare multimodal images directly. This paper proposes a Fourier domain structural relationship analysis framework for unsupervised multimodal change detection (FD-MCD), which exploits both modality-independent local and nonlocal structural relationships. Unlike most existing methods, which analyze structural relationships in the original domain of the multimodal images, the three critical parts of the proposed framework are implemented in the (graph) Fourier domain. First, a local frequency consistency metric calculated in the Fourier domain is proposed to determine the local structural difference. Then, nonlocal structural relationship graphs are constructed for the pre-change and post-change images. The two graphs are transformed to the graph Fourier domain, and high-order vertex information is modeled for each vertex by graph spectral convolution, where Chebyshev polynomials serve as the transfer function to aggregate information from each vertex's K-hop neighborhood. The nonlocal structural difference map is obtained by comparing the filtered graph representations. Finally, an adaptive fusion method based on frequency decoupling is designed to effectively fuse the local and nonlocal structural difference maps. Experiments on five real datasets with different modality combinations and change events show the effectiveness of the proposed framework.
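Chebyshev-polynomial graph spectral convolution avoids an explicit eigendecomposition-based graph Fourier transform: a K-th order polynomial in the scaled graph Laplacian only mixes information from vertices up to K hops away. The NumPy sketch below implements the standard Chebyshev recursion (as popularized by ChebNet) on a small path graph. The function name and the coefficients `theta` are arbitrary illustrations, and the graph construction used in FD-MCD is not reproduced here.

```python
import numpy as np


def chebyshev_filter(adj: np.ndarray, x: np.ndarray, theta: list[float]) -> np.ndarray:
    """Apply sum_k theta[k] * T_k(L_scaled) @ x using the Chebyshev recursion.

    adj: (V, V) symmetric adjacency with no isolated vertices; x: (V, F) signals.
    A filter with K + 1 coefficients only mixes vertices within K hops.
    """
    n = len(adj)
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    lap = np.eye(n) - d_inv_sqrt @ adj @ d_inv_sqrt   # normalized Laplacian
    lam_max = np.linalg.eigvalsh(lap).max()
    lap_scaled = 2.0 * lap / lam_max - np.eye(n)      # eigenvalues mapped into [-1, 1]
    t_prev, t_curr = x, lap_scaled @ x                # T_0 x and T_1 x
    out = theta[0] * t_prev
    if len(theta) > 1:
        out = out + theta[1] * t_curr
    for k in range(2, len(theta)):
        # T_k = 2 * L_scaled * T_{k-1} - T_{k-2}
        t_prev, t_curr = t_curr, 2.0 * lap_scaled @ t_curr - t_prev
        out = out + theta[k] * t_curr
    return out


# Path graph 0-1-2-3-4 with a unit impulse on vertex 0.
adj = np.zeros((5, 5))
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
impulse = np.zeros((5, 1))
impulse[0] = 1.0
y = chebyshev_filter(adj, impulse, theta=[0.5, 0.3, 0.2])  # K = 2
print(np.round(y.ravel(), 3))  # vertices 3 and 4 stay at zero (more than 2 hops away)
```

The locality of the filter follows directly from the recursion: each multiplication by the scaled Laplacian reaches one more hop, so K controls the size of the neighborhood each vertex aggregates.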

X. Sun, D. Yin, F. Qin, H. Yu, W. Lu, F. Yao, Q. He, X. Huang, Z. Yan, P. Wang, C. Deng, N. Liu, Y. Yang, W. Liang, R. Wang, C. Wang, N. Yokoya, R. Hänsch, and K. Fu, “Revealing influencing factors on global waste distribution via deep-learning based dumpsite detection from satellite imagery,” Nature Communications, vol. 14, no. 1444, 2023.
Abstract: With the advancement of global civilisation, monitoring and managing dumpsites have become essential parts of environmental governance in various countries. Dumpsite locations are difficult for local government agencies and environmental groups to obtain in a timely manner. According to the World Bank, governments must spend massive labour and economic resources to locate illegal dumpsites before they can manage them. Here we show that applying novel deep convolutional networks to high-resolution satellite images can provide an effective, efficient, and low-cost method to detect dumpsites. In sampled areas of 28 cities around the world, our model detects nearly 1000 dumpsites that appeared around 2021. This approach reduces the investigation time by more than 96.8% compared with the manual method. With this novel and powerful methodology, it is now possible to analyse the relationship between dumpsites and various social attributes on a global scale, both temporally and spatially.

W. He, T. Uezato, and N. Yokoya, “Interpretable deep attention prior for image restoration and enhancement,” IEEE Transactions on Computational Imaging, vol. 9, pp. 185–196, 2023.
Abstract: An inductive bias induced by an untrained network architecture has been shown to be effective as a deep image prior (DIP) in solving inverse imaging problems. However, it is still unclear what kind of prior is encoded in the network architecture, and early stopping to avoid overfitting in DIP remains a challenge. To address this, we introduce an interpretable network that explores self-attention as a deep attention prior (DAP). Specifically, the proposed deep attention prior is formulated as an interpretable optimization problem. A nonlocal self-similarity prior is incorporated into the network architecture by a self-attention mechanism. Each attention map from the proposed DAP reveals how an output value is generated, which leads to a better understanding of the prior. Furthermore, compared with DIP, the proposed DAP takes the single degraded image as input to reduce instability and introduces a mask operation to handle the early stopping problem. Experiments show that the proposed network works as an effective image prior for solving different inverse imaging problems, such as denoising, inpainting, and pansharpening, while also showing potential for higher-level processing such as interactive segmentation and selective colorization.

J. Xia, N. Yokoya, B. Adriano, and K. Kanemoto, “National high-resolution cropland classification of Japan with agricultural census information and multi-temporal multi-modality datasets,” International Journal of Applied Earth Observation and Geoinformation, 2023.
Abstract: Multi-modality datasets offer advantages for processing frameworks through complementary information, particularly for large-scale cropland mapping. Training machine learning algorithms requires extensive training datasets, which can be challenging to obtain. To alleviate this limitation, we extract training samples from agricultural census information. We focus on Japan and demonstrate how agricultural census data from 2015 can be used to map different crop types for the entire country. Due to the lack of Sentinel-2 data in 2015, this study used Sentinel-1 and Landsat-8 data collected across Japan and combined the observations into composites for each prefecture over different periods (monthly, bimonthly, and seasonal). We investigated the performance of recent deep learning techniques trained on the samples derived from agricultural census information. Finally, we obtain nine crop types on a countrywide scale (around 31 million parcels) and compare our results with agricultural census testing samples and with recent land cover products in Japan. The generated map accurately represents the distribution of crop types across Japan and achieves an overall accuracy of 87% for nine classes in 47 prefectures. Our findings highlight the importance of using multi-modality data with agricultural census information to evaluate agricultural productivity in Japan. The final products are available at https://doi.org/10.5281/zenodo.7519274.

H. Chen, N. Yokoya, C. Wu, and B. Du, “Unsupervised multimodal change detection based on structural relationship graph representation learning,” IEEE Transactions on Geoscience and Remote Sensing, 2022.
Abstract: Unsupervised multimodal change detection is a practical and challenging topic that can play an important role in time-sensitive emergency applications. To address the challenge that multimodal remote sensing images cannot be directly compared due to their modal heterogeneity, we take advantage of two types of modality-independent structural relationships in multimodal images. In particular, we present a structural relationship graph representation learning framework for measuring the similarity of the two structural relationships. Firstly, structural graphs are generated from preprocessed multimodal image pairs by means of an object-based image analysis approach. Then, a structural relationship graph convolutional autoencoder (SR-GCAE) is proposed to learn robust and representative features from graphs. Two loss functions aiming at reconstructing vertex information and edge information are presented to make the learned representations applicable for structural relationship similarity measurement. Subsequently, the similarity levels of two structural relationships are calculated from learned graph representations and two difference images are generated based on the similarity levels. After obtaining the difference images, an adaptive fusion strategy is presented to fuse the two difference images. Finally, a morphological filtering-based postprocessing approach is employed to refine the detection results. Experimental results on six datasets with different modal combinations demonstrate the effectiveness of the proposed method.

X. Ding, J. Kang, Z. Zhang, Y. Huang, J. Liu, and N. Yokoya, “Coherence-guided complex convolutional sparse coding for interferometric phase restoration,” IEEE Transactions on Geoscience and Remote Sensing, 2022.
Abstract: Interferometric phase restoration is a crucial step in retrieving large-scale geophysical parameters from Synthetic Aperture Radar (SAR) images. Noise arising from decorrelation effects degrades the accuracy of parameter retrieval. Most state-of-the-art filtering methods belong to the group of nonlocal filters. In this paper, we propose a novel convolutional sparse coding method in the complex domain that integrates prior knowledge of coherence into the optimization model, termed CoComCSC. CoComCSC not only reduces noise in regions with continuous phase changes but also preserves phase details well. Experimental results on simulated and real data demonstrate the effectiveness of CoComCSC in comparison with other state-of-the-art methods. Moreover, the Digital Elevation Model (DEM) produced by CoComCSC from RADARSAT-2 data shows superior filtering performance over regions with heterogeneous land covers, indicating its great potential for generating high-resolution DEM products.
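
CoComCSC uses coherence as prior knowledge, but the abstract does not say how that coherence is obtained. A common choice, assumed here, is the sample coherence estimator over a local window of two co-registered complex SAR images: |Σ s1·conj(s2)| / sqrt(Σ|s1|² · Σ|s2|²). The same sum also yields the multilooked interferometric phase. The sketch below shows only that estimator, not the CoComCSC sparse coding model, and the function name is hypothetical:

```python
import cmath
import math


def coherence_and_phase(s1, s2):
    """Sample coherence magnitude and interferometric phase over a local window.

    s1, s2: equal-length sequences of co-registered complex SAR samples.
    Coherence = |sum(s1 * conj(s2))| / sqrt(sum|s1|^2 * sum|s2|^2), in [0, 1].
    """
    cross = sum(a * b.conjugate() for a, b in zip(s1, s2))
    power1 = sum(abs(a) ** 2 for a in s1)
    power2 = sum(abs(b) ** 2 for b in s2)
    if power1 == 0 or power2 == 0:
        raise ValueError("window contains no signal")
    gamma = abs(cross) / math.sqrt(power1 * power2)
    return gamma, cmath.phase(cross)


# A second image that differs only by a constant phase offset is fully coherent.
window = [1 + 1j, 2 - 1j, -0.5 + 3j]
shifted = [a * cmath.exp(-0.5j) for a in window]
print(coherence_and_phase(window, shifted))
```

Low coherence indicates strong decorrelation and therefore a noisier phase, which is why coherence is a natural guide for how strongly to filter.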

D. Ibañez, R. Fernandez-Beltran, F. Pla, and N. Yokoya, “Masked auto-encoding spectral-spatial transformer for hyperspectral image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–14, 2022.
Abstract: Deep learning has become the dominant trend in hyperspectral (HS) remote sensing image classification owing to its excellent capability to extract highly discriminative spectral-spatial features. In this context, transformer networks have recently shown prominent results in distinguishing even the most subtle spectral differences because of their potential to characterize sequential spectral data. Nonetheless, many complexities affecting HS remote sensing data (e.g., atmospheric effects, thermal noise, quantization noise) may severely undermine this potential, since no mechanism for mitigating noisy feature patterns has yet been developed within transformer networks. To address the problem, this paper presents a novel masked auto-encoding spectral-spatial transformer (MAEST), which combines two collaborative branches: (i) a reconstruction path, which dynamically uncovers the most robust encoding features based on a masked auto-encoding strategy; and (ii) a classification path, which embeds these features into a transformer network to classify the data, focusing on the features that best reconstruct the input. Unlike other existing models, this design aims to learn refined transformer features that account for the aforementioned complexities of the HS remote sensing image domain. An experimental comparison with several state-of-the-art methods on benchmark datasets shows the superior results obtained by MAEST. The code for this paper will be available at https://github.com/ibanezfd/MAEST.

T. Xu, T.Z. Huang, L.J. Deng, and N. Yokoya, “An iterative regularization method based on tensor subspace representation for hyperspectral image super-resolution,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–16, 2022.
Abstract: Hyperspectral image super-resolution (HSI-SR) can be achieved by fusing a paired multispectral image (MSI) and hyperspectral image (HSI), which is a prevalent strategy. However, precisely reconstructing the high spatial resolution hyperspectral image (HR-HSI) through fusion remains challenging. In this article, we propose an iterative regularization method based on tensor subspace representation (IR-TenSR) for MSI-HSI fusion and hence HSI-SR. First, we propose a tensor subspace representation (TenSR)-based regularization model that integrates the global spectral–spatial low-rank and nonlocal self-similarity priors of the HR-HSI. Both priors have proven effective, but previous HSI-SR works cannot exploit them simultaneously. Subsequently, we design an iterative regularization procedure to utilize the residual information of the acquired low-resolution images. Other works ignore this information, which leads to suboptimal results. Finally, we develop an effective algorithm based on the proximal alternating minimization method to solve the TenSR-regularization model, which yields the iterative regularization algorithm. Experiments on simulated and real datasets illustrate the advantages of the proposed IR-TenSR compared with state-of-the-art fusion approaches. The code is available at https://github.com/liangjiandeng/IR_TenSR.

D. Ibañez, R. Fernandez-Beltran, F. Pla, and N. Yokoya, “DAT-CNN: Dual attention temporal CNN for time-resolving Sentinel-3 vegetation indices,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 15, pp. 2632–2643, 2022.
Abstract: The synergies between Sentinel-3 (S3) and the forthcoming fluorescence explorer (FLEX) mission make it possible to use S3 vegetation indices (VIs) as proxies of the solar-induced chlorophyll fluorescence (SIF) that will be captured by FLEX. However, because SIF is highly dynamic, S3 VIs can only become reliable proxies if they are monitored with high temporal precision. In this scenario, this article proposes a novel temporal reconstruction convolutional neural network (CNN), named dual attention temporal CNN (DAT-CNN), which has been specially designed for time-resolving S3 VIs using Sentinel-2 (S2) and S3 multitemporal observations. In contrast to other existing techniques, DAT-CNN implements two different branches for processing and fusing S2 and S3 multimodal data, further exploiting intersensor synergies. In addition, DAT-CNN incorporates a new spatial–spectral and temporal attention module to suppress uninformative spatial–spectral features while focusing on the most relevant time stamps for each particular prediction. An experimental comparison with several temporal reconstruction methods on multiple operational Sentinel data products demonstrates the competitive advantages of the proposed model with respect to the state of the art. The code for this article will be available at https://github.com/ibanezfd/DATCNN.
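
The abstract does not detail the attention module. As a generic sketch of temporal attention, and not DAT-CNN's actual module, per-time-stamp relevance scores can be normalized with a softmax and used to weight each time stamp's features. In a network these scores would be learned; here they are hypothetical inputs, and the function names are illustrative:

```python
import math


def softmax(scores):
    """Numerically stable softmax: non-negative weights that sum to one."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(features, scores):
    """Fuse per-time-stamp feature vectors with softmax attention weights.

    features: one feature vector per time stamp (all the same length).
    scores:   one relevance score per time stamp; a network would learn these,
              here they are hypothetical inputs.
    """
    weights = softmax(scores)
    dim = len(features[0])
    fused = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return fused, weights


# Three hypothetical acquisitions; the third is scored as most relevant.
feats = [[0.1, 0.5], [0.2, 0.4], [0.9, 0.3]]
fused, w = temporal_attention(feats, [0.0, 0.5, 3.0])
print(w, fused)
```

With equal scores this reduces to plain averaging. Higher scores shift the fused feature toward the corresponding acquisitions.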

J. Xia, N. Yokoya, and G. Baier, “DML: Differ-modality learning for building semantic segmentation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–14, 2022.
Abstract: This work critically analyzes the problems arising from differ-modality building semantic segmentation in the remote sensing domain. Multi-modality datasets, such as optical, synthetic aperture radar (SAR), and light detection and ranging (LiDAR) data, are growing, while semantic knowledge remains scarce. As a result, learning from multi-modality information has become increasingly relevant over the last few years. However, multi-modality datasets cannot be obtained simultaneously due to many factors. Suppose we have SAR images with reference information in one place and optical images without reference in another: how can we learn relevant features of optical images from SAR images? We refer to this problem as differ-modality learning (DML). To solve the DML problem, we propose novel deep neural network architectures, which include image adaptation, feature adaptation, knowledge distillation, and self-training modules for different scenarios. We test the proposed methods on differ-modality remote sensing datasets (very high-resolution SAR and RGB images from SpaceNet 6) for building semantic segmentation and achieve superior efficiency. The presented approach achieves the best performance compared with state-of-the-art methods.

Z. Li, F. Lu, H. Zhang, L. Tu, J. Li, X. Huang, C. Robinson, N. Malkin, N. Jojic, P. Ghamisi, R. Hänsch, and N. Yokoya, “The Outcome of the 2021 IEEE GRSS Data Fusion Contest - Track MSD: Multitemporal semantic change detection,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., 2022.
Abstract: We present the scientific outcomes of the 2021 Data Fusion Contest (DFC2021) organized by the Image Analysis and Data Fusion Technical Committee of the IEEE Geoscience and Remote Sensing Society. DFC2021 was dedicated to research on geospatial artificial intelligence (AI) for social good. Its global objective was to model the state and changes of artificial and natural environments from multimodal and multitemporal remotely sensed data, with a view to sustainable development. DFC2021 included two challenge tracks: “Detection of settlements without electricity” and “Multitemporal semantic change detection”. This paper mainly focuses on the outcome of the multitemporal semantic change detection track. We describe the DFC2021 dataset, which remains available for further evaluation of corresponding approaches, and report the results of the best-performing methods during the contest.

Y. Ma, Y. Li, K. Feng, Y. Xia, Q. Huang, H. Zhang, C. Prieur, G. Licciardi, H. Malha, J. Chanussot, P. Ghamisi, R. Hänsch, and N. Yokoya, “The Outcome of the 2021 IEEE GRSS Data Fusion Contest - Track DSE: Detection of settlements without electricity,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 12375–12385, 2021.
Abstract: In this article, we elaborate on the scientific outcomes of the 2021 Data Fusion Contest (DFC2021), which was organized by the Image Analysis and Data Fusion Technical Committee of the IEEE Geoscience and Remote Sensing Society on the subject of geospatial artificial intelligence for social good. The ultimate objective of the contest was to model the state and changes of artificial and natural environments from multimodal and multitemporal remotely sensed data, with a view to sustainable development. DFC2021 consisted of two challenge tracks: detection of settlements without electricity (DSE) and multitemporal semantic change detection. We focus here on the outcome of the DSE track. This article presents the corresponding approaches and reports the results of the best-performing methods during the contest.

X. Sun, P. Wang, Z. Yan, W. Diao, X. Lu, Z. Yang, Y. Zhang, D. Xiang, C. Yan, J. Guo, B. Dang, W. Wei, F. Xu, C. Wang, R. Hansch, M. Weinmann, N. Yokoya, and K. Fu, “Automated high-resolution earth observation image interpretation: Outcome of the 2020 Gaofen Challenge,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., 2021.
Abstract: In this paper, we introduce the 2020 Gaofen Challenge and relevant scientific outcomes. The 2020 Gaofen Challenge is an international competition, which is organized by the China High-Resolution Earth Observation Conference Committee and the Aerospace Information Research Institute, Chinese Academy of Sciences and technically co-sponsored by the IEEE Geoscience and Remote Sensing Society (IEEE-GRSS) and the International Society for Photogrammetry and Remote Sensing (ISPRS). It aims at promoting the academic development of automated high-resolution earth observation image interpretation. Six independent tracks have been organized in this challenge, which cover the challenging problems in the field of object detection and semantic segmentation. With the development of convolutional neural networks, deep learning-based methods have achieved good performance on image interpretation. In this paper, we report the details and the best-performing methods presented so far in the scope of this challenge.

W. He, Y. Chen, N. Yokoya, C. Li, and Q. Zhao, “Hyperspectral super-resolution via coupled tensor ring factorization,” Pattern Recognition, 2021.
Abstract: Hyperspectral super-resolution (HSR) fuses a low-resolution hyperspectral image (HSI) and a high-resolution multispectral image (MSI) to obtain a high-resolution HSI (HR-HSI). In this paper, we propose a new model for HSR, named coupled tensor ring factorization (CTRF). The proposed CTRF approach simultaneously learns a high-spectral-resolution core tensor from the HSI and high-spatial-resolution core tensors from the MSI, and reconstructs the HR-HSI via tensor ring (TR) representation (Figure 1). The CTRF model can separately exploit the low-rank property of each class (Section III-C), which has never been explored in previous coupled tensor models. Meanwhile, it inherits the simple representation of coupled matrix/CP factorization and the flexible low-rank exploration of coupled Tucker factorization. Guided by Theorem 2, we further propose a spectral nuclear norm regularization to explore the global spectral low-rank property. The experiments demonstrate the advantage of the proposed nuclear norm regularized CTRF (NCTRF) compared with previous matrix/tensor and deep learning methods.
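
The tensor ring (TR) representation that CTRF builds on expresses each entry of an N-way tensor as T(i1, ..., iN) = Tr(G1(i1) G2(i2) ... GN(iN)). Here Gk(ik) is the ik-th lateral slice of the third-order core Gk, and the last core's trailing rank wraps around to the first core's leading rank. The following minimal, dependency-free sketch only evaluates that representation from given cores. It does not implement CTRF's coupled learning of the cores or its nuclear norm regularization, and the function names are hypothetical:

```python
from itertools import product


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def tr_entry(cores, index):
    """T[i1, ..., iN] = trace(G1[:, i1, :] @ G2[:, i2, :] @ ... @ GN[:, iN, :]).

    Each core G_k is a nested list of shape (r_k, n_k, r_{k+1}); the last
    core's trailing rank equals the first core's leading rank, closing the ring.
    """
    acc = None
    for core, i in zip(cores, index):
        # Lateral slice G_k[:, i_k, :], an r_k x r_{k+1} matrix.
        slice_k = [list(core_a[i]) for core_a in core]
        acc = slice_k if acc is None else matmul(acc, slice_k)
    return sum(acc[d][d] for d in range(len(acc)))


def tr_full(cores):
    """Evaluate every entry of the tensor encoded by the ring of cores."""
    dims = [len(core[0]) for core in cores]
    return {idx: tr_entry(cores, idx) for idx in product(*(range(n) for n in dims))}


# With all ranks equal to 1, the ring reduces to an outer product of two vectors.
g1 = [[[1], [2]]]          # shape (1, 2, 1)
g2 = [[[3], [4], [5]]]     # shape (1, 3, 1)
print(tr_full([g1, g2]))
```

Larger ranks let the cores share information across modes, and the choice of ranks is the knob a TR model uses to exploit low-rank structure.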

W. He, N. Yokoya, and X. Yuan, “Fast hyperspectral image recovery via non-iterative fusion of dual-camera compressive hyperspectral imaging,” IEEE Transactions on Image Processing, (accepted for publication), 2021.
Abstract: Coded aperture snapshot spectral imaging (CASSI) is a promising technique for capturing a three-dimensional (3D) hyperspectral image (HSI) using a single coded two-dimensional (2D) measurement, from which algorithms must solve the inverse problem. Because this problem is ill-posed, various regularizers have been exploited to reconstruct the 3D data from the 2D measurement. Unfortunately, both the accuracy and the computational complexity remain unsatisfactory. One feasible solution is to utilize additional information, such as an RGB measurement, in CASSI. Considering the combined CASSI and RGB measurements, in this paper, we propose a new fusion model for HSI reconstruction. We investigate the spectral low-rank property of the HSI, which is composed of a spectral basis and spatial coefficients. Specifically, the RGB measurement is utilized to estimate the coefficients, while the CASSI measurement is adopted to provide the orthogonal spectral basis. We further propose a patch processing strategy to enhance the spectral low-rank property of the HSI. The proposed model requires neither non-local processing nor iteration, nor the spectral sensing matrix of the RGB detector. Extensive experiments on both simulated and real HSI datasets demonstrate that our proposed method not only outperforms the previous state of the art in quality but also speeds up the reconstruction by more than 5000 times.
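
The spectral low-rank model above writes each pixel spectrum as a combination of a few orthonormal spectral basis vectors, x ≈ E a, where the coefficients on an orthonormal basis are a = Eᵀx. The sketch below illustrates only this factorization. It builds an orthonormal basis from a few hypothetical spectra with (modified) Gram-Schmidt and projects a spectrum onto it. In the paper the basis comes from the CASSI measurement and the coefficients from the RGB measurement, which this sketch does not model. The function names are hypothetical:

```python
import math


def orthonormal_basis(spectra, tol=1e-12):
    """Modified Gram-Schmidt: orthonormal spectral basis; dependent vectors are dropped."""
    basis = []
    for v in spectra:
        w = list(v)
        for e in basis:
            proj = sum(wi * ei for wi, ei in zip(w, e))
            w = [wi - proj * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def coefficients(basis, spectrum):
    """Coefficients a = E^T x of a spectrum x on an orthonormal basis E."""
    return [sum(e_b * x_b for e_b, x_b in zip(e, spectrum)) for e in basis]


def reconstruct(basis, coeffs):
    """Low-rank reconstruction x_hat = E a."""
    bands = len(basis[0])
    return [sum(c * e[b] for c, e in zip(coeffs, basis)) for b in range(bands)]


# Hypothetical 4-band spectra spanning a rank-2 spectral subspace.
E = orthonormal_basis([[1, 0, 1, 0], [0, 1, 0, 1], [2, 3, 2, 3]])
x = [2, 3, 2, 3]
print(len(E), reconstruct(E, coefficients(E, x)))
```

Spectra inside the subspace are reconstructed exactly, while anything outside it is replaced by its orthogonal projection. This projection is what a low-rank spectral model relies on.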

N. Le, T. D. Pham, N. Yokoya, and H. N. Thang, “Learning from multimodal and multisensory earth observation dataset for improving estimates of mangrove soil organic carbon in Vietnam,” International Journal of Remote Sensing, (in press), 2021.

J. Xia, N. Yokoya, B. Adriano, L. Zhang, G. Li, and Z. Wang, “A benchmark high-resolution GaoFen-3 SAR dataset for building semantic segmentation,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., 2021.
Abstract: Deep learning is increasingly popular in the remote sensing community and has already been successful in land cover classification and semantic segmentation. However, most studies are limited to optical datasets. Despite a few attempts to apply deep learning to synthetic aperture radar (SAR), its huge potential, especially for very high-resolution SAR, remains underexploited. Taking building segmentation as an example, very high-resolution (VHR) SAR datasets are, to the best of our knowledge, still missing. No comparable baseline for SAR building segmentation exists, and it is poorly understood which segmentation method is most suitable for SAR images. This paper first provides a benchmark high-resolution (1 m) GaoFen-3 SAR dataset covering nine cities in seven countries. It then reviews the state-of-the-art semantic segmentation methods applied to SAR and finally summarizes potential operations to improve performance. With these comprehensive assessments, we hope to provide recommendations and a roadmap for future SAR semantic segmentation.

D. Hong, L. Gao, J. Yao, N. Yokoya, J. Chanussot, U. Heiden, and B. Zhang, “Endmember-guided unmixing network (EGU-Net): A general deep learning framework for self-supervised hyperspectral unmixing,” IEEE Transactions on Neural Networks and Learning Systems, (in press), 2021.
Abstract: Over the past decades, enormous efforts have been made to improve the performance of linear or nonlinear mixing models for hyperspectral unmixing. Yet their ability to generalize across various spectral variabilities while extracting physically meaningful endmembers remains limited. This is owing to their poor data fitting and reconstruction and to their sensitivity to spectral variabilities. Inspired by the powerful learning ability of deep learning, we develop a general deep learning approach for hyperspectral unmixing, called the endmember-guided unmixing network (EGU-Net), which fully considers the properties of endmembers extracted from the hyperspectral imagery. Beyond a standalone autoencoder-like architecture, EGU-Net is a two-stream Siamese deep network. It learns an additional network from pure or nearly pure endmembers to correct the weights of another unmixing network by sharing network parameters and adding spectrally meaningful constraints (e.g., non-negativity and sum-to-one), yielding a more accurate and interpretable unmixing solution. Furthermore, the resulting general framework is not limited to pixel-wise spectral unmixing but is also applicable to spatial information modeling with convolutional operators for spatial-spectral unmixing. Experimental results on three different datasets with ground-truth abundance maps for each material demonstrate the effectiveness and superiority of EGU-Net over state-of-the-art unmixing algorithms. The code will be available at https://github.com/danfenghong/IEEE_TNNLS_EGU-Net.
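
The non-negativity and sum-to-one constraints mentioned above apply to the abundances of the linear mixing model, in which a pixel spectrum is a weighted sum of endmember spectra. One common way to enforce both constraints in a network's output layer is a softmax over unconstrained scores. This choice is assumed here and is not necessarily the layer used in EGU-Net. A minimal sketch with hypothetical function names and data:

```python
import math


def abundances(scores):
    """Softmax: maps unconstrained scores to abundances that are >= 0 and sum to one."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix(endmembers, fractions):
    """Linear mixing model: pixel spectrum = sum_k a_k * m_k (endmembers as band lists)."""
    bands = len(endmembers[0])
    return [sum(a * m[b] for a, m in zip(fractions, endmembers)) for b in range(bands)]


# Hypothetical three-band endmembers and network scores for one pixel.
M = [[0.9, 0.6, 0.1], [0.2, 0.3, 0.8], [0.5, 0.5, 0.5]]
a = abundances([1.2, -0.4, 0.3])
print(a, mix(M, a))
```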

Y. Qu, H. Qi, C. Kwan, N. Yokoya, and J. Chanussot, “Unsupervised and unregistered hyperspectral image super-resolution with mutual Dirichlet-Net,” IEEE Transactions on Geoscience and Remote Sensing, (in press), 2021.
Abstract: Hyperspectral images (HSIs) provide rich spectral information that has contributed to performance improvements in numerous computer vision and remote sensing tasks. However, this comes at the expense of the images’ spatial resolution. Hyperspectral image super-resolution (HSI-SR) addresses this problem by fusing a low-resolution (LR) HSI with a multispectral image (MSI) of much higher spatial resolution (HR). Existing HSI-SR approaches require the LR HSI and HR MSI to be well registered, and the reconstruction accuracy of the HR HSI relies heavily on the registration accuracy of the different modalities. In this paper, we propose an unregistered and unsupervised mutual Dirichlet-Net (u2-MDN) to explore the uncharted problem domain of HSI-SR without the requirement of multi-modality registration. The success of this endeavor would largely facilitate the deployment of HSI-SR, since the registration requirement is difficult to satisfy in real-world sensing devices. The novelty of this work is threefold. First, to stabilize the fusion of two unregistered modalities, the network is designed to extract spatial and spectral information from two modalities with different dimensions through a shared encoder-decoder structure. Second, mutual information (MI) is adopted to capture the non-linear statistical dependencies between the representations from the two modalities (carrying spatial information) and their raw inputs. By maximizing the MI, spatial correlations between different modalities can be well characterized to further reduce spectral distortion. We assume that the representations follow a similar Dirichlet distribution because of its inherent sum-to-one and non-negative properties. Third, a collaborative l2,1 norm is employed as the reconstruction error instead of the more common l2 norm to better preserve the spectral information. Extensive experimental results demonstrate the superior performance of u2-MDN compared with the state of the art.
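
For reference, the l2,1 norm of a residual matrix sums the Euclidean norms of its rows. The sketch below assumes that rows index pixels and columns index spectral bands, since the abstract does not specify the orientation. It compares the l2,1 norm with the squared l2 (Frobenius) error, using hypothetical function names and data:

```python
import math


def l21_norm(residual):
    """Sum over rows (assumed: pixels) of each row's Euclidean norm (assumed: spectral error)."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in residual)


def squared_l2(residual):
    """Squared Frobenius norm: sum of all squared residual entries."""
    return sum(v * v for row in residual for v in row)


# Hypothetical residuals for two pixels with two bands each.
small = [[3.0, 4.0], [0.0, 0.0]]
large = [[3.0, 4.0], [6.0, 8.0]]
print(l21_norm(small), l21_norm(large))      # grows with the unsquared pixel norm
print(squared_l2(small), squared_l2(large))  # grows with the squared pixel norm
```

Each pixel's spectral error vector contributes through its unsquared norm. A single badly reconstructed pixel therefore raises the l2,1 loss linearly rather than quadratically, which is one reason such norms are considered more robust than the squared l2 error.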

G. Baier, A. Deschemps, M. Schmitt, and N. Yokoya, “Synthesizing optical and SAR imagery from land cover maps and auxiliary raster data,” IEEE Transactions on Geoscience and Remote Sensing, (in press), 2021.
Abstract: We synthesize both optical RGB and SAR remote sensing images from land cover maps and auxiliary raster data using generative adversarial networks (GANs). In remote sensing, many types of data, such as digital elevation models or precipitation maps, are often not reflected in land cover maps but still influence image content or structure. Including such data in the synthesis process increases the quality of the generated images and exerts more control over their characteristics. Spatially adaptive normalization layers fuse both inputs and are applied to a full-blown generator architecture consisting of an encoder and a decoder, to take full advantage of the information content in the auxiliary raster data. Our method successfully synthesizes medium- (10 m) and high- (1 m) resolution images when trained with the corresponding dataset. We show the advantage of fusing land cover maps and auxiliary information using Fréchet inception distance, as well as mean intersection over union and pixel accuracy computed with pre-trained U-Net segmentation models. Handpicked images exemplify how fusing information avoids ambiguities in the synthesized images. By slightly editing the input, our method can be used to synthesize realistic changes, e.g., rising water levels. The source code is publicly available, and we have published the newly created high-resolution dataset.
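
Mean intersection over union and pixel accuracy, two of the metrics named above, are typically computed from a confusion matrix. The matrix compares reference labels with the labels a segmentation model predicts on the synthesized images. A minimal sketch follows. Skipping classes absent from both reference and prediction is one common convention, assumed here, and the function names and data are hypothetical:

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts pixels with reference class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of all pixels whose predicted class matches the reference."""
    correct = sum(cm[c][c] for c in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def mean_iou(cm):
    """Average over classes of TP / (TP + FP + FN)."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(len(cm))) - tp  # predicted c, reference differs
        fn = sum(cm[c]) - tp                              # reference c, predicted differs
        if tp + fp + fn > 0:  # skip classes absent from both reference and prediction
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)


# Hypothetical flattened label maps with two classes.
reference = [0, 0, 1, 1]
predicted = [0, 1, 1, 1]
cm = confusion_matrix(reference, predicted, 2)
print(pixel_accuracy(cm), mean_iou(cm))
```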

C. Robinson, K. Malkin, N. Jojic, H. Chen, R. Qin, C. Xiao, M. Schmitt, P. Ghamisi, R. Haensch, and N. Yokoya, “Global land cover mapping with weak supervision: Outcome of the 2020 IEEE GRSS Data Fusion Contest,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., (in press), 2021.
Abstract: This paper presents the scientific outcomes of the 2020 Data Fusion Contest organized by the Image Analysis and Data Fusion Technical Committee of the IEEE Geoscience and Remote Sensing Society. The 2020 Contest addressed the problem of automatic global land-cover mapping with weak supervision, i.e., estimating high-resolution semantic maps while only low-resolution reference data are available during training. Two separate competitions were organized to assess two different scenarios: 1) no high-resolution labels are available at all, and 2) a small number of high-resolution labels are available in addition to low-resolution reference data. In this paper, we describe the DFC2020 dataset, which remains available for further evaluation of corresponding approaches, and report the results of the best-performing methods during the contest.

B. Adriano, N. Yokoya, J. Xia, H. Miura, W. Liu, M. Matsuoka, and S. Koshimura, “Learning from multimodal and multitemporal earth observation data for building damage mapping,” ISPRS Journal of Photogrammetry and Remote Sensing, (in press), 2021.
Abstract: Earth observation (EO) technologies, such as optical imaging and synthetic aperture radar (SAR), provide excellent means to continuously monitor ever-growing urban environments. Notably, in large-scale disasters (e.g., tsunamis and earthquakes), where the response is highly time-critical, images from both data modalities can complement each other to accurately convey the full damage condition in the disaster's aftermath. However, due to several factors, such as weather and satellite coverage, it is often uncertain which data modality will be available first for rapid disaster response efforts. Hence, novel methodologies that can utilize all accessible EO datasets are essential for disaster management. In this study, we have developed a global multisensor and multitemporal dataset for building damage mapping. We included building damage characteristics from three disaster types, namely, earthquakes, tsunamis, and typhoons, and considered three building damage categories. The global dataset contains high-resolution optical imagery and high-to-moderate-resolution multiband SAR data acquired before and after each disaster. Using this comprehensive dataset, we analyzed five data modality scenarios for damage mapping: single-mode (optical and SAR datasets), cross-modal (pre-disaster optical and post-disaster SAR datasets), and mode fusion scenarios. We defined a damage mapping framework for the semantic segmentation of damaged buildings based on a deep convolutional neural network algorithm and compared our approach with another state-of-the-art baseline model for damage mapping. The results indicated that our dataset, together with a deep learning network, enabled acceptable predictions for all the data modality scenarios.

D. Hong, W. He, N. Yokoya, J. Yao, L. Gao, L. Zhang, J. Chanussot, and X.X. Zhu, “Interpretable hyperspectral AI: When non-convex modeling meets hyperspectral remote sensing,” IEEE Geoscience and Remote Sensing Magazine, (in press), 2021.
Abstract: Hyperspectral imaging, also known as imaging spectrometry, is a landmark technique in geoscience and remote sensing (RS). In the past decade, enormous efforts have been made to process and analyze these hyperspectral (HS) products, mainly by seasoned experts. However, with the ever-growing volume of data, the heavy costs in manpower and material resources pose new challenges for reducing the burden of manual labor and improving efficiency. It is therefore urgent to develop more intelligent and automatic approaches for various HS RS applications. Machine learning (ML) tools with convex optimization have successfully undertaken the tasks of numerous artificial intelligence (AI)-related applications. However, their ability to handle complex practical problems remains limited, particularly for HS data. This is due to the effects of various spectral variabilities in the HS imaging process and to the complexity and redundancy of higher-dimensional HS signals. Compared with convex models, non-convex modeling can characterize more complex real scenes and provide model interpretability, both technically and theoretically. It has proven to be a feasible way to reduce the gap between challenging HS vision tasks and currently advanced intelligent data processing models. This article presents a cutting-edge technical survey of non-convex modeling toward interpretable AI models, covering a broad scope of HS RS topics: 1) HS image restoration, 2) dimensionality reduction, 3) data fusion and enhancement, 4) spectral unmixing, and 5) cross-modality learning for large-scale land cover mapping.
Around these topics, we showcase the significance of non-convex techniques in bridging the gap between HS RS and interpretable AI models. Each topic includes a brief introduction to the research background and motivation, an emphasis on the resulting methodological foundations and solutions, and an intuitive clarification of illustrative examples. At the end of each topic, we also discuss the remaining challenges in completely modeling the issues of complex spectral vision from the perspective of intelligent ML combined with physical priors and numerical non-convex modeling, and accordingly point out future research directions. This paper aims to create a good entry point to the advanced literature for experienced researchers, Ph.D. students, and engineers who already have some background knowledge in HS RS, ML, and optimization. This can further help them launch new investigations on the basis of the above topics and interpretable AI techniques for their focused fields.

T. D. Pham, N. Yokoya, T. T. T. Nguyen, N. N. Le, N. T. Ha, J. Xia, W. Takeuchi, and T. D. Pham, “Improvement of mangrove soil carbon stocks estimation in North Vietnam using Sentinel-2 data and machine learning approach,” GIScience & Remote Sensing, 2020.
Abstract: Quantifying total carbon (TC) stocks in soil across various mangrove ecosystems is key to understanding the global carbon cycle and reducing greenhouse gas emissions. Estimating mangrove TC at a large scale remains challenging because soil carbon measurements are difficult and costly when many samples are needed. In the present study, we investigated the capability of Sentinel-2 multispectral data together with a state-of-the-art machine learning (ML) technique to estimate mangrove soil C stocks across the mangrove ecosystems in North Vietnam. This technique, the CBR-GA model, combines CatBoost regression (CBR) with a genetic algorithm (GA) for feature selection and optimization. We used field survey data collected from 177 soil cores. We compared the performance of the proposed model with that of four other ML algorithms, i.e., extreme gradient boosting regression (XGBR), light gradient boosting machine regression (LGBMR), support vector regression (SVR), and random forest regression (RFR). Our proposed model estimated the TC level in the soil as 35.06–166.83 Mg ha−1 (average = 92.27 Mg ha−1) with satisfactory accuracy (R² = 0.665, RMSE = 18.41 Mg ha−1) and yielded the best prediction performance among all the ML techniques. We conclude that Sentinel-2 data combined with the CBR-GA model can improve estimates of mangrove TC at 10 m spatial resolution in tropical areas. The effectiveness of the proposed approach should be further evaluated for different mangrove soils in other mangrove ecosystems in tropical and subtropical regions.
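
The accuracy figures above (R² and RMSE in Mg ha−1) are standard regression metrics. A minimal sketch follows, applied to hypothetical measured and predicted values. It assumes that R² denotes the coefficient of determination, 1 − SS_res/SS_tot; some studies instead report the squared Pearson correlation. The function names are illustrative:

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the target (e.g., Mg ha^-1)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical measured vs. predicted soil carbon stocks (Mg ha^-1).
measured = [60.0, 80.0, 100.0, 120.0]
predicted = [70.0, 75.0, 105.0, 110.0]
print(r_squared(measured, predicted), rmse(measured, predicted))
```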

M. Pourshamsi, J. Xia, N. Yokoya, M. Garcia, M. Lavalle, E. Pottier, and H. Balzter, “Tropical forest canopy height estimation from combined polarimetric SAR and LiDAR using machine learning,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 172, pp. 79–94, 2021.
Abstract: Forest height is an important forest biophysical parameter used to derive key information about forest ecosystems, such as forest above-ground biomass. In this paper, the potential of combining Polarimetric Synthetic Aperture Radar (PolSAR) variables with LiDAR measurements for forest height estimation is investigated. This is done using different machine learning algorithms, including Random Forest (RF), Rotation Forest (RoF), Canonical Correlation Forest (CCF), and Support Vector Machine (SVM). Various PolSAR parameters are required as input variables to ensure successful height retrieval across different forest height ranges. The algorithms are trained with 5000 LiDAR samples (less than 1% of the full scene) and different polarimetric variables. To examine the dependency of the algorithms on the input training samples, three different subsets are identified, each of which includes different features. Subset 1 is quite diverse and includes non-vegetated regions, short/sparse vegetation (0–20 m), vegetation of mid-range height (20–40 m), and tall/dense vegetation (40–60 m). Subset 2 mostly covers densely vegetated areas with heights of 40–60 m. Subset 3 mostly covers non-vegetated areas and short/sparse vegetation (0–20 m). The trained algorithms were used to estimate the height for the areas outside the identified subsets. The results were validated with independent samples of LiDAR-derived height, showing high accuracy (average R² = 0.70 and RMSE = 10 m across all the algorithms and training samples). The results confirm that it is possible to estimate forest canopy height using PolSAR parameters together with a small coverage of LiDAR height as training data.

J. Xia, N. Yokoya, and T. D. Pham, “Probabilistic mangrove species mapping with multiple-source remote-sensing datasets using label distribution learning in Xuan Thuy National Park, Vietnam,” Remote Sensing, vol. 12, no. 22, p. 3834, 2020.
Abstract: Mangrove forests play an important role in maintaining water quality, mitigating climate change impacts, and providing a wide range of ecosystem services. Effective identification of mangrove species using remote-sensing images remains a challenge. Combining multi-source remote-sensing datasets (with different spectral/spatial resolutions) can improve the discrimination of mangrove tree species. In this paper, various combinations of remote-sensing datasets, including Sentinel-1 dual-polarimetric synthetic aperture radar (SAR), Sentinel-2 multispectral, and Gaofen-3 full-polarimetric SAR data, were used to classify the mangrove communities in Xuan Thuy National Park, Vietnam. Mixed mangrove communities consisting of small and shrub mangrove patches are generally difficult to separate using data of low/medium spatial resolution. To alleviate this problem, we propose to use label distribution learning (LDL) to provide probabilistic mapping of tree species, including Sonneratia caseolaris (SC), Kandelia obovata (KO), Aegiceras corniculatum (AC), Rhizophora stylosa (RS), and Avicennia marina (AM). The experimental results show that the best classification performance was achieved by integrating Sentinel-2 and Gaofen-3 datasets, demonstrating that full-polarimetric Gaofen-3 data are superior to dual-polarimetric Sentinel-1 data for mapping mangrove tree species in the tropics.
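Label distribution learning replaces a single class label per pixel with a probability distribution over species, which suits mixed pixels containing several small mangrove patches. The Python sketch below is a minimal illustration, not the paper's model: it assumes a softmax output and a Kullback–Leibler (KL) divergence loss, which are common LDL choices, and the species fractions and scores are hypothetical.

```python
import math

SPECIES = ["SC", "KO", "AC", "RS", "AM"]  # species codes from the abstract

def softmax(scores):
    """Turn raw per-species scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted); species with zero target share contribute nothing."""
    return sum(t * math.log(t / max(p, eps)) for t, p in zip(target, predicted) if t > 0)

# Hypothetical mixed pixel: 60% SC, 30% KO, 10% AC by area.
target = [0.6, 0.3, 0.1, 0.0, 0.0]
predicted = softmax([2.0, 1.0, 0.0, -1.0, -1.0])  # hypothetical model scores
loss = kl_divergence(target, predicted)
dominant = SPECIES[max(range(len(predicted)), key=predicted.__getitem__)]
```

Training would adjust the scores to minimize `loss`; the full predicted distribution, not only `dominant`, is what makes the resulting map probabilistic.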

N. Yokoya, K. Yamanoi, W. He, G. Baier, B. Adriano, H. Miura, and S. Oishi, “Breaking limits of remote sensing by deep learning from simulated data for flood and debris flow mapping,” IEEE Transactions on Geoscience and Remote Sensing, (early access), 2020.
Abstract: We propose a framework that estimates inundation depth (maximum water level) and debris-flow-induced topographic deformation from remote sensing imagery by integrating deep learning and numerical simulation. A water and debris flow simulator generates training data for various artificial disaster scenarios. We show that regression models based on Attention U-Net and LinkNet architectures trained on such synthetic data can predict the maximum water level and topographic deformation from a remote sensing-derived change detection map and a digital elevation model. The proposed framework has an inpainting capability, thus mitigating the false negatives that are inevitable in remote sensing image analysis. Our framework breaks the limits of remote sensing and enables rapid estimation of inundation depth and topographic deformation, essential information for emergency response, including rescue and relief activities. We conduct experiments with both synthetic and real data for two disaster events that caused simultaneous flooding and debris flows and demonstrate the effectiveness of our approach quantitatively and qualitatively. Our code and datasets are available at https://github.com/nyokoya/dlsim.

Y. Lian, T. Feng, J. Zhou, M. Jia, A. Li, Z. Wu, L. Jiao, M. Brown, G. Hager, N. Yokoya, R. Haensch, and B. Le Saux, “Large-Scale Semantic 3D Reconstruction: Outcome of the 2019 IEEE GRSS Data Fusion Contest - Part B,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., (early access), 2020.
Abstract: We present the scientific outcomes of the 2019 Data Fusion Contest organized by the Image Analysis and Data Fusion Technical Committee of the IEEE Geoscience and Remote Sensing Society. The contest included challenges with large-scale data sets for semantic 3D reconstruction from satellite images and for semantic 3D point cloud classification from airborne LiDAR. 3D reconstruction results are discussed separately in Part A. In this Part B, we report the results of the two best-performing approaches for 3D point cloud classification. Both are deep learning methods that improve upon the PointSIFT model with mechanisms to combine multi-scale features and task-specific post-processing to refine model outputs.

S. Kunwar, H. Chen, M. Lin, H. Zhang, P. D'Angelo, D. Cerra, S. M. Azimi, M. Brown, G. Hager, N. Yokoya, R. Haensch, and B. Le Saux, “Large-Scale Semantic 3D Reconstruction: Outcome of the 2019 IEEE GRSS Data Fusion Contest - Part A,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., (early access), 2020.
Abstract: In this paper, we present the scientific outcomes of the 2019 Data Fusion Contest organized by the Image Analysis and Data Fusion Technical Committee of the IEEE Geoscience and Remote Sensing Society. The 2019 Contest addressed the problem of 3D reconstruction and 3D semantic understanding on a large scale. Several competitions were organized to assess specific issues, such as elevation estimation and semantic mapping from a single view, two views, or multiple views. In this Part A, we report the results of the best-performing approaches for semantic 3D reconstruction according to these various set-ups, while 3D point cloud semantic mapping is discussed in Part B.

W. He, Q. Yao, C. Li, N. Yokoya, Q. Zhao, H. Zhang, and L. Zhang, “Non-local meets global: An iterative paradigm for hyperspectral image restoration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, (early access), 2020.
Abstract: Non-local low-rank tensor approximation has been developed as a state-of-the-art method for hyperspectral image (HSI) restoration, which includes the tasks of denoising, compressed HSI reconstruction, and inpainting. Unfortunately, while its restoration performance benefits from more spectral bands, its runtime also increases substantially. In this paper, we claim that the HSI lies in a global spectral low-rank subspace, and that the spectral subspaces of each full-band patch group should lie in this global low-rank subspace. This motivates us to propose a unified paradigm combining the spatial and spectral properties for HSI restoration. The proposed paradigm gains superior performance from non-local spatial denoising and low computational complexity from low-rank orthogonal basis exploration. An efficient alternating minimization algorithm with rank adaptation is developed. It first solves a fidelity-term-related problem to update a latent input image, and then learns a low-dimensional orthogonal basis and the related reduced image from the latent input image. Subsequently, non-local low-rank denoising is applied to refine the reduced image and orthogonal basis iteratively. Finally, experiments on HSI denoising, compressed reconstruction, and inpainting tasks, with both simulated and real datasets, demonstrate its superiority over state-of-the-art HSI restoration methods.

D. Hong, J. Kang, N. Yokoya, and J. Chanussot, “Graph-induced aligned learning on subspaces for hyperspectral and multispectral data,” IEEE Transactions on Geoscience and Remote Sensing, (early access), 2020.
Abstract: In this article, we investigate a common but practical issue in remote sensing (RS): can a limited amount of one information-rich (or high-quality) data source, e.g., a hyperspectral (HS) image, improve the performance of a classification task using a large amount of another information-poor (low-quality) data source, e.g., a multispectral (MS) image? This question leads to a typical cross-modality feature learning problem. However, classic cross-modality representation learning approaches, e.g., manifold alignment, remain limited in effectively and efficiently handling problems in which data from the high-quality modality are largely absent. For this reason, we propose a novel graph-induced aligned learning (GiAL) framework that 1) adaptively learns a unified graph (further yielding a Laplacian matrix) from the data in order to align multimodal data (MS-HS data) into a latent shared subspace; 2) simultaneously models two regression behaviors with respect to labels and pseudo-labels under a multitask learning paradigm; and 3) dynamically updates the pseudo-labels according to the learned graph and feeds the latest pseudo-labels back into the next round of model learning. In addition, an optimization framework based on the alternating direction method of multipliers (ADMM) is devised to solve the proposed GiAL model. Extensive experiments are conducted on two MS-HS RS data sets, demonstrating the superiority of the proposed GiAL compared with several state-of-the-art methods.
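Point 1) relies on a graph Laplacian. GiAL learns its graph adaptively; as a simpler stand-in, the Python sketch below builds a fixed Gaussian-kernel affinity matrix W from feature vectors and forms the unnormalized Laplacian L = D − W, where D is the diagonal degree matrix. The kernel width `sigma` and the sample vectors are illustrative assumptions, not values from the paper.

```python
import math

def gaussian_affinity(samples, sigma=1.0):
    """Dense affinity W[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), zero on the diagonal."""
    n = len(samples)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
                W[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return W

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    n = len(W)
    degrees = [sum(row) for row in W]
    return [[(degrees[i] if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]

# Hypothetical 2-D features: two nearby pixels and one distant pixel.
samples = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
W = gaussian_affinity(samples)
L = laplacian(W)
```

Every row of L sums to zero, and the quadratic form f^T L f penalizes differences between strongly connected samples, which is what lets a Laplacian term pull aligned MS and HS samples together in a shared subspace.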

D. Hong, N. Yokoya, J. Chanussot, J. Xu, and X. X. Zhu, “Joint and progressive subspace analysis (JPSA) with spatial-spectral manifold alignment for semi-supervised hyperspectral dimensionality reduction,” IEEE Transactions on Cybernetics, (accepted for publication), 2020.
Abstract: Conventional nonlinear subspace learning techniques (e.g., manifold learning) usually suffer from drawbacks in explainability (explicit mapping), cost-effectiveness (linearization), generalization capability (out-of-sample), and representability (spatial-spectral discrimination). To overcome these shortcomings, a novel linearized subspace analysis technique with spatial-spectral manifold alignment, called joint and progressive subspace analysis (JPSA), is developed for semi-supervised hyperspectral dimensionality reduction (HDR). JPSA learns a high-level, semantically meaningful, joint spatial-spectral feature representation from hyperspectral data by 1) jointly learning latent subspaces and a linear classifier to find an effective projection direction favorable for classification; 2) progressively searching several intermediate states of subspaces to approach an optimal mapping from the original space to a potentially more discriminative subspace; and 3) spatially and spectrally aligning the manifold structure in each learned latent subspace in order to preserve the same or similar topological properties between the compressed data and the original data. A simple but effective classifier, i.e., nearest neighbor (NN), is used to validate the performance of different HDR approaches. Extensive experiments are conducted to demonstrate the superiority and effectiveness of the proposed JPSA on two widely used hyperspectral datasets, Indian Pines (92.98%) and the University of Houston (86.09%), in comparison with previous state-of-the-art HDR methods. The demo of the earlier version of this work (i.e., ECCV2018) is openly available at https://github.com/danfenghong/ECCV2018_J-Play.
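The nearest-neighbor (NN) evaluation mentioned above amounts to labeling each test sample with the class of its closest training sample in the reduced feature space. A minimal Python sketch, assuming Euclidean distance and using tiny hypothetical 2-D features in place of real dimension-reduced hyperspectral data:

```python
def nn_classify(train_x, train_y, query):
    """Return the label of the training sample closest to `query` (squared Euclidean distance)."""
    def dist2(i):
        return sum((a - b) ** 2 for a, b in zip(train_x[i], query))
    return train_y[min(range(len(train_x)), key=dist2)]

def nn_accuracy(train_x, train_y, test_x, test_y):
    """Overall accuracy of the 1-NN classifier on a labeled test set."""
    correct = sum(nn_classify(train_x, train_y, x) == y for x, y in zip(test_x, test_y))
    return correct / len(test_y)

# Hypothetical reduced features (e.g., after an HDR projection) and class labels.
train_x = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 6.0]]
train_y = ["a", "a", "b", "b"]
test_x = [[0.5, 0.2], [5.5, 5.4], [4.0, 4.0]]
test_y = ["a", "b", "a"]  # the last sample lies closer to class "b" and is misclassified
acc = nn_accuracy(train_x, train_y, test_x, test_y)
```

Because 1-NN has no trainable parameters, differences in its accuracy reflect the quality of the learned subspace rather than the classifier, which is why it is a convenient probe for comparing HDR methods.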

D. Hong, L. Gao, N. Yokoya, J. Yao, J. Chanussot, Q. Du, and B. Zhang, “More diverse means better: Multimodal deep learning meets remote-sensing imagery classification,” IEEE Transactions on Geoscience and Remote Sensing, (accepted for publication), 2020.
Abstract: Classification and identification of the materials lying over or beneath the Earth's surface have long been a fundamental but challenging research topic in geoscience and remote sensing (RS), and have attracted growing attention owing to recent advances in deep learning techniques. Although deep networks have been successfully applied in single-modality-dominated classification tasks, their performance inevitably hits a bottleneck in complex scenes that need to be finely classified, due to limited information diversity. In this work, we provide a baseline solution to this difficulty by developing a general multimodal deep learning (MDL) framework. In particular, we also investigate a special case of multi-modality learning (MML), cross-modality learning (CML), which exists widely in RS image classification applications. By focusing on “what”, “where”, and “how” to fuse, we show different fusion strategies as well as how to train deep networks and build the network architecture. Specifically, five fusion architectures are introduced and developed, and are further unified in our MDL framework. Moreover, our framework is not limited to pixel-wise classification tasks but is also applicable to spatial information modeling with convolutional neural networks (CNNs). To validate the effectiveness and superiority of the MDL framework, extensive experiments related to the settings of MML and CML are conducted on two different multimodal RS datasets. Furthermore, the code and datasets will be available at https://github.com/danfenghong/IEEE_TGRS_MDL-RS, contributing to the RS community.

D. Hong, N. Yokoya, G.-S. Xia, J. Chanussot, and X. X. Zhu, “X-ModalNet: A semi-supervised deep cross-modal network for classification of remote sensing data,” ISPRS Journal of Photogrammetry and Remote Sensing, (accepted for publication), 2020.
Abstract: This paper addresses the problem of semi-supervised transfer learning with limited cross-modality data in remote sensing. A large amount of multimodal Earth observation imagery, such as multispectral imagery (MSI) or synthetic aperture radar (SAR) data, is openly available on a global scale, enabling the parsing of global urban scenes through remote sensing imagery. However, its ability to identify materials (pixel-wise classification) remains limited, due to the noisy collection environment, poor discriminative information, and a limited number of well-annotated training images. To this end, we propose a novel cross-modal deep-learning framework, called X-ModalNet, with three well-designed modules (a self-adversarial module, an interactive learning module, and a label propagation module), which learns to transfer more discriminative information from a small-scale hyperspectral image (HSI) into the classification task using large-scale MSI or SAR data. Significantly, X-ModalNet generalizes well, owing to label propagation on an updatable graph constructed from high-level features at the top of the network, yielding semi-supervised cross-modality learning. We evaluate X-ModalNet on two multimodal remote sensing datasets (HSI-MSI and HSI-SAR) and achieve a significant improvement in comparison with several state-of-the-art methods.

E. Mas, R. Paulik, K. Pakoksung, B. Adriano, L. Moya, A. Suppasri, A. Muhari, R. Khomarudin, N. Yokoya, M. Matsuoka, and S. Koshimura, “Characteristics of tsunami fragility functions developed using different sources of damage data from the 2018 Sulawesi earthquake and tsunami,” Pure and Applied Geophysics, 2020.
Abstract: We developed tsunami fragility functions using three sources of damage data from the 2018 Sulawesi tsunami at Palu Bay in Indonesia obtained from (i) field survey data (FS), (ii) a visual interpretation of optical satellite images (VI), and (iii) a machine learning and remote sensing approach utilized on multisensor and multitemporal satellite images (MLRS). Tsunami fragility functions are cumulative distribution functions that express the probability of a structure reaching or exceeding a particular damage state in response to a specific tsunami intensity measure, in this case obtained from the interpolation of multiple surveyed points of tsunami flow depth. We observed that the FS approach led to a more consistent function than the VI and MLRS methods. In particular, an initial damage probability observed at zero inundation depth in the latter two methods revealed the effects of misclassifications on tsunami fragility functions derived from VI data; however, it also highlighted the remarkable advantages of MLRS methods. The reasons for such limitations and insights into overcoming them are discussed together with the pros and cons of each method. The results show that the tsunami damage observed in the 2018 Sulawesi event in Indonesia, expressed in the fragility function developed herein, is similar in shape to the function developed after the 1993 Hokkaido Nansei-oki tsunami, albeit with a slightly lower damage probability for inundation depths between zero and five meters. On the other hand, in comparison with the fragility function developed after the 2004 Indian Ocean tsunami in Banda Aceh, Palu structures exhibit higher fragility in response to tsunamis: at an inundation depth of two meters, the probability of damage was nearly 20% in Banda Aceh but close to 70% in Palu.
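A fragility function of this kind is often modeled as a lognormal cumulative distribution function of inundation depth h, P(h) = Φ((ln h − μ)/σ), where Φ is the standard normal CDF. The abstract does not state the functional form, so the Python sketch below assumes the lognormal form, and the parameters μ and σ are hypothetical rather than values fitted to the Palu data. A pure lognormal curve is zero at h = 0, so a nonzero damage probability at zero depth, as seen with the VI and MLRS data, cannot come from the hazard itself.

```python
import math

def fragility(depth_m, mu, sigma):
    """P(damage state reached or exceeded | inundation depth), assuming a lognormal CDF."""
    if depth_m <= 0.0:
        return 0.0  # the lognormal CDF is zero for non-positive depths
    z = (math.log(depth_m) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF via erf

# Hypothetical parameters: median damage depth exp(0.7), about 2.0 m; log-standard deviation 0.6.
MU, SIGMA = 0.7, 0.6
curve = [(h, fragility(h, MU, SIGMA)) for h in (0.5, 1.0, 2.0, 3.0, 5.0)]
```

Fitting μ and σ to each damage dataset (FS, VI, MLRS) would then allow the curves to be compared at a given depth, such as the 2 m comparison between Palu and Banda Aceh above.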

M. E. Paoletti, J. M. Haut, P. Ghamisi, N. Yokoya, J. Plaza, and A. Plaza, “U-IMG2DSM: Unpaired simulation of digital surface models with generative adversarial networks,” IEEE Geoscience and Remote Sensing Letters, (Early Access), pp. 1–5, 2020.
Abstract: High-resolution digital surface models (DSMs) provide valuable height information about the Earth's surface, which can be successfully combined with other types of remotely sensed data in a wide range of applications. However, acquiring DSMs with high spatial resolution is extremely time-consuming and expensive, and estimating them from a single optical image is an ill-posed problem. To overcome these limitations, this letter presents a new unpaired approach to obtain DSMs from optical images using deep learning techniques. Specifically, our new deep neural model is based on variational autoencoders (VAEs) and generative adversarial networks (GANs) to perform image-to-image translation, obtaining DSMs from optical images. Our newly proposed method has been tested in terms of photographic interpretation, reconstruction error, and classification accuracy using three well-known remotely sensed data sets with very high spatial resolution (obtained over Potsdam, Vaihingen, and Stockholm). Our experimental results demonstrate that the proposed approach obtains satisfactory reconstruction rates that help improve the classification results for these images. The source code of our method is available at https://github.com/mhaut/UIMG2DSM.

Y. Chen, T.-Z. Huang, W. He, N. Yokoya, and X.-L. Zhao, “Hyperspectral image compressive sensing reconstruction using subspace-based nonlocal tensor ring decomposition,” IEEE Transactions on Image Processing, (Early Access), pp. 1–16, 2020.
Abstract: Hyperspectral image compressive sensing reconstruction (HSI-CSR) can largely reduce the high expense and low efficiency of transmitting HSIs to ground stations by storing a few compressive measurements, but precisely reconstructing the HSI from a few compressive measurements is challenging. It has been shown that considering the global spectral correlation, spatial structure, and nonlocal self-similarity priors of HSI can achieve satisfactory reconstruction performance. However, most existing methods cannot capture these priors simultaneously and design the regularization term directly on the HSI. In this article, we propose a novel subspace-based nonlocal tensor ring decomposition method (SNLTR) for HSI-CSR. Instead of designing a low-rank approximation regularizer directly for the HSI, we assume that the HSI lies in a low-dimensional subspace. Moreover, to explore the nonlocal self-similarity and preserve the spatial structure of the HSI, we introduce a nonlocal tensor ring decomposition strategy to constrain the related coefficient image, which decreases the computational cost compared with methods that directly apply nonlocal regularization to the HSI. Finally, a well-known alternating minimization method is designed to efficiently solve the proposed SNLTR. Extensive experimental results demonstrate that our SNLTR method can significantly outperform existing approaches for HSI-CSR.

G. Baier, W. He, and N. Yokoya, “Robust nonlocal low-rank SAR time series despeckling considering speckle correlation by total variation regularization,” IEEE Transactions on Geoscience and Remote Sensing, accepted for publication, 2020.
Abstract: Outliers and speckle both corrupt time series of SAR acquisitions. Owing to the coherence between SAR acquisitions, their speckle can no longer be regarded as independent. In this study, we propose an algorithm for nonlocal low-rank time series despeckling, which is robust against outliers and also specifically addresses speckle correlation between acquisitions. By imposing total variation regularization on the signal's speckle component, the correlation between acquisitions can be identified, facilitating the extraction of outliers from the unfiltered signal and the correlated speckle. This robustness against outliers also addresses matching errors and inaccuracies in the nonlocal similarity search. Such errors include mismatched data in the nonlocal estimation process, which degrade the denoising performance of conventional similarity-based filtering approaches. Multiple experiments on real and synthetic data assess the performance of the approach by comparing it with state-of-the-art methods. It provides filtering results of comparable quality but is not adversely affected by outliers. The source code is available at https://github.com/gbaier/nllrtv.

C. Yoo, J. Im, D. Cho, N. Yokoya, J. Xia, and B. Bechtel, “Estimation of all-weather 1km MODIS land surface temperature for humid summer days,” Remote Sensing, vol. 12, p. 1398, 2020.
Abstract: Land surface temperature (LST) is used as a critical indicator for various environmental issues because it links land surface fluxes with the surface atmosphere. Moderate Resolution Imaging Spectroradiometer (MODIS) 1 km LSTs have been widely utilized but have the serious limitation of not being provided under cloudy weather conditions. In this study, we propose two schemes to estimate all-weather 1 km Aqua MODIS daytime (1:30 p.m.) and nighttime (1:30 a.m.) LSTs in South Korea for humid summer days. Scheme 1 (S1) is a two-step approach that first estimates 10 km LSTs and then downscales them spatially from 10 km to 1 km. Scheme 2 (S2), a one-step algorithm, directly estimates the 1 km all-weather LSTs. Eight Advanced Microwave Scanning Radiometer 2 (AMSR2) brightness temperatures, three MODIS-based annual cycle parameters, and six auxiliary variables were used for the LST estimation based on random forest machine learning. To confirm the effectiveness of each scheme, we performed different validation experiments using clear-sky MODIS LSTs. Moreover, we validated all-weather LSTs using bias-corrected LSTs from 10 in situ stations. Under clear daytime skies, S2 performed better than S1. However, under cloudy daytime skies, S1 simulated low LSTs better than S2, with an average root mean squared error (RMSE) of 2.6 °C compared with an average RMSE of 3.8 °C over 10 stations. At nighttime, S1 and S2 showed no significant difference in performance under either clear or cloudy sky conditions. When the two schemes were combined, the proposed all-weather LSTs resulted in average R2 values of 0.82 and 0.74 and RMSEs of 2.5 °C and 1.4 °C for daytime and nighttime, respectively, compared with the in situ data. This paper demonstrates the ability of the two different schemes to produce all-weather dynamic LSTs. The strategy proposed in this study can improve the applicability of LSTs in a variety of research and practical fields, particularly for areas that are frequently covered by clouds.

T.D. Pham, N. Yokoya, J. Xia, N.T. Ha, N.N. Le, T.T.T. Nguyen, T.H. Dao, T.T.P. Vu, T.D. Pham, and W. Takeuchi, “Comparison of machine learning methods for estimating mangrove above-ground biomass using multiple source remote sensing data in the Red River Delta Biosphere Reserve, Vietnam,” Remote Sensing, vol. 12, p. 1334, 2020.
Abstract: This study proposes a hybrid intelligence approach based on extreme gradient boosting regression and a genetic algorithm, namely the XGBR-GA model, incorporating Sentinel-2, Sentinel-1, and ALOS-2 PALSAR-2 data to estimate mangrove above-ground biomass (AGB), including small and shrub mangrove patches, in the Red River Delta biosphere reserve across the northern coast of Vietnam. We used the novel extreme gradient boosting decision tree (XGBR) technique together with genetic algorithm (GA) optimization for feature selection to construct and verify a mangrove AGB model. The model used data from a field survey of 105 sampling plots conducted in November and December 2018 and incorporated the dual-polarimetric (HH and HV) ALOS-2 PALSAR-2 L-band data and the Sentinel-2 multispectral data combined with Sentinel-1 (C-band VV and VH) data. We employed the root-mean-square error (RMSE) and coefficient of determination (R2) to evaluate the performance of the proposed model. The capability of the XGBR-GA model was assessed via a comparison with other machine learning (ML) techniques, i.e., the CatBoost regression (CBR), gradient boosted regression tree (GBRT), support vector regression (SVR), and random forest regression (RFR) models. The XGBR-GA model yielded a promising result (R2 = 0.683, RMSE = 25.08 Mg·ha−1) and outperformed the four other ML models. The XGBR-GA model retrieved mangrove AGB values ranging from 17 Mg·ha−1 to 142 Mg·ha−1 (with an average of 72.47 Mg·ha−1). Therefore, multisource optical and synthetic aperture radar (SAR) data combined with the XGBR-GA model can be used to estimate mangrove AGB in North Vietnam. The effectiveness of the proposed method needs to be further tested and compared in other mangrove ecosystems in the tropics.
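The GA step searches over binary masks that switch each candidate input feature on or off. The Python sketch below shows the general pattern (tournament selection, single-point crossover, bit-flip mutation, and elitism); it is not the authors' XGBR-GA configuration. In practice the fitness would be a cross-validated XGBR score; here `toy_fitness` and `INFORMATIVE` are hypothetical stand-ins that reward three assumed-useful features and penalize the rest.

```python
import random

def _tournament(population, fitness, rng):
    """Pick the fitter of two randomly chosen masks."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def genetic_feature_selection(n_features, fitness, pop_size=20, generations=30,
                              mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1 = _tournament(population, fitness, rng)
            p2 = _tournament(population, fitness, rng)
            cut = rng.randint(1, n_features - 1)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each gene toggles with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best

INFORMATIVE = {0, 2, 5}  # hypothetical indices of useful features

def toy_fitness(mask):
    return sum(1.0 if i in INFORMATIVE else -0.5 for i, gene in enumerate(mask) if gene)

selected = genetic_feature_selection(8, toy_fitness)
```

Elitism guarantees that the best fitness never decreases between generations, which matters when each fitness evaluation means retraining a regression model.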

J. Kang, D. Hong, J. Liu, G. Baier, N. Yokoya, and B. Demir, “Learning convolutional sparse coding on complex domain for interferometric phase restoration,” IEEE Transactions on Neural Networks and Learning Systems, accepted for publication, 2020.
Abstract:

L. Moya, A. Muhari, B. Adriano, S. Koshimura, E. Mas, L. R. M. Perez, and N. Yokoya, “Detecting urban changes using phase correlation and l1-based sparse model for early disaster response: A case study of the 2018 Sulawesi Indonesia earthquake-tsunami,” Remote Sensing of Environment, accepted for publication, 2020.
Abstract:

T. D. Pham, N. N. Le, N. T. Ha, L. V. Nguyen, J. Xia, N. Yokoya, T. T. To, H. X. Trinh, L. Q. Kieu, and W. Takeuchi, “Estimating mangrove above-ground biomass using extreme gradient boosting decision trees algorithm with a fusion of Sentinel-2 and ALOS-2 PALSAR-2 data in Can Gio Biosphere Reserve, Vietnam,” Remote Sensing, vol. 12, no. 5, p. 777, 2020.
Abstract: This study investigates the effectiveness of gradient boosting decision tree techniques in estimating mangrove above-ground biomass (AGB) at the Can Gio biosphere reserve (Vietnam). For this purpose, we employed a novel gradient-boosting regression technique called the extreme gradient boosting regression (XGBR) algorithm, and implemented and verified a mangrove AGB model using data from a field survey of 121 sampling plots conducted during the dry season. The dataset fuses Sentinel-2 multispectral instrument (MSI) data and dual-polarimetric (HH, HV) ALOS-2 PALSAR-2 data. The performance metrics of the proposed model (root-mean-square error (RMSE) and coefficient of determination (R2)) were compared with those of other machine learning techniques, namely gradient boosting regression (GBR), support vector regression (SVR), Gaussian process regression (GPR), and random forests regression (RFR). The XGBR model obtained a promising result (R2 = 0.805, RMSE = 28.13 Mg ha−1) and yielded the highest predictive performance among the five machine learning models. The mangrove AGB estimated by the XGBR model ranged from 11 to 293 Mg ha−1 (average = 106.93 Mg ha−1). This work demonstrates that XGBR with combined Sentinel-2 and ALOS-2 PALSAR-2 data can accurately estimate mangrove AGB in the Can Gio biosphere reserve. The general applicability of the XGBR model combined with multi-source optical and SAR data should be further tested and compared in a large-scale study of forest AGB in different geographical and climatic ecosystems.

B. Adriano, N. Yokoya, H. Miura, M. Matsuoka, and S. Koshimura, “A semiautomatic pixel-object method for detecting landslides using multitemporal ALOS-2 intensity images,” Remote Sensing, vol. 12, no. 3, p. 561, 2020.
Abstract: The rapid and accurate mapping of large-scale landslides and other mass movement disasters is crucial for prompt disaster response efforts and immediate recovery planning. As such, remote sensing information, especially from synthetic aperture radar (SAR) sensors, has significant advantages over cloud-covered optical imagery and conventional field survey campaigns. In this work, we introduced an integrated pixel-object image analysis framework for landslide recognition using SAR data. The robustness of our proposed methodology was demonstrated by mapping landslide events with two different triggers, namely, the debris flows following the torrential rainfall that fell over Hiroshima, Japan, in early July 2018 and the coseismic landslides that followed the 2018 Mw6.7 Hokkaido earthquake. For both events, only a pair of SAR images acquired before and after each disaster by the Advanced Land Observing Satellite-2 (ALOS-2) was used. Additional information, such as a digital elevation model (DEM) and land cover data, was employed only to constrain the damage detected in the affected areas. We verified the accuracy of our method by comparing it with the available reference data. The detection results showed an acceptable correlation with the reference data in terms of the locations of damage. Numerical evaluations indicated that our methodology could detect landslides with an accuracy exceeding 80%. In addition, the kappa coefficients for the Hiroshima and Hokkaido events were 0.30 and 0.47, respectively.
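An accuracy above 80% alongside kappa values of 0.30 and 0.47 can arise when the landslide class is much rarer than the background class: overall accuracy rewards agreement on the abundant class, whereas Cohen's kappa discounts the agreement expected by chance. A minimal Python sketch of both metrics from a confusion matrix; the counts are hypothetical and not taken from the Hiroshima or Hokkaido results.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa; rows are reference classes, columns are predictions."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    p_observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: probability that reference and prediction agree at random.
    p_chance = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (p_observed - p_chance) / (1.0 - p_chance)
    return p_observed, kappa

# Hypothetical counts: [[landslide detected, landslide missed], [false alarm, correct non-landslide]].
confusion = [[20, 30],
             [10, 940]]
accuracy, kappa = accuracy_and_kappa(confusion)
```

Here accuracy is 0.96 while kappa is only about 0.48, because most of the agreement comes from the 950 non-landslide samples.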

T. Uezato, N. Yokoya, and W. He, “Illumination invariant hyperspectral image unmixing based on a digital surface model,” IEEE Transactions on Image Processing, accepted for publication, 2019.
Abstract: Although many spectral unmixing models have been developed to address spectral variability caused by variable incident illumination, the mechanism of this spectral variability is still unclear. This paper proposes an unmixing model named illumination invariant spectral unmixing (IISU). IISU makes the first attempt to use radiance hyperspectral data and a LiDAR-derived digital surface model (DSM) to physically explain variable illumination and shadows in the unmixing framework. Incident angles, sky factors, and visibility from the sun, all derived from the LiDAR DSM, support an explicit explanation of endmember variability in the unmixing process from a radiance perspective. The proposed model was efficiently solved by a straightforward optimization procedure. The unmixing results showed that other state-of-the-art unmixing models did not work well, especially for shaded pixels. In contrast, the proposed model estimated more accurate abundances and shadow-compensated reflectance than the existing models.
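The incident-angle term can be illustrated with the standard topographic illumination model: the cosine of the local solar incidence angle is the dot product of the unit surface normal, estimated from DSM gradients, and the unit vector pointing to the sun. The Python sketch below assumes a north-up grid (column index increasing eastward, row index increasing southward), central differences, and a sun azimuth measured clockwise from north. It does not model the sky factors or cast shadows that IISU also uses, and the small DSMs are hypothetical.

```python
import math

def cos_incidence(dsm, row, col, cell_size, sun_zenith_deg, sun_azimuth_deg):
    """Cosine of the solar incidence angle at an interior DSM cell (values <= 0 mean self-shadow)."""
    # Central differences: x points east (columns), y points north (rows decrease northward).
    dz_dx = (dsm[row][col + 1] - dsm[row][col - 1]) / (2.0 * cell_size)
    dz_dy = (dsm[row - 1][col] - dsm[row + 1][col]) / (2.0 * cell_size)
    nx, ny, nz = -dz_dx, -dz_dy, 1.0  # upward surface normal (unnormalized)
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    zen, az = math.radians(sun_zenith_deg), math.radians(sun_azimuth_deg)
    sx = math.sin(zen) * math.sin(az)  # east component of the sun vector
    sy = math.sin(zen) * math.cos(az)  # north component
    sz = math.cos(zen)                 # up component
    return (nx * sx + ny * sy + nz * sz) / norm

flat = [[0.0] * 3 for _ in range(3)]
east_facing = [[0.0, -1.0, -2.0] for _ in range(3)]  # hypothetical 45-degree slope dipping east
```

For example, with the sun at 45° zenith in the east, the east-facing slope is lit head-on (cosine 1), while a flat cell receives cos 45°; with the sun in the west, the same slope is at grazing incidence.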

D. Hong, X. Wu, P. Ghamisi, J. Chanussot, N. Yokoya, and X. X. Zhu, “Invariant attribute profiles: A spatial-frequency joint feature extractor for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., accepted for publication, 2019.
Abstract:

Y. Chen, L. Huang, L. Zhu, N. Yokoya, and X. Jia, “Fine-grained classification of hyperspectral imagery based on deep learning,” Remote Sensing, accepted for publication, 2019.
Abstract:

Y. Chen, W. He, N. Yokoya, and T.-Z. Huang, “Non-local tensor ring decomposition for hyperspectral image denoising,” IEEE Trans. Geosci. Remote Sens., accepted for publication, 2019.
Abstract: Hyperspectral image (HSI) denoising is a fundamental problem in remote sensing and image processing. Recently, denoising methods based on non-local low-rank tensor approximation have attracted much attention because they fully exploit non-local self-similarity and global spectral correlation. Existing non-local low-rank tensor approximation methods are mainly based on the two common Tucker and CP decompositions and have achieved state-of-the-art results, but they have drawbacks and do not provide the best approximation of a tensor. For example, the number of parameters of the Tucker decomposition increases exponentially with its dimension, and CP decomposition cannot preserve the intrinsic correlation of HSI well. In this paper, we propose a non-local tensor ring (TR) approximation for HSI denoising that uses TR decomposition to simultaneously explore non-local self-similarity and the global spectral low-rank characteristic. TR decomposition approximates a high-order tensor as a sequence of cyclically contracted third-order tensors, which has a strong ability to explore these two intrinsic priors and improve HSI denoising results. Moreover, we develop an efficient proximal alternating minimization algorithm to optimize the proposed TR decomposition model. Extensive experiments on three simulated datasets under several noise levels and two real datasets show that the proposed TR model achieves better HSI denoising results than several state-of-the-art methods in terms of quantitative and visual performance evaluations.
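The tensor ring (TR) format stores an N-th order tensor as N third-order cores G_k of size R_k × I_k × R_{k+1}, with R_{N+1} = R_1 closing the ring, and recovers each entry as T(i_1, ..., i_N) = trace(G_1(i_1) G_2(i_2) ⋯ G_N(i_N)), where G_k(i_k) is the R_k × R_{k+1} slice of core k. The Python sketch below (plain lists, no tensor library) evaluates one entry and compares parameter counts with a Tucker model whose core has R_1 × ⋯ × R_N entries, which grows exponentially with N. Using equal rank values for both models is an illustrative assumption, since TR ranks and Tucker multilinear ranks are not directly equivalent.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def tr_element(cores, index):
    """Entry T[index] = trace(G_1[i_1] @ G_2[i_2] @ ... @ G_N[i_N]).

    cores[k][i] is the R_k x R_{k+1} slice of core k; the last slice has R_1 columns.
    """
    prod = cores[0][index[0]]
    for core, i in zip(cores[1:], index[1:]):
        prod = matmul(prod, core[i])
    return sum(prod[r][r] for r in range(len(prod)))

def tr_param_count(dims, ranks):
    """Sum over k of R_k * I_k * R_{k+1}, with the ring closing back to R_1."""
    n = len(dims)
    return sum(ranks[k] * dims[k] * ranks[(k + 1) % n] for k in range(n))

def tucker_param_count(dims, ranks):
    """Core tensor (product of ranks) plus one I_k x R_k factor matrix per mode."""
    core = 1
    for r in ranks:
        core *= r
    return core + sum(i * r for i, r in zip(dims, ranks))

# Rank-1 ring: each slice is a 1x1 matrix, so entries reduce to products of scalars.
rank1_cores = [[[[1]], [[2]], [[3]]], [[[4]], [[5]]], [[[6]], [[7]]]]
```

With N = 6, I_k = 10, and all ranks equal to 4, the TR model needs 960 parameters versus 4,336 for Tucker; at N = 3 the Tucker count (184) is actually smaller than the TR count (480), so the advantage appears as the order grows.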

D. Hong, N. Yokoya, J. Chanussot, J. Xu, and X. X. Zhu, “Learning to propagate labels on graphs: An iterative multitask regression framework for semi-supervised hyperspectral dimensionality reduction,” ISPRS Journal of Photogrammetry and Remote Sensing, accepted for publication, 2019.
Abstract: Hyperspectral dimensionality reduction (HDR), an important preprocessing step prior to high-level data analysis, has been garnering growing attention in the remote sensing community. Although a variety of methods, both unsupervised and supervised, have been proposed for this task, the discriminative ability of the feature representation remains limited due to the lack of a powerful tool that effectively exploits the labeled and unlabeled data in the HDR process. A semi-supervised HDR approach, called iterative multitask regression (IMR), is proposed in this paper to address this need. IMR aims at learning a low-dimensional subspace by jointly considering the labeled and unlabeled data, and also bridges the learned subspace with two regression tasks: labels and pseudo-labels initialized by a given classifier. More significantly, IMR dynamically propagates the labels on a learnable graph and progressively refines the pseudo-labels, yielding a well-conditioned feedback system. Experiments conducted on three widely used hyperspectral image datasets demonstrate that the dimension-reduced features learned by the proposed IMR framework are superior to those of related state-of-the-art HDR approaches in terms of classification or recognition accuracy.

D. Hong, J. Chanussot, N. Yokoya, J. Kang, and X. X. Zhu, “Learning shared cross-modality representation using multispectral-LiDAR and hyperspectral data,” IEEE Geosci. Remote Sens. Lett., accepted for publication, 2019.
Abstract: Due to the ever-growing diversity of data sources, multi-modality feature learning has attracted more and more attention. However, most of these methods are designed to jointly learn feature representations from multiple modalities that exist in both the training and test sets, and the case in which a certain modality is absent in the test phase has been less investigated. To this end, in this letter, we propose to learn a shared feature space across multiple modalities in the training process. In this way, out-of-sample data from any of the modalities can be directly projected onto the learned space for a more effective cross-modality representation. More significantly, the shared space is regarded as a latent subspace in our proposed method, which connects the original multi-modal samples with label information to further improve the feature discrimination. Experiments are conducted on the multispectral-LiDAR and hyperspectral dataset provided by the 2018 IEEE GRSS Data Fusion Contest to demonstrate the effectiveness and superiority of the proposed method in comparison with several popular baselines.

Y. Chen, W. He, N. Yokoya, and T.-Z. Huang, “Blind cloud and cloud shadow removal of multitemporal images based on total variation regularized low-rank sparsity decomposition,” ISPRS Journal of Photogrammetry and Remote Sensing, accepted for publication, 2019.
Abstract: Cloud and cloud shadow (cloud/shadow) removal from multitemporal satellite images is a challenging task and has elicited much attention for subsequent information extraction. Regarding cloud/shadow areas as missing information, low-rank matrix/tensor completion based methods are popular for recovering information degraded by clouds and shadows. However, existing methods require the cloud/shadow locations to be determined in advance and fail to fully use the latent information in cloud/shadow areas. In this study, we propose a blind cloud/shadow removal method for time-series remote sensing images that unifies cloud/shadow detection and removal. First, we decompose the degraded image into a low-rank clean image (surface-reflected) component and a sparse (cloud/shadow) component, which allows the underlying characteristics of these two components to be used simultaneously and completely. Meanwhile, spatial-spectral total variation regularization is introduced to promote the spatial-spectral continuity of the cloud/shadow component. Second, the cloud/shadow locations are detected from the sparse component using a threshold method. Finally, we use the cloud/shadow detection results to guide the information compensation from the original observed images to better preserve the information in cloud/shadow-free locations. The proposed model is efficiently solved using the alternating direction method of multipliers. Experiments on both simulated and real datasets demonstrate the effectiveness of our method for cloud/shadow detection and removal compared with other state-of-the-art methods.
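The last two steps (thresholding the sparse component, then compensating only inside detected pixels) can be sketched in a few lines of pure Python for a single band. The threshold tau, the use of the absolute value so that bright clouds and dark shadows are both flagged, and the function names are assumptions for illustration; the low-rank and sparse components are taken as given rather than computed by the paper's ADMM solver.

```python
def detect_cloud_shadow(sparse, tau):
    """Mask of pixels whose sparse-component magnitude exceeds tau.

    Clouds tend to give positive residuals and shadows negative ones, so the
    absolute value flags both.
    """
    return [[abs(s) > tau for s in row] for row in sparse]


def compensate(observed, low_rank, mask):
    """Keep the observed value at clear pixels and fall back to the low-rank
    (surface-reflected) estimate only where cloud/shadow was detected."""
    return [
        [lr if m else obs for obs, lr, m in zip(o_row, l_row, m_row)]
        for o_row, l_row, m_row in zip(observed, low_rank, mask)
    ]


observed = [[0.20, 0.90], [0.30, 0.25]]
low_rank = [[0.21, 0.30], [0.29, 0.26]]
sparse = [[-0.01, 0.60], [0.01, -0.01]]
mask = detect_cloud_shadow(sparse, tau=0.1)
print(mask)                                  # [[False, True], [False, False]]
print(compensate(observed, low_rank, mask))  # [[0.2, 0.3], [0.3, 0.25]]
```

Copying observed values back at clear pixels is what keeps the cloud/shadow-free information intact, instead of replacing the whole image with its low-rank approximation.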

Y. Chen, W. He, N. Yokoya, and T.-Z. Huang, “Hyperspectral image restoration using weighted group sparsity regularized low-rank tensor decomposition,” IEEE Trans. Cybernetics, accepted for publication, 2019.
Abstract: Mixed noise (such as Gaussian, impulse, stripe, and deadline noise) contamination is a common phenomenon in hyperspectral imagery (HSI), greatly degrading visual quality and affecting subsequent processing accuracy. By encoding a sparse prior on the spatial or spectral difference images, total variation (TV) regularization is an efficient tool for removing noise. However, the previous TV term cannot maintain the shared group sparsity pattern of the spatial difference images of different spectral bands. To address this issue, this study proposes a group sparsity regularization of the spatial difference images for HSI restoration. Instead of using the L1- or L2-norm (sparsity) on the difference image itself, we introduce a weighted L2,1-norm to constrain the spatial difference image cube, efficiently exploring the shared group sparse pattern. Moreover, we employ the well-known low-rank Tucker decomposition to capture the global spatial-spectral correlation from three HSI dimensions. To summarize, a weighted group sparsity regularized low-rank tensor decomposition (LRTDGS) method is presented for HSI restoration. An efficient augmented Lagrange multiplier algorithm is employed to solve the LRTDGS model. The superiority of this method for HSI restoration is demonstrated by a series of experimental results on both simulated and real data, compared with other state-of-the-art TV regularized low-rank matrix/tensor decomposition methods.
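A rough pure-Python sketch of the weighted L2,1 idea follows. It assumes forward spatial differences with a zero difference at the last row/column, treats the difference values of all bands at one pixel as a group, and takes the per-pixel weights as given (how LRTDGS sets them is not described here). The group shrinkage function is the standard proximal operator of a scaled L2 norm, the kind of step an augmented Lagrangian solver typically applies to such a term; it is not taken from the paper, and all names are hypothetical.

```python
import math


def spatial_differences(cube):
    """Forward differences of an h x w x b cube along columns (dh) and rows (dv).

    The last column/row gets a zero difference, a simple boundary assumption.
    """
    h, w, b = len(cube), len(cube[0]), len(cube[0][0])
    dh = [[[cube[i][j + 1][k] - cube[i][j][k] if j + 1 < w else 0.0 for k in range(b)]
           for j in range(w)] for i in range(h)]
    dv = [[[cube[i + 1][j][k] - cube[i][j][k] if i + 1 < h else 0.0 for k in range(b)]
           for j in range(w)] for i in range(h)]
    return dh, dv


def weighted_l21(diff, weights):
    """Sum over pixels of w_ij * ||diff[i][j][:]||_2 (one group = all bands of a pixel)."""
    return sum(weights[i][j] * math.sqrt(sum(v * v for v in diff[i][j]))
               for i in range(len(diff)) for j in range(len(diff[0])))


def group_shrink(vec, thresh):
    """Proximal operator of thresh * ||.||_2: shrinks the whole band vector jointly,
    so all bands of a pixel become zero together or stay nonzero together."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= thresh:
        return [0.0] * len(vec)
    scale = 1.0 - thresh / norm
    return [scale * v for v in vec]


cube = [[[0.0, 0.0], [3.0, 4.0]]]        # 1 x 2 pixels, 2 bands
dh, dv = spatial_differences(cube)
print(dh)                                # [[[3.0, 4.0], [0.0, 0.0]]]
print(weighted_l21(dh, [[1.0, 1.0]]))    # 5.0
print(group_shrink([3.0, 4.0], 2.5))     # [1.5, 2.0]
```

Unlike an elementwise L1 penalty, the group shrinkage zeroes or keeps an edge in all bands at once, which is the shared sparsity pattern the abstract describes.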

W. He, N. Yokoya, L. Yuan, and Q. Zhao, “Remote sensing image reconstruction using tensor ring completion and total-variation,” IEEE Trans. Geosci. Remote Sens., accepted for publication, 2019.
Abstract: Time-series remote sensing (RS) images are often corrupted by various types of missing information, such as dead pixels, clouds, and cloud shadows, that significantly influence subsequent applications. In this paper, we introduce a new low-rank tensor decomposition model, termed tensor ring (TR) decomposition, to the analysis of RS datasets and propose a TR completion method for missing information reconstruction. The proposed TR completion model can exploit the low-rank property of time-series RS images from different dimensions. To further exploit the smoothness of the spatial information of RS images, total-variation regularization is also incorporated into the TR completion model. The proposed model is efficiently solved using two algorithms, the augmented Lagrange multiplier (ALM) and alternating least squares (ALS) methods. Experiments on simulated and real data show superior performance compared with other state-of-the-art low-rank-related algorithms.

Y. Xu, B. Du, L. Zhang, D. Cerra, M. Pato, E. Carmona, S. Prasad, N. Yokoya, R. Hänsch, and B. Le Saux, “Advanced multi-sensor optical remote sensing for urban land use and land cover classification: Outcome of the 2018 IEEE GRSS Data Fusion Contest,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., accepted for publication, 2019.

B. Adriano, J. Xia, G. Baier, N. Yokoya, and S. Koshimura, “Multi-source data fusion based on ensemble learning for rapid building damage mapping during the 2018 Sulawesi Earthquake and Tsunami in Palu, Indonesia,” Remote Sensing, vol. 11, no. 7, p. 886, 2019.
Abstract: This work presents a detailed analysis of building damage recognition, employing multi-source data fusion and ensemble learning algorithms for rapid damage mapping tasks. A damage classification framework is introduced and tested to categorize the building damage following the recent 2018 Sulawesi earthquake and tsunami. Three robust ensemble learning classifiers were investigated for recognizing building damage from SAR and optical remote sensing datasets and their derived features. The contribution of each feature dataset was also explored, considering different combinations of sensors as well as their temporal information. SAR scenes acquired by the ALOS-2 PALSAR-2 and Sentinel-1 sensors were used. The optical Sentinel-2 and PlanetScope sensors were also included in this study. A non-local filter was used in the preprocessing phase to enhance the SAR features. Our results demonstrated that the canonical correlation forests classifier performs better than the other classifiers. In the data fusion analysis, DEM- and SAR-derived features contributed the most to the overall damage classification. Our proposed mapping framework successfully classifies four levels of building damage (with overall accuracy > 90%, average accuracy > 67%). The proposed framework learned the damage patterns from a limited set of available human-interpreted building damage annotations and expanded this information to map a larger affected area. The whole process, including the pre- and post-processing phases, was completed in about 3 hours after all raw datasets were acquired.
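The two figures quoted above, overall accuracy and average accuracy, are standard confusion-matrix summaries, and a gap between them usually reflects class imbalance. A minimal sketch, assuming rows are reference (ground-truth) classes and columns are predictions; the function names are hypothetical and the confusion matrix in the example is made up, not taken from the paper.

```python
def overall_accuracy(cm):
    """Correctly classified samples (the diagonal) divided by all samples."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def average_accuracy(cm):
    """Mean of per-class accuracies (recall of each reference class).

    Classes with no reference samples are skipped to avoid dividing by zero.
    """
    per_class = [cm[i][i] / sum(cm[i]) for i in range(len(cm)) if sum(cm[i]) > 0]
    return sum(per_class) / len(per_class)


# Hypothetical two-class example: the large class is perfect, the small one is not.
cm = [[80, 0],
      [10, 10]]
print(overall_accuracy(cm))  # 0.9
print(average_accuracy(cm))  # 0.75
```

Because the majority class dominates the sample count, overall accuracy stays high while average accuracy is pulled down by the poorly mapped minority class.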

P. Ghamisi, B. Rasti, N. Yokoya, Q. Wang, B. Höfle, L. Bruzzone, F. Bovolo, M. Chi, K. Anders, R. Gloaguen, P. M. Atkinson, and J. A. Benediktsson, “Multisource and multitemporal data fusion in remote sensing,” IEEE Geoscience and Remote Sensing Magazine, vol. 7, no. 1, pp. 6–39, 2019.
Abstract: The sharp and recent increase in the availability of data captured by different sensors, combined with their considerably heterogeneous natures, poses a serious challenge for the effective and efficient processing of remotely sensed data. Such an increase in remote sensing and ancillary datasets, however, opens up the possibility of utilizing multimodal datasets in a joint manner to further improve the performance of the processing approaches with respect to the application at hand. Multisource data fusion has, therefore, received enormous attention from researchers worldwide for a wide variety of applications. Moreover, thanks to the revisit capability of several spaceborne sensors, the integration of the temporal information with the spatial and/or spectral/backscattering information of the remotely sensed data is possible and helps to move from a representation of 2D/3D data to 4D data structures, where the time variable adds new information as well as challenges for the information extraction algorithms. There is a huge body of research dedicated to multisource and multitemporal data fusion, but the methods for fusing different modalities have developed along different paths in different research communities. This paper brings together the advances of multisource and multitemporal data fusion approaches with respect to different research communities and provides a thorough and discipline-specific starting point for researchers at different levels (i.e., students, researchers, and senior researchers) willing to conduct novel investigations on this challenging topic by supplying sufficient detail and references. More specifically, this paper provides a bird's-eye view of many important contributions specifically dedicated to the topics of pansharpening and resolution enhancement, point cloud data fusion, hyperspectral and LiDAR data fusion, multitemporal data fusion, as well as big data and social media. In addition, the main challenges and possible future research directions for each section are outlined and discussed.

T. D. Pham, N. Yokoya, D. T. Bui, K. Yoshino, and D. A. Friess, “Remote sensing approaches for monitoring mangrove species, structure and biomass: Opportunities and challenges,” Remote Sensing, vol. 11, no. 3, p. 230, 2019.
Abstract: The mangrove ecosystem plays a vital role in the global carbon cycle, by reducing greenhouse gas emissions and mitigating the impacts of climate change. However, mangroves have been lost worldwide, resulting in substantial carbon stock losses. Additionally, some aspects of the mangrove ecosystem remain poorly characterized compared to other forest ecosystems due to practical difficulties in measuring and monitoring mangrove biomass and their carbon stocks. Without a quantitative method for effectively monitoring biophysical parameters and carbon stocks in mangroves, robust policies and actions for sustainably conserving mangroves in the context of climate change mitigation and adaptation are more difficult. In this context, remote sensing provides an important tool for monitoring mangroves and identifying attributes such as species, biomass, and carbon stocks. A wide range of studies is based on optical imagery (aerial photography, multispectral, and hyperspectral) and synthetic aperture radar (SAR) data. Remote sensing approaches have been proven effective for mapping mangrove species, estimating their biomass, and assessing changes in their extent. This review provides an overview of the techniques that are currently being used to map various attributes of mangroves, summarizes the studies that have been undertaken since 2010 on a variety of remote sensing applications for monitoring mangroves, and addresses the limitations of these studies. We see several key future directions for the potential use of remote sensing techniques combined with machine learning techniques for mapping mangrove areas and species, and evaluating their biomass and carbon stocks.

D. Hong, N. Yokoya, J. Chanussot, and X. X. Zhu, “CoSpace: Common subspace learning from hyperspectral-multispectral correspondences,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 7, pp. 4349–4359, 2019.
Abstract: With a large amount of open satellite multispectral imagery (e.g., Sentinel-2 and Landsat-8), considerable attention has been paid to global multispectral land cover classification. However, its limited spectral information hinders further improvement of the classification performance. Hyperspectral imaging enables discrimination between spectrally similar classes, but its swath width from space is narrow compared with that of multispectral sensors. To achieve accurate land cover classification over a large coverage, we propose a cross-modality feature learning framework, called common subspace learning (CoSpace), by jointly considering subspace learning and supervised classification. By locally aligning the manifold structure of the two modalities, CoSpace linearly learns a shared latent subspace from hyperspectral-multispectral (HS-MS) correspondences. Multispectral out-of-sample data can then be projected into the subspace, where they are expected to take advantage of the rich spectral information of the corresponding hyperspectral data used for learning, thus leading to better classification. Extensive experiments on two simulated HS-MS datasets (University of Houston and Chikusei), in which the HS and MS data trade off coverage against spectral resolution, are performed to demonstrate the superiority and effectiveness of the proposed method in comparison with previous state-of-the-art methods.

D. Hong, N. Yokoya, N. Ge, J. Chanussot, and X. X. Zhu, “Learnable manifold alignment (LeMA): A semi-supervised cross-modality learning framework for land cover and land use classification,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 147, pp. 193–205, 2019.
Abstract: In this paper, we aim to tackle a general but interesting cross-modality feature learning question in the remote sensing community: can a limited amount of highly discriminative (e.g., hyperspectral) training data improve the performance of a classification task using a large amount of poorly discriminative (e.g., multispectral) data? Traditional semi-supervised manifold alignment methods do not perform sufficiently well for such problems, since hyperspectral data are much more expensive to collect at a large scale than multispectral data, given the trade-off between time and efficiency. To this end, we propose a novel semi-supervised cross-modality learning framework, called learnable manifold alignment (LeMA). LeMA learns a joint graph structure directly from the data instead of using a given fixed graph defined by a Gaussian kernel function. With the learned graph, we can further capture the data distribution by graph-based label propagation, which enables finding a more accurate decision boundary. Additionally, an optimization strategy based on the alternating direction method of multipliers (ADMM) is designed to solve the proposed model. Extensive experiments on two hyperspectral-multispectral datasets demonstrate the superiority and effectiveness of the proposed method in comparison with several state-of-the-art methods.

D. Hong, N. Yokoya, J. Chanussot, and X. X. Zhu, “An augmented linear mixing model to address spectral variability for hyperspectral unmixing,” IEEE Transactions on Image Processing, vol. 28, no. 4, pp. 1923–1938, 2018.
Abstract: Hyperspectral imagery collected from airborne or satellite sources inevitably suffers from spectral variability, making it difficult for spectral unmixing to accurately estimate abundance maps. The classical unmixing model, the linear mixing model (LMM), generally fails to handle this sticky issue effectively. To this end, we propose a novel spectral mixture model, called the augmented linear mixing model (ALMM), to address spectral variability by applying a data-driven learning strategy in inverse problems of hyperspectral unmixing. The proposed approach models the main spectral variability (i.e., scaling factors) generated by variations in illumination or topography separately by means of the endmember dictionary. It then models other spectral variabilities caused by environmental conditions (e.g., local temperature and humidity, atmospheric effects) and instrumental configurations (e.g., sensor noise), as well as material nonlinear mixing effects, by introducing a spectral variability dictionary. To effectively run the data-driven learning strategy, we also propose a reasonable prior for the spectral variability dictionary, whose atoms are assumed to have low coherence with the spectral signatures of endmembers, which leads to a well-known low-coherence dictionary learning problem. Thus, a dictionary learning technique is embedded in the framework of spectral unmixing so that the algorithm can learn the spectral variability dictionary and estimate the abundance maps simultaneously. Extensive experiments on synthetic and real datasets are performed to demonstrate the superiority and effectiveness of the proposed method in comparison with previous state-of-the-art methods.

W. He and N. Yokoya, “Multi-temporal Sentinel-1 and -2 data fusion for optical image simulation,” ISPRS International Journal of Geo-Information, vol. 7, no. 10, p. 389, 2018.
Abstract: In this paper, we present optical image simulation from synthetic aperture radar (SAR) data using deep learning based methods. Two models, i.e., optical image simulation directly from SAR data and from multi-temporal SAR-optical data, are proposed to test the feasibility of this approach. The deep learning based methods that we chose to implement the models are a convolutional neural network (CNN) with a residual architecture and a conditional generative adversarial network (cGAN). We validate our models using the Sentinel-1 and -2 datasets. The experiments demonstrate that the model with multi-temporal SAR-optical data can successfully simulate the optical image, whereas the model with SAR data alone as input failed. The optical image simulation results indicate the possibility of SAR-optical information blending for subsequent applications such as large-scale cloud removal and optical data temporal super-resolution. We also investigate the sensitivity of the proposed models to the training samples and reveal possible future directions.

L. Guanter, M. Brell, J. C.-W. Chan, C. Giardino, J. Gómez-Dans, C. Mielke, F. Morsdorf, K. Segl, and N. Yokoya, “Synergies of spaceborne imaging spectroscopy with other remote sensing approaches,” Surveys in Geophysics, pp. 1–31, 2018.
Abstract: Imaging spectroscopy (IS), also commonly known as hyperspectral remote sensing, is a powerful remote sensing technique for monitoring the Earth’s surface and atmosphere. Pixels in optical hyperspectral images consist of continuous reflectance spectra formed by hundreds of narrow spectral channels, allowing an accurate representation of the surface composition through spectroscopic techniques. However, technical constraints in the design of imaging spectrometers mean that spectral coverage and resolution are usually traded off against spatial resolution and swath width, as opposed to optical multispectral (MS) systems, which are typically designed to maximize spatial and/or temporal resolution. This complementarity suggests that a synergistic exploitation of spaceborne IS and MS data would be an optimal way to fulfill those remote sensing applications requiring not only high spatial and temporal resolution data, but also rich spectral information. On the other hand, IS has been shown to yield a strong synergistic potential with non-optical remote sensing methods, such as thermal infrared (TIR) and light detection and ranging (LiDAR). In this contribution, we review theoretical and methodological aspects of potential synergies between optical IS and other remote sensing techniques. The focus is put on the evaluation of synergies between spaceborne optical IS and MS systems because of the expected availability of the two types of data in the coming years. Short reviews of potential synergies of IS with TIR and LiDAR measurements are also provided.

T.-Y. Ji, N. Yokoya, X. X. Zhu, and T.-Z. Huang, “Non-local tensor completion for multitemporal remotely sensed images inpainting,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 6, pp. 3047–3061, 2018.
Abstract: Remotely sensed images may contain missing areas because of poor weather conditions and sensor failure. Information in those areas may play an important role in the interpretation of multitemporal remotely sensed data. This paper aims at reconstructing the missing information with a non-local low-rank tensor completion method (NL-LRTC). First, non-local correlations in the spatial domain are taken into account by searching for and grouping similar image patches in a large search window. Then, low-rankness of the identified fourth-order tensor groups is promoted to consider their correlations in the spatial, spectral, and temporal domains while reconstructing the underlying patterns. Experimental results on simulated and real data demonstrate that the proposed method is effective both qualitatively and quantitatively. In addition, the proposed method is computationally efficient compared with other patch-based methods such as the recently proposed PM-MTGSR method.
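The non-local grouping step can be illustrated on a single 2D band with a brute-force block-matching sketch: every patch in a search window around a reference patch is compared by squared Euclidean distance, and the k closest are kept. The patch size, window radius, k, and the function names are illustrative assumptions; NL-LRTC groups spatial-spectral-temporal patches into fourth-order tensors, which this sketch does not attempt.

```python
def extract_patch(img, top, left, size):
    """size x size patch whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]


def patch_distance(p, q):
    """Squared Euclidean distance between two equally sized patches."""
    return sum((a - b) ** 2 for rp, rq in zip(p, q) for a, b in zip(rp, rq))


def group_similar_patches(img, ref_top, ref_left, size, window, k):
    """Top-left corners of the k patches (reference included) closest to the
    reference patch, searched within +/- window pixels of its position."""
    h, w = len(img), len(img[0])
    ref = extract_patch(img, ref_top, ref_left, size)
    candidates = []
    for top in range(max(0, ref_top - window), min(h - size, ref_top + window) + 1):
        for left in range(max(0, ref_left - window), min(w - size, ref_left + window) + 1):
            d = patch_distance(ref, extract_patch(img, top, left, size))
            candidates.append((d, top, left))
    candidates.sort()  # by distance, then by position for ties
    return [(top, left) for _, top, left in candidates[:k]]


img = [[1, 2, 1, 2],
       [3, 4, 3, 4],
       [1, 2, 1, 2],
       [3, 4, 3, 4]]
print(group_similar_patches(img, 0, 0, size=2, window=3, k=4))
# [(0, 0), (0, 2), (2, 0), (2, 2)]
```

Stacking the matched patches gives a group whose members are nearly identical, which is why such groups are expected to be low-rank and can be completed jointly.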

J. Xia, N. Yokoya, and A. Iwasaki, “Fusion of hyperspectral and LiDAR data with a novel ensemble classifier,” IEEE Geosci. Remote Sens. Lett., vol. 15, no. 6, pp. 957–961, 2018.
Abstract: Due to the development of sensors and data acquisition technology, the fusion of features from multiple sensors is a very active topic. In this letter, the use of morphological features to fuse a hyperspectral (HS) image and a light detection and ranging (LiDAR)-derived digital surface model (DSM) is exploited via an ensemble classifier. In each iteration, we first apply morphological openings and closings with partial reconstruction on the first few principal components (PCs) of the HS and LiDAR datasets to produce morphological features that model spatial and elevation information for the HS and LiDAR datasets. Second, three groups of features (i.e., spectral features and morphological features of the HS and LiDAR data) are split into several disjoint subsets. Third, data transformation is applied to each subset, and the features extracted from each subset are stacked as the input of a random forest (RF) classifier. Three data transformation methods, namely principal component analysis (PCA), locality preserving projection (LPP), and unsupervised graph fusion (UGF), are introduced into the ensemble classification process. Finally, we integrate the classification results achieved at each step by a majority vote. Experimental results on co-registered HS and LiDAR-derived DSM data demonstrate the effectiveness and potential of the proposed ensemble classifier.
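The final fusion step, a per-pixel majority vote over the labels produced by the ensemble members, can be sketched with collections.Counter. The data layout (one list of pixel labels per member), the function name, and the tie-breaking rule are assumptions; the letter does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse ensemble outputs pixel by pixel.

    predictions[m][p] is the class label that ensemble member m assigns to
    pixel p. Counter.most_common orders equal counts by first occurrence, so a
    tie goes to the label seen first in member order.
    """
    n_pixels = len(predictions[0])
    return [
        Counter(member[p] for member in predictions).most_common(1)[0][0]
        for p in range(n_pixels)
    ]


members = [[1, 2, 3],
           [1, 3, 3],
           [2, 2, 1]]
print(majority_vote(members))  # [1, 2, 3]
```

Using an odd number of ensemble members avoids ties in the two-class case; with more classes, an explicit tie rule such as the one above is still needed.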

P. Ghamisi and N. Yokoya, “IMG2DSM: Height simulation from single imagery using conditional generative adversarial nets,” IEEE Geosci. Remote Sens. Lett., vol. 15, no. 5, pp. 794–798, 2018.
Abstract: This paper proposes a groundbreaking approach in the remote sensing community to simulating digital surface model (DSM) from a single optical image. This novel technique uses conditional generative adversarial nets whose architecture is based on an encoder-decoder network with skip connections (generator) and penalizing structures at the scale of image patches (discriminator). The network is trained on scenes where both DSM and optical data are available to establish an image-to-DSM translation rule. The trained network is then utilized to simulate elevation information on target scenes where no corresponding elevation information exists. The capability of the approach is evaluated both visually (in terms of photo interpretation) and quantitatively (in terms of reconstruction errors and classification accuracies) on sub-decimeter spatial resolution datasets captured over Vaihingen, Potsdam, and Stockholm. The results confirm the promising performance of the proposed framework.

N. Yokoya, P. Ghamisi, J. Xia, S. Sukhanov, R. Heremans, I. Tankoyeu, B. Bechtel, B. Le Saux, G. Moser, and D. Tuia, “Open data for global multimodal land use classification: Outcome of the 2017 IEEE GRSS Data Fusion Contest,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 11, no. 5, pp. 1363–1377, 2018.

J. Xia, P. Ghamisi, N. Yokoya, and A. Iwasaki, “Random forest ensembles and extended multi-extinction profiles for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 1, pp. 202–216, 2018.
Abstract: Classification techniques for hyperspectral images based on random forest (RF) ensembles and extended multi-extinction profiles (EMEPs) are proposed as a means of improving performance. To this end, five strategies (bagging, boosting, random subspace, rotation-based, and boosted rotation-based) are used to construct the RF ensembles. Extinction profiles (EPs), which are based on an extrema-oriented connected filtering technique, are applied to the images associated with the first informative components extracted by independent component analysis, leading to a set of EMEPs. The effectiveness of the proposed method is investigated on two benchmark hyperspectral images, University of Pavia and Indian Pines. Comparative experimental evaluations reveal the superior performance of the proposed methods, especially those employing the rotation-based and boosted rotation-based approaches. An additional advantage is that the CPU processing time is acceptable.

P. Ghamisi, N. Yokoya, J. Li, W. Liao, S. Liu, J. Plaza, B. Rasti, and A. Plaza, “Advances in hyperspectral image and signal processing: A comprehensive overview of the state of the art,” IEEE Geoscience and Remote Sensing Magazine, vol. 5, no. 4, pp. 37–78, 2017.
Abstract: Recent advances in airborne and spaceborne hyperspectral imaging technology have provided end users with rich spectral, spatial, and temporal information, which make a plethora of applications for the analysis of large areas of the Earth surface feasible. However, a huge number of factors, such as high dimensions and size of the hyperspectral data, the lack of training samples, mixed pixels, light scattering mechanisms in the acquisition process, and different atmospheric and geometric distortions, make such data inherently nonlinear and complex, which poses extreme challenges for existing methodologies to effectively process and analyze the data sets. Hence, rigorous and innovative methodologies are required for hyperspectral image and signal processing and have become a center of attention for researchers worldwide. This paper offers a comprehensive tutorial/overview focusing specifically on hyperspectral data analysis, which is categorized into seven broad topics: classification, spectral unmixing, dimensionality reduction, resolution enhancement, hyperspectral image denoising and restoration, change detection, and fast computing. For each topic, we provide a synopsis of the state-of-the-art approaches and numerical results for validation and evaluation of different methodologies, followed by a discussion of future challenges and research directions.

H. Zheng, P. Du, J. Chen, J. Xia, E. Li, Z. Xu, X. Li, and N. Yokoya, “Performance evaluation of downscaling Sentinel-2 imagery for land use and land cover classification by spectral-spatial features,” Remote Sensing, vol. 9, no. 12, p. 1274, 2017.
Abstract: Land Use and Land Cover (LULC) classification is vital for environmental and ecological applications. Sentinel-2 is a new generation land monitoring satellite with the advantages of novel spectral capabilities, wide coverage and fine spatial and temporal resolutions. The effects of different spatial resolution unification schemes and methods on LULC classification have been scarcely investigated for Sentinel-2. This paper bridged this gap by comparing the differences between upscaling and downscaling as well as different downscaling algorithms from the point of view of LULC classification accuracy. The studied downscaling algorithms include nearest neighbor resampling and five popular pansharpening methods, namely, Gram-Schmidt (GS), nearest neighbor diffusion (NNDiffusion), PANSHARP algorithm proposed by Y. Zhang, wavelet transformation fusion (WTF) and high-pass filter fusion (HPF). Two spatial features, textural metrics derived from Grey-Level-Co-occurrence Matrix (GLCM) and extended attribute profiles (EAPs), are investigated to make up for the shortcoming of pixel-based spectral classification. Random forest (RF) is adopted as the classifier. The experiment was conducted in Xitiaoxi watershed, China. The results demonstrated that downscaling obviously outperforms upscaling in terms of classification accuracy. For downscaling, image sharpening has no obvious advantages than spatial interpolation. Different image sharpening algorithms have distinct effects. Two multiresolution analysis (MRA)-based methods, i.e., WTF and HFP, achieve the best performance. GS achieved a similar accuracy with NNDiffusion and PANSHARP. Compared to image sharpening, the introduction of spatial features, both GLCM and EAPs can greatly improve the classification accuracy for Sentinel-2 imagery. Their effects on overall accuracy are similar but differ significantly to specific classes. 
In general, using the spectral bands downscaled by nearest neighbor interpolation can meet the requirements of regional LULC applications, and the GLCM and EAPs spatial features can be used to obtain more precise classification maps.
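The two resolution-unification directions compared in this study can be illustrated with a small NumPy sketch. Nearest neighbor resampling puts a coarser band onto a finer grid (for example, a Sentinel-2 20 m band onto the 10 m grid, a factor of 2), while block averaging aggregates a finer band onto a coarser grid. The function names and the use of block averaging for upscaling are illustrative assumptions, not the paper's processing chain. Image sizes are assumed to be divisible by the factor.

```python
import numpy as np


def nn_downscale(band: np.ndarray, factor: int) -> np.ndarray:
    """Nearest neighbor resampling onto a finer grid: each coarse pixel
    is replicated into a factor x factor block."""
    return np.repeat(np.repeat(band, factor, axis=0), factor, axis=1)


def block_upscale(band: np.ndarray, factor: int) -> np.ndarray:
    """Aggregate onto a coarser grid by averaging factor x factor blocks
    (assumes both image dimensions are divisible by factor)."""
    h, w = band.shape
    return band.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


# Example: a 2 x 2 band at 20 m becomes 4 x 4 at 10 m (factor 2).
b20 = np.array([[1.0, 2.0], [3.0, 4.0]])
b10 = nn_downscale(b20, 2)
print(b10.shape)               # (4, 4)
print(block_upscale(b10, 2))   # recovers the original 2 x 2 values
```

Nearest neighbor resampling does not create new spatial detail. It only aligns bands of different native resolutions on a common grid, which is the baseline that the pansharpening methods are compared against.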

J. Xia, N. Yokoya, and A. Iwasaki, “Classification of large-sized hyperspectral imagery using fast machine learning algorithms,” Journal of Applied Remote Sensing, vol. 11, no. 3, 035005, 2017.
Abstract: We present a framework of fast machine learning algorithms for the classification of large-sized hyperspectral images, from both theoretical and practical viewpoints. In particular, we assess the performance of random forest (RF), rotation forest (RoF), extreme learning machine (ELM), and the ensembles of RF and ELM. These classifiers are applied to two large-sized hyperspectral images and compared with support vector machines. For the quantitative analysis, we focus on comparing these methods when working with high input dimensions and limited or sufficient training sets. Moreover, other important issues, such as computational cost and robustness against noise, are also discussed.
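For readers unfamiliar with ELM, the following minimal NumPy sketch shows the basic idea. A single hidden layer with random, untrained weights is followed by a closed-form least-squares fit of the output weights. The tanh activation, the weight scaling, the ridge term `reg`, and the function names are assumptions made for this illustration and are not taken from the paper. The toy data stand in for real hyperspectral pixels.

```python
import numpy as np


def elm_train(X, y, n_hidden=100, reg=1e-3, seed=0):
    """Train a single-hidden-layer extreme learning machine.

    Hidden weights are random and fixed; only the output weights are
    fitted, here by ridge-regularized least squares."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    n_classes = int(y.max()) + 1
    W = rng.standard_normal((n_features, n_hidden)) / np.sqrt(n_features)
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                 # hidden-layer activations
    T = np.eye(n_classes)[y]               # one-hot targets
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ T)
    return W, b, beta


def elm_predict(model, X):
    W, b, beta = model
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)


# Toy "spectra": two classes of 50-band pixels with different mean levels.
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0.0, 1.0, (100, 50)), rng.normal(3.0, 1.0, (100, 50))])
y_train = np.repeat([0, 1], 100)
model = elm_train(X_train, y_train)
acc = np.mean(elm_predict(model, X_train) == y_train)
```

Only `beta` is solved for, so training reduces to a single linear system of size `n_hidden`. This is why ELM scales well to large images.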

N. Yokoya, C. Grohnfeldt, and J. Chanussot, “Hyperspectral and multispectral data fusion: a comparative review of the recent literature,” IEEE Geoscience and Remote Sensing Magazine, vol. 5, no. 2, pp. 29–56, 2017.
Abstract: In recent years, enormous efforts have been made to design image processing algorithms to enhance the spatial resolution of hyperspectral (HS) imagery. One of the most commonly addressed problems is the fusion of HS data with higher-spatial-resolution multispectral (MS) data. Various techniques have been proposed to solve this data fusion problem based on different theories including component substitution, multiresolution analysis, spectral unmixing, and Bayesian probability. This paper presents a comparative review of those HS-MS fusion techniques with extensive experiments. Ten state-of-the-art HS-MS fusion methods are compared by assessing their fusion performance both quantitatively and visually. Eight data sets featuring different geographical and sensor characteristics are used in the experiments to evaluate the generalizability and versatility of the fusion algorithms. To maximize the fairness and transparency of this comparison, publicly available source codes are used, and parameters are individually tuned for maximum performance. Additionally, the impact of spatial resolution enhancement on classification is investigated. Robustness against various factors characterizing the HS-MS fusion problem is systematically analyzed for all methods under comparison. The algorithm characteristics are summarized, and methods with high general versatility are clarified. The paper also provides possible future directions for the development of HS-MS fusion.
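Quantitative comparisons of fused HS data usually include a spectral-fidelity index. The spectral angle mapper (SAM) is one widely used example, though not necessarily the only index used in the review. SAM averages, over all pixels, the angle between the reference and fused spectra. The sketch below assumes that both cubes are NumPy arrays of shape (rows, cols, bands). The function name is hypothetical.

```python
import numpy as np


def spectral_angle_mapper(reference, estimate, eps=1e-12):
    """Mean spectral angle (degrees) between two cubes of shape (rows, cols, bands)."""
    ref = reference.reshape(-1, reference.shape[-1])
    est = estimate.reshape(-1, estimate.shape[-1])
    dot = np.sum(ref * est, axis=1)
    norms = np.linalg.norm(ref, axis=1) * np.linalg.norm(est, axis=1)
    cos = np.clip(dot / (norms + eps), -1.0, 1.0)   # guard against rounding outside [-1, 1]
    return float(np.degrees(np.mean(np.arccos(cos))))


rng = np.random.default_rng(0)
cube = rng.uniform(0.1, 1.0, (4, 5, 30))
sam_scaled = spectral_angle_mapper(cube, 2.0 * cube)   # close to 0 degrees
```

SAM depends only on the direction of each spectrum. A fused cube that is correct up to a per-pixel brightness scaling therefore still scores close to 0°, which is why SAM is usually reported alongside radiometric error measures.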

N. Yokoya, “Texture-guided multisensor superresolution for remotely sensed images,” Remote Sensing, vol. 9, no. 4: 316, 2017.
Abstract: This paper presents a novel technique, namely texture-guided multisensor superresolution (TGMS), for fusing a pair of multisensor multiresolution images to enhance the spatial resolution of a lower-resolution data source. TGMS is based on multiresolution analysis, taking object structures and image textures in the higher-resolution image into consideration. TGMS is designed to be robust against misregistration and the resolution ratio and applicable to a wide variety of multisensor superresolution problems in remote sensing. The proposed methodology is applied to six different types of multisensor superresolution, which fuse the following image pairs: multispectral and panchromatic images, hyperspectral and panchromatic images, hyperspectral and multispectral images, optical and synthetic aperture radar images, thermal-hyperspectral and RGB images, and digital elevation model and multispectral images. The experimental results demonstrate the effectiveness and high general versatility of TGMS.

D. Hong, N. Yokoya, and X. X. Zhu, “Learning a robust local manifold representation for hyperspectral dimensionality reduction,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 10, no. 6, pp. 2960–2975, 2017.
Abstract: Local manifold learning has been successfully applied to hyperspectral dimensionality reduction in order to embed nonlinear and non-convex manifolds in the data. Local manifold learning is mainly characterized by affinity matrix construction, which is composed of two steps: neighbor selection and computation of affinity weights. There is a challenge in each step: (1) neighbor selection is sensitive to complex spectral variability due to non-uniform data distribution, illumination variations, and sensor noise; (2) the computation of affinity weights is challenging due to highly correlated spectral signatures in the neighborhood. To address these two issues, this work proposes a novel manifold learning methodology based on locally linear embedding (LLE) that learns a robust local manifold representation (RLMR). More specifically, a hierarchical neighbor selection (HNS) is designed to progressively eliminate the effects of complex spectral variability using joint normalization (JN) and to robustly compute affinity (or reconstruction) weights, reducing multicollinearity via refined neighbor selection (RNS). Additionally, spatial-spectral information is incorporated into the proposed manifold learning methodology to further improve the robustness of affinity calculations. Classification is explored as a potential application for validating the proposed algorithm. The classification accuracies obtained with different dimensionality reduction methods are evaluated and compared, using two strategies for selecting the training and test samples: random sampling and region-based sampling. Experimental results show that the classification accuracy obtained by the proposed method is superior to that of state-of-the-art dimensionality reduction methods.
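In standard LLE, the affinity (reconstruction) weights mentioned above are obtained by solving a small constrained least-squares problem for each pixel. The sketch below shows only that standard step. The hierarchical neighbor selection, joint normalization, and refined neighbor selection of RLMR are not reproduced. The function name and the regularization value are assumptions.

```python
import numpy as np


def lle_weights(x, neighbors, reg=1e-3):
    """Reconstruction weights of x from its k neighbors (rows of `neighbors`).

    Solves min ||x - sum_j w_j n_j||^2 subject to sum_j w_j = 1, as in
    standard LLE; `reg` adds a small ridge to the local Gram matrix,
    which is singular when k exceeds the data dimension."""
    Z = neighbors - x                          # shift neighbors so x is the origin
    C = Z @ Z.T                                # local Gram matrix, shape (k, k)
    C = C + reg * np.trace(C) * np.eye(len(C))
    w = np.linalg.solve(C, np.ones(len(C)))
    return w / w.sum()                         # enforce the sum-to-one constraint


neighbors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
x = 0.2 * neighbors[0] + 0.3 * neighbors[1] + 0.5 * neighbors[2]
w = lle_weights(x, neighbors)                  # approximately [0.2, 0.3, 0.5]
```

When neighboring spectra are highly correlated, the local Gram matrix `C` becomes ill-conditioned and the weights become unstable. This is the multicollinearity problem that the refined neighbor selection described above is designed to reduce.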

N. Yokoya, X. X. Zhu, and A. Plaza, “Multisensor coupled spectral unmixing for time-series analysis,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 5, pp. 2842–2857, 2017.
Abstract: We present a new framework, called multisensor coupled spectral unmixing (MuCSUn), that solves unmixing problems involving a set of multisensor time-series spectral images in order to understand dynamic changes of the surface at a subpixel scale. The proposed methodology couples multiple unmixing problems through graph-based regularization between the time-series data, yielding robust and stable unmixing solutions across data modalities despite differences in sensor characteristics and the effects of non-optimal atmospheric correction. Atmospheric normalization and cross-calibration of spectral response functions are integrated into the framework as a preprocessing step. The proposed methodology is quantitatively validated using a synthetic dataset that includes seasonal and trend changes on the surface and the residuals of non-optimal atmospheric correction. The experiments on the synthetic dataset clearly demonstrate the efficacy of MuCSUn and the importance of the preprocessing step. We further apply our methodology to a real time-series dataset composed of 11 Hyperion and 22 Landsat-8 images taken over Fukushima, Japan, from 2011 to 2015. The proposed methodology successfully obtains robust and stable unmixing results and clearly visualizes class-specific changes at a subpixel scale in the considered study area.

J. Xia, N. Yokoya, and A. Iwasaki, “Hyperspectral image classification with canonical correlation forests,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 1, pp. 421–431, 2017.
Abstract: Multiple classifier systems, or ensemble learning, are effective tools for providing accurate classification results for hyperspectral remote sensing images. Two well-known ensemble learning classifiers for hyperspectral data are random forest (RF) and rotation forest (RoF). In this paper, we propose to use a novel decision tree (DT) ensemble method, namely, canonical correlation forest (CCF). More specifically, several individual canonical correlation trees (CCTs), which are binary DTs that use canonical correlation components for hyperplane splitting, are used to construct the CCF. Additionally, we adopt the projection bootstrap technique in CCF, in which the full spectral bands are retained for split selection in the projected space. These techniques allow the CCF to improve both the accuracy of the member classifiers and the diversity within the ensemble. Furthermore, the CCF is extended to spectral–spatial frameworks that incorporate Markov random fields, extended multiattribute profiles (EMAPs), and the ensemble of independent component analysis and rolling guidance filter (E-ICA-RGF). Experimental results on six hyperspectral data sets demonstrate the effectiveness of the proposed method, in terms of accuracy and computational complexity, compared with RF and RoF. They show that CCF is a promising approach for hyperspectral image classification, not only with spectral information but also in spectral–spatial frameworks.

N. Yokoya, J. C. W. Chan, and K. Segl, “Potential of resolution-enhanced hyperspectral data for mineral mapping using simulated EnMAP and Sentinel-2 images,” Remote Sensing, vol. 8, no. 3: 172, 2016.
Abstract: Spaceborne hyperspectral images are useful for large-scale mineral mapping. With a ground sampling distance (GSD) of 30 m and measurements in the spectral range between 420 and 2450 nm, the Environmental Mapping and Analysis Program (EnMAP) will help put many issues related to environmental monitoring and resource exploration in perspective. However, a higher spatial resolution is preferable for many applications. This paper investigates the potential of fusion-based resolution enhancement of hyperspectral data for mineral mapping. A pair of EnMAP and Sentinel-2 images is generated from a HyMap scene over a mining area. The simulation is based on well-established sensor end-to-end simulation tools. The EnMAP image is fused with Sentinel-2 10-m-GSD bands using a matrix factorization method to obtain resolution-enhanced EnMAP data at a 10 m GSD. Quality assessments of the enhanced data are conducted using quantitative measures and continuum removal; both show that high spectral and spatial fidelity are maintained. Finally, the results of spectral unmixing are compared with those expected from high-resolution hyperspectral data at a 10 m GSD. The comparison demonstrates high resemblance and shows the great potential of the resolution enhancement method for EnMAP-type data in mineral mapping.

M. A. Veganzones, M. Simoes, G. Licciardi, N. Yokoya, J. M. Bioucas-Dias, and J. Chanussot, “Hyperspectral super-resolution of locally low rank images from complementary multisource data,” IEEE Transactions on Image Processing, vol. 25, no. 1, pp. 274–288, 2016.
Abstract: Remote sensing hyperspectral images (HSI) are quite often low rank, in the sense that the data belong to a low dimensional subspace/manifold. This has recently been exploited for the fusion of low spatial resolution HSI with high spatial resolution multispectral images (MSI) in order to obtain super-resolution HSI. Most approaches adopt an unmixing or a matrix factorization perspective. The derived methods have led to state-of-the-art results when the spectral information lies in a low dimensional subspace/manifold. However, if the subspace/manifold dimensionality spanned by the complete data set is large, i.e., larger than the number of multispectral bands, the performance of these methods decreases, mainly because the underlying sparse regression problem is severely ill-posed. In this paper, we propose a local approach to cope with this difficulty. Fundamentally, we exploit the fact that real world HSI are locally low rank, that is, pixels acquired from a given spatial neighborhood span a very low dimensional subspace/manifold, i.e., one of dimension lower than or equal to the number of multispectral bands. Thus, we propose to partition the image into patches and solve the data fusion problem independently for each patch. This way, in each patch the subspace/manifold dimensionality is low enough that the problem is no longer ill-posed. We propose two alternative approaches to define the hyperspectral super-resolution via local dictionary learning using endmember induction algorithms (HSR-LDL-EIA). We also explore two alternatives to define the local regions, using sliding windows and binary partition trees. The effectiveness of the proposed approaches is illustrated with synthetic and semi-real data.

L. Loncan, L. B. Almeida, J. Bioucas-Dias, X. Briottet, J. Chanussot, N. Dobigeon, S. Fabre, W. Liao, G. A. Licciardi, M. Simoes, J. Y. Tourneret, M. A. Veganzones, G. Vivone, Q. Wei, and N. Yokoya, “Hyperspectral pansharpening: a review,” IEEE Geoscience and Remote Sensing Magazine, vol. 3, no. 3, pp. 27–46, 2015.
Abstract: Pansharpening aims at fusing a panchromatic image with a multispectral one, to generate an image with the high spatial resolution of the former and the high spectral resolution of the latter. In the last decade, many algorithms have been presented in the literature for pansharpening using multispectral data. With the increasing availability of hyperspectral systems, these methods are now being adapted to hyperspectral images. In this work, we compare new pansharpening techniques designed for hyperspectral data with some state-of-the-art methods for multispectral pansharpening that have been adapted for hyperspectral data. Eleven methods from different classes (component substitution, multiresolution analysis, hybrid, Bayesian and matrix factorization) are analyzed. These methods are applied to three datasets and their effectiveness and robustness are evaluated with widely used performance indicators. In addition, all the pansharpening techniques considered in this paper have been implemented in a MATLAB toolbox that is made available to the community.

T. Matsuki, N. Yokoya, and A. Iwasaki, “Hyperspectral tree species classification of Japanese complex mixed forest with the aid of LiDAR data,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 5, pp. 2177–2187, 2015.
Abstract: The classification of tree species in forests is an important task for forest maintenance and management. With the increase in the spatial resolution of remote sensing imagery, individual tree classification has become the next research target for forest inventory. In this work, we propose a methodology combining hyperspectral and LiDAR data for individual tree classification, which can be extended to areas shadowed by sunlit tree crowns. To remove the influence of shadows in hyperspectral data, an unmixing-based correction is applied as preprocessing. Spectral features of trees are obtained by principal component analysis of the hyperspectral data. The sizes and shapes of individual trees are derived from the LiDAR data after individual tree crown delineation. Both spectral and tree-crown features are combined and input into a support vector machine classifier pixel by pixel. This procedure is applied to data taken over Tama Forest Science Garden in Tokyo, Japan, to classify it into 16 classes of tree species. It is found that both shadow correction and tree-crown information improve the classification performance, which is further improved by postprocessing based on tree-crown information derived from the LiDAR data. With 10% of the data used for training, random sampling of pixels to select training samples yielded a classification accuracy of 82%, whereas the use of reference polygons, a more practical means of sample selection, reduced the accuracy to 71%. These values are respectively 21.5% and 9% higher than those obtained using hyperspectral data only.

N. Yokoya and A. Iwasaki, “Object detection based on sparse representation and Hough voting for optical remote sensing imagery,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 5, pp. 2053–2062, 2015.
Abstract: We present a novel method for detecting instances of an object class or specific object in high-spatial-resolution optical remote sensing images. The proposed method integrates sparse representations for local-feature detection into generalized-Hough-transform object detection. Object parts are detected via class-specific sparse image representations of patches using learned target and background dictionaries, and their co-occurrence is spatially integrated by Hough voting, which enables object detection. We aim to efficiently detect target objects using a small set of positive training samples by matching essential object parts with a target dictionary while the residuals are explained by a background dictionary. Experimental results show that the proposed method achieves state-of-the-art performance for several examples including object-class detection and specific-object identification.
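The voting step described above can be sketched as in a generalized Hough transform. Each detected part casts a weighted vote for the object center at its own position plus a learned displacement, and peaks in the accumulator indicate detections. The sparse-representation part detector is not reproduced here. The part positions, offsets, scores, and the function name are hypothetical inputs.

```python
import numpy as np


def hough_vote(part_positions, center_offsets, scores, shape):
    """Accumulate votes for object centers in a (rows, cols) map.

    Each detected part at (r, c) with learned offset (dr, dc) casts a
    vote of weight `score` at (r + dr, c + dc)."""
    acc = np.zeros(shape)
    for (r, c), (dr, dc), s in zip(part_positions, center_offsets, scores):
        rr, cc = r + dr, c + dc
        if 0 <= rr < shape[0] and 0 <= cc < shape[1]:   # ignore votes outside the image
            acc[rr, cc] += s
    return acc


# Three parts of one object agree on the center (10, 10); one spurious part does not.
parts = [(5, 10), (10, 4), (14, 12), (30, 30)]
offsets = [(5, 0), (0, 6), (-4, -2), (3, 3)]
acc = hough_vote(parts, offsets, [1.0, 0.8, 0.9, 0.7], (40, 40))
center = np.unravel_index(np.argmax(acc), acc.shape)
```

Co-occurring parts reinforce each other at the true center, whereas isolated false part detections produce only weak, scattered votes.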

M. Kokawa, N. Yokoya, H. Ashida, J. Sugiyama, M. Tsuta, M. Yoshimura, K. Fujita, and M. Shibata, “Visualization of gluten, starch, and butter in pie pastry by fluorescence fingerprint imaging,” Food and Bioprocess Technology, online ISSN: 1935-5149, Sep. 2014.
Abstract: The distribution of starches, proteins, and fat in baked foods determines their texture and palatability, and there is a great demand for techniques to visualize the distributions of these constituents. In this study, the distributions of gluten, starch, and butter in pie pastry were visualized without any staining by using the fluorescence fingerprint (FF). The FF, also known as the excitation–emission matrix (EEM), is a set of fluorescence spectra acquired at consecutive excitation wavelengths. Fluorescence images of the sample were acquired with excitation and emission wavelengths in the ranges of 270–320 and 350–420 nm, respectively, at 10-nm increments. The FFs of each pixel were unmixed into the FFs and abundances of five constituents (gluten, starch, butter, ferulic acid, and the microscope slide) by using the least squares method coupled with constraints of non-negativity, full additivity, and quantum restraint on the abundances of the slide glass. The calculated abundances of butter, starch, and gluten at each pixel were converted to shades of red (R), green (G), and blue (B), respectively, and RGB images showing the distribution of these three constituents were composited. The composited images showed high correspondence with the images acquired with the conventional staining method. Furthermore, the ratio of gluten, starch, and butter in short pastry was calculated from the abundance images. The calculated ratio was 16.6:37.6:45.8, which was very close to the actual ratio of 12.7:38.8:48.5, further confirming the accuracy of this imaging method.
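The per-pixel unmixing step can be sketched as least squares under nonnegativity and full-additivity (sum-to-one) constraints, solved here by projected gradient descent onto the probability simplex. This solver is an assumption chosen for brevity, not the authors' implementation. It also omits the additional constraint on the slide-glass abundance. Each pixel's FF is assumed to be flattened into a vector, and the constituents' reference FFs form the columns of `E`.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = k[u - (css - 1.0) / k > 0][-1]    # largest k keeping the shifted entry positive
    theta = (css[rho - 1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)


def fcls_unmix(E, x, n_iter=5000):
    """Abundances a minimizing ||E a - x||^2 with a >= 0 and sum(a) = 1.

    E: (n_features, n_constituents) reference signatures as columns.
    x: (n_features,) observed signature of one pixel."""
    step = 1.0 / np.linalg.norm(E, 2) ** 2   # 1 / Lipschitz constant of the gradient
    a = np.full(E.shape[1], 1.0 / E.shape[1])
    for _ in range(n_iter):
        grad = E.T @ (E @ a - x)
        a = project_to_simplex(a - step * grad)
    return a


# Synthetic check with three constituents mixed 20:30:50.
rng = np.random.default_rng(0)
E = rng.uniform(0.1, 1.0, (20, 3))
a_true = np.array([0.2, 0.3, 0.5])
a_est = fcls_unmix(E, E @ a_true)
```

Projecting after every gradient step keeps each iterate feasible, so the returned abundances are nonnegative and sum to one by construction.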

N. Yokoya, S. Nakazawa, T. Matsuki, and A. Iwasaki, “Fusion of hyperspectral and LiDAR data for landscape visual quality assessment,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 2419–2425, 2014.
Abstract: Landscape visual quality is an important factor associated with daily experiences and influences our quality of life. In this work, the authors present a method of fusing airborne hyperspectral and mapping light detection and ranging (LiDAR) data for landscape visual quality assessment. From the fused hyperspectral and LiDAR data, classification and depth images at any location can be obtained, enabling physical features such as land-cover properties and openness to be quantified. The relationship between physical features and human landscape preferences is learned using least absolute shrinkage and selection operator (LASSO) regression. The proposed method is applied to the hyperspectral and LiDAR datasets provided for the 2013 IEEE GRSS Data Fusion Contest. The results showed that the proposed method successfully learned a human perception model that enables the prediction of landscape visual quality at any viewpoint for a given demographic used for training. This work is expected to contribute to automatic landscape assessment and optimal spatial planning using remote sensing data.

N. Yokoya, J. Chanussot, and A. Iwasaki, “Nonlinear unmixing of hyperspectral data using semi-nonnegative matrix factorization,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 2, pp. 1430–1437, 2014.
Abstract: Nonlinear spectral mixture models have recently received particular attention in hyperspectral image processing. In this work, we present a novel optimization method of nonlinear unmixing based on a generalized bilinear model (GBM), which considers the second-order scattering of photons in a spectral mixture model. Semi-nonnegative matrix factorization (Semi-NMF) is used for the optimization to process a whole image in matrix form. When endmember spectra are given, the optimization of abundance and interaction abundance fractions converges to a local optimum by alternating update rules with simple implementation. The proposed method is evaluated on synthetic datasets for its robustness to the accuracy of endmember extraction and to spectral complexity, and it shows smaller errors in abundance fractions than conventional methods. GBM-based unmixing using Semi-NMF is applied to the analysis of an airborne hyperspectral image taken over an agricultural field with many endmembers; it visualizes the impact of nonlinear interactions on abundance maps at a reasonable computational cost.
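The generalized bilinear model adds second-order terms γ_ij a_i a_j (e_i ⊙ e_j) to the linear mixture of endmember spectra e_i with abundances a_i. These terms are added for each pair i < j, where ⊙ is the element-wise product and γ_ij ∈ [0, 1]. The sketch below only evaluates this forward model for one pixel, without noise. The Semi-NMF optimization of the paper is not shown, and the function name is hypothetical.

```python
import numpy as np


def gbm_pixel(E, a, gamma):
    """Noise-free GBM spectrum of one pixel.

    E: (n_bands, R) endmember spectra, a: (R,) abundances,
    gamma: (R, R) interaction coefficients in [0, 1]; only i < j is used."""
    y = E @ a                                     # linear mixing part
    R = E.shape[1]
    for i in range(R):
        for j in range(i + 1, R):
            y = y + gamma[i, j] * a[i] * a[j] * E[:, i] * E[:, j]   # bilinear term
    return y


E = np.array([[0.5, 0.2],
              [0.4, 1.0]])                        # 2 bands, 2 endmembers
a = np.array([0.5, 0.5])
linear = gbm_pixel(E, a, np.zeros((2, 2)))        # [0.35, 0.7]
gamma = np.array([[0.0, 1.0], [0.0, 0.0]])
bilinear = gbm_pixel(E, a, gamma)                 # [0.375, 0.8]
```

Setting all γ_ij to zero recovers the linear mixing model, which makes a convenient sanity check.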

N. Yokoya, N. Mayumi, and A. Iwasaki, “Cross-calibration for data fusion of EO-1/Hyperion and Terra/ASTER,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 6, no. 2, pp. 419–426, 2013.
Abstract: The data fusion of low spatial-resolution hyperspectral and high spatial-resolution multispectral images enables the production of high spatial-resolution hyperspectral data with small spectral distortion. EO-1/Hyperion is the world’s first spaceborne hyperspectral sensor. It was launched in 2001 and has an orbit similar to that of Terra/ASTER. In this work, we apply hyperspectral and multispectral data fusion to EO-1/Hyperion and Terra/ASTER datasets through dataset preprocessing and onboard cross-calibration of sensor characteristics. The relationship between the spectral response functions is determined by convex optimization, comparing hyperspectral and multispectral images over the same spectral range. After accurate image registration, the relationship between the point spread functions is obtained by estimating a matrix that acts as a Gaussian blur filter between the two images. Two pansharpening-based methods and one unmixing-based method are adopted for hyperspectral and multispectral data fusion, and their properties are investigated.
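In its simplest form, the spectral-response relationship can be posed as a nonnegative least-squares problem. Each MS band is modeled as a nonnegative weighted sum of the HS bands over co-located pixels. The sketch below solves that problem by projected gradient descent. It assumes that both images have already been registered and brought to a common grid, and it ignores any offsets or additional constraints the paper may use. The function name is hypothetical.

```python
import numpy as np


def estimate_srf(H, M, n_iter=2000):
    """Nonnegative R minimizing ||H R - M||_F^2 by projected gradient descent.

    H: (n_pixels, n_hs_bands) HS spectra, M: (n_pixels, n_ms_bands) MS values
    at the same pixels. Column k of R gives MS band k's weights on the HS bands."""
    step = 1.0 / np.linalg.norm(H, 2) ** 2
    R = np.zeros((H.shape[1], M.shape[1]))
    for _ in range(n_iter):
        grad = H.T @ (H @ R - M)
        R = np.maximum(R - step * grad, 0.0)    # project onto the nonnegative orthant
    return R


rng = np.random.default_rng(0)
H = rng.uniform(0.0, 1.0, (500, 10))            # 500 co-registered pixels, 10 HS bands
R_true = np.zeros((10, 2))
R_true[0:4, 0] = 0.25                           # MS band 1 averages HS bands 0-3
R_true[5:9, 1] = 0.25                           # MS band 2 averages HS bands 5-8
R_est = estimate_srf(H, H @ R_true)
```

The problem is convex, so projected gradient descent with step 1/L (L being the squared largest singular value of `H`) converges to the global optimum.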

N. Yokoya, T. Yairi, and A. Iwasaki, “Coupled nonnegative matrix factorization unmixing for hyperspectral and multispectral data fusion,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 2, pp. 528–537, 2012.
Abstract: Coupled non-negative matrix factorization (CNMF) unmixing is proposed for the fusion of low-spatial-resolution hyperspectral and high-spatial-resolution multispectral data to produce fused data with high spatial and spectral resolutions. Both hyperspectral and multispectral data are alternately unmixed into endmember and abundance matrices by the CNMF algorithm based on a linear spectral mixture model. Sensor observation models that relate the two data are built into the initialization matrix of each NMF unmixing procedure. This algorithm is physically straightforward and easy to implement owing to its simple update rules. Simulations with various image datasets demonstrate that the CNMF algorithm can produce high-quality fused data both in terms of spatial and spectral domains, which contributes to the accurate identification and classification of materials observed at a high spatial resolution.
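The abstract refers to simple update rules. One common rule of this kind is the Lee and Seung multiplicative update for the nonnegative factorization Y ≈ EA. Here E holds endmember spectra (bands × endmembers) and A holds abundances (endmembers × pixels). The sketch below shows only this generic NMF building block on synthetic data. It is not the released CNMF code, which additionally couples the HS and MS factorizations through the sensor observation models. The function name is hypothetical.

```python
import numpy as np


def nmf_unmix(Y, n_endmembers, n_iter=500, seed=0, eps=1e-9):
    """Factor Y (bands x pixels) into E (bands x R) and A (R x pixels), all >= 0,
    with Lee-Seung multiplicative updates for the Frobenius loss."""
    rng = np.random.default_rng(seed)
    E = rng.uniform(0.1, 1.0, (Y.shape[0], n_endmembers))
    A = rng.uniform(0.1, 1.0, (n_endmembers, Y.shape[1]))
    for _ in range(n_iter):
        A *= (E.T @ Y) / (E.T @ E @ A + eps)    # abundance update
        E *= (Y @ A.T) / (E @ A @ A.T + eps)    # endmember update
    return E, A


rng = np.random.default_rng(0)
Y = rng.uniform(0.0, 1.0, (50, 3)) @ rng.uniform(0.0, 1.0, (3, 200))
E, A = nmf_unmix(Y, 3)
rel_err = np.linalg.norm(Y - E @ A) / np.linalg.norm(Y)
```

Multiplicative updates keep both factors nonnegative without explicit projection, because each factor is only ever multiplied by nonnegative ratios.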
What is it?
-----------

CNMF is an algorithm to fuse hyperspectral data with either multispectral or panchromatic data to obtain high-resolution hyperspectral data. This package provides MATLAB and Python implementations of the methods described in

[1] N. Yokoya, T. Yairi, and A. Iwasaki, "Coupled nonnegative matrix factorization unmixing for hyperspectral and multispectral data fusion," IEEE Trans. Geosci. Remote Sens., vol. 50, no. 2, pp. 528-537, 2012.

[2] N. Yokoya, N. Mayumi, and A. Iwasaki, "Cross-calibration for data fusion of EO-1/Hyperion and Terra/ASTER," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 6, no. 2, pp. 419-426, 2013.

[3] N. Yokoya, T. Yairi, and A. Iwasaki, "Hyperspectral, multispectral, and panchromatic data fusion based on non-negative matrix factorization," Proc. WHISPERS, Lisbon, Portugal, Jun. 6-9, 2011.

See the file LICENSE for copying conditions.

Please kindly report any suggestions or corrections to naoto.yokoya@riken.jp

Release date
------------

April 11, 2016

How to use it?
--------------

The MATLAB version includes three folders:

'Demo': a demo program of CNMF using synthetic data.
'CNMF': the CNMF source code.
'Quality_Indices': source codes for quality indices.

The Python version includes two Python files:

'Demo_CNMF': a demo program of CNMF using synthetic data.
'CNMF': the CNMF source code.

System-specific notes
---------------------

The MATLAB version was tested using MATLAB R2015b on Windows 7 machines.
The Python version was tested using Python 2.7.5 on Windows 7/8 machines and Python 2.7.3 on a Mac OS X 10.9.5 machine.

Licensing
---------

Copyright (C) 2016 Naoto Yokoya

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 3 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see .

Contact Information:
--------------------

Naoto Yokoya: yokoya@k.u-tokyo.ac.jp

Naoto Yokoya is with The University of Tokyo, Japan.

N. Yokoya, N. Miyamura, and A. Iwasaki, “Detection and correction of spectral and spatial misregistrations for hyperspectral data using phase correlation method,” Applied Optics, vol. 49, no. 24, pp. 4568–4575, 2010.
Abstract: Hyperspectral imaging sensors suffer from spectral and spatial misregistrations. These artifacts prevent the accurate acquisition of spectra and thus reduce classification accuracy. The main objective of this work is to detect and correct spectral and spatial misregistrations of hyperspectral images. The Hyperion visible near-infrared (VNIR) subsystem is used as an example. An image registration method based on phase correlation demonstrates the precise detection of the spectral and spatial misregistrations. Cubic spline interpolation using estimated properties makes it possible to modify the spectral signatures. The accuracy of the proposed postlaunch estimation of the Hyperion characteristics is comparable to that of the prelaunch measurements, which enables the precise onboard calibration of hyperspectral sensors.
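Phase correlation estimates a translation from the peak of the inverse FFT of the normalized cross-power spectrum. The sketch below recovers only an integer-pixel 2D shift. The subpixel precision and the band-to-band spectral shift estimation discussed in the abstract would require additional steps, such as peak interpolation, which are not shown. The function name is hypothetical.

```python
import numpy as np


def phase_correlation_shift(ref, moving, eps=1e-12):
    """Integer (row, col) shift d such that moving equals np.roll(ref, d, axis=(0, 1))."""
    cross = np.fft.fft2(moving) * np.conj(np.fft.fft2(ref))
    cross /= np.abs(cross) + eps                 # keep phase only
    corr = np.fft.ifft2(cross).real              # ideally a delta at the shift
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Indices past the midpoint correspond to negative (wrapped) shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n for p, n in zip(peak, corr.shape))


rng = np.random.default_rng(0)
ref = rng.random((64, 64))
moving = np.roll(ref, (3, -5), axis=(0, 1))
print(phase_correlation_shift(ref, moving))      # (3, -5)
```

Discarding the amplitude leaves a pure phase ramp whose inverse FFT is a sharp peak. This makes the estimate largely insensitive to brightness differences between the two images.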

Leading conferences

W. Gan, F. Liu, H. Xu, and N. Yokoya, “GaussianOcc: Fully self-supervised and efficient 3D occupancy estimation with Gaussian splatting,” Proc. ICCV, 2025.
Abstract: We introduce GaussianOcc, a systematic method that investigates Gaussian splatting for fully self-supervised and efficient 3D occupancy estimation in surround views. First, existing methods for self-supervised 3D occupancy estimation still require ground-truth 6D ego poses from sensors during training. To address this limitation, we propose a Gaussian Splatting for Projection (GSP) module that provides accurate scale information for fully self-supervised training from adjacent view projection. Additionally, existing methods rely on volume rendering for final 3D voxel representation learning using 2D signals (depth maps and semantic maps), which is time-consuming and less effective. We propose Gaussian Splatting from Voxel space (GSV) to leverage the fast rendering properties of Gaussian splatting. As a result, the proposed GaussianOcc method enables fully self-supervised (no ground-truth ego pose) 3D occupancy estimation with competitive performance and low computational cost (2.7 times faster in training and 5 times faster in rendering). The code is publicly available.

Z. Wu, Y. Chen, N. Yokoya, and W. He, “MP-HSIR: A multi-prompt framework for universal hyperspectral image restoration,” Proc. ICCV, 2025.
Abstract: Hyperspectral images (HSIs) often suffer from diverse and unknown degradations during imaging, leading to severe spectral and spatial distortions. Existing HSI restoration methods typically rely on specific degradation assumptions, limiting their effectiveness in complex scenarios. In this paper, we propose MP-HSIR, a novel multi-prompt framework that effectively integrates spectral, textual, and visual prompts to achieve universal HSI restoration across diverse degradation types and intensities. Specifically, we develop a prompt-guided spatial-spectral transformer, which incorporates spatial self-attention and a prompt-guided dual-branch spectral self-attention. Since degradations affect spectral features differently, we introduce spectral prompts in the local spectral branch to provide universal low-rank spectral patterns as prior knowledge for enhancing spectral reconstruction. Furthermore, the text-visual synergistic prompt fuses high-level semantic representations with fine-grained visual features to encode degradation information, thereby guiding the restoration process. Extensive experiments on 9 HSI restoration tasks, including all-in-one scenarios, generalization tests, and real-world cases, demonstrate that MP-HSIR not only consistently outperforms existing all-in-one methods but also surpasses state-of-the-art task-specific approaches across multiple tasks. The code and models will be publicly released.

C. Ning, W. Xuan, W. Gan, and N. Yokoya, “LR²Depth: Large-region aggregation at low resolution for efficient monocular depth estimation,” Proc. IROS (oral), 2025.
Abstract: Monocular depth estimation (MDE) is crucial for various computer vision applications, but existing methods often struggle to balance inference speed and accuracy when processing large-region visual information. This paper introduces LR²Depth, a novel MDE method that addresses this challenge by utilizing large-kernel convolution on low-resolution feature maps for efficient large-region feature aggregation. Our approach leverages the fact that each pixel on low-resolution feature maps corresponds to a larger region of the original image, allowing for fast and accurate depth predictions at a lower inference cost. Extensive experiments on NYU-Depth-V2, KITTI, and SUN RGB-D datasets demonstrate that LR²Depth not only achieves state-of-the-art performance but also operates approximately twice as fast as previous MDE methods. Notably, at the time of submission, LR²Depth secured the top-1 position on the KITTI depth prediction online benchmark.

Z. Liu, Z. Cheng, and N. Yokoya, “Neural hierarchical decomposition for single image plant modeling,” Proc. CVPR, 2025.
Abstract: Obtaining high-quality, practically usable 3D models of biological plants remains a significant challenge in computer vision and graphics. In this paper, we present a novel method for generating realistic 3D plant models from single-view photographs. Our approach employs a neural decomposition technique to learn a lightweight hierarchical box representation from the image, effectively capturing the structures and botanical features of plants. This representation is then refined through a shape-guided parametric modeling module to produce complete 3D plant models. By combining hierarchical learning and parametric modeling, our method generates structured 3D plant assets with fine geometric details. Notably, by learning the decomposition at different levels of detail, our method can adapt to two distinct plant categories: outdoor trees and houseplants, each with unique appearance features. Within the scope of plant modeling, our method is the first comprehensive solution capable of reconstructing both plant categories from single-view images.

J. Li, X. Dong, W. He, and N. Yokoya, “Wavelength- and depth-aware deep image prior for blind hyperspectral imagery deblurring with coarse depth guidance,” Proc. WACV, 2025.
PDF Quick Abstract
Abstract: Hyperspectral imagery (HSI) provides detailed spectral information, enabling precise analysis of materials. However, HSI imaging suffers from blurring degradation, which results in the loss of fine details and hinders subsequent applications. The degree of blurriness is strongly related to wavelength and depth. Existing deblurring methods either lack the utilization of spectral correlation or ignore depth variation, since paired HSI and depth data are difficult to acquire and rarely discussed, leading to degraded performance on wide-range HSIs of non-planar scenes. To address these challenges in both data acquisition and algorithm design, we propose a novel approach that simultaneously collects both modalities and integrates depth refinement into a blind HSI deblurring model with a wavelength- and depth-aware deep image prior. Specifically, we capture a blurred HSI and a coarse depth map with separate devices, followed by registration. Our method performs depth-guided deblurring through depth-variant multi-channel kernel estimation and soft-weight map-based layer composition, while simultaneously refining the depth. The proposed approach effectively restores fine details with fewer artifacts, showing superior performance for both simulated blurred HSIs and real captured HSIs.

J. Song, H. Chen, W. Xuan, J. Xia, and N. Yokoya, “SynRS3D: A synthetic dataset for global 3D semantic understanding from monocular remote sensing imagery,” Proc. NeurIPS D&B Track (spotlight), 2024.
PDF Code Quick Abstract
Abstract: Global semantic 3D understanding from single-view high-resolution remote sensing (RS) imagery is crucial for Earth Observation (EO). However, this task faces significant challenges due to the high costs of annotations and data collection, as well as geographically restricted data availability. To address these challenges, synthetic data offer a promising solution because they are easily accessible and thus enable the provision of large and diverse datasets. We develop a specialized synthetic data generation pipeline for EO and introduce SynRS3D, the largest synthetic RS 3D dataset. SynRS3D comprises 69,667 high-resolution optical images that cover six different city styles worldwide and feature eight land cover types, precise height information, and building change masks. To further enhance its utility, we develop a novel multi-task unsupervised domain adaptation (UDA) method, RS3DAda, coupled with our synthetic dataset, which facilitates the RS-specific transition from synthetic to real scenarios for land cover mapping and height estimation tasks, ultimately enabling global monocular 3D semantic understanding based on synthetic data. Extensive experiments on various real-world datasets demonstrate the adaptability and effectiveness of our synthetic dataset and proposed RS3DAda method. SynRS3D and related code will be made available.

Z. Liu, Y. Li, F. Tu, R. Zhang, Z. Cheng, and N. Yokoya, “DeepTreeSketch: Neural graph prediction for faithful 3D tree modeling from sketches,” Proc. CHI, 2024.
Project Page Video PDF Quick Abstract
Abstract: We present DeepTreeSketch, a novel AI-assisted sketching system that enables users to create realistic 3D tree models from 2D freehand sketches. Our system leverages a tree graph prediction network, TGP-Net, to learn the underlying structural patterns of trees from a large collection of 3D tree models. The TGP-Net simulates the iterative growth of botanical trees and progressively constructs the 3D tree structures in a bottom-up manner. Furthermore, our system supports a flexible sketching mode for both precise and coarse control of the tree shapes by drawing branch strokes and foliage strokes, respectively. Combined with a procedural generation strategy, the system lets users freely control the foliage propagation with diverse and fine details. We demonstrate the expressiveness, efficiency, and usability of our system through various experiments and user studies. Our system offers a practical tool for 3D tree creation, especially for natural scenes in games, movies, and landscape applications.

X. Dong and N. Yokoya, “Understanding dark scenes by contrasting multi-modal observations,” Proc. WACV, 2024.
PDF Code Quick Abstract
Abstract: Understanding dark scenes based on multi-modal image data is challenging, as both the visible and auxiliary modalities provide limited semantic information for the task. Previous methods focus on fusing the two modalities but neglect the correlations among semantic classes when minimizing losses to align pixels with labels, resulting in inaccurate class predictions. To address these issues, we introduce a supervised multi-modal contrastive learning approach to increase the semantic discriminability of the learned multi-modal feature spaces by jointly performing cross-modal and intra-modal contrast under the supervision of the class correlations. The cross-modal contrast encourages same-class embeddings from across the two modalities to be closer and pushes different-class ones apart. The intra-modal contrast forces same-class or different-class embeddings within each modality to be together or apart. We validate our approach on a variety of tasks that cover diverse light conditions and image modalities. Experiments show that our approach can effectively enhance dark scene understanding based on multi-modal images with limited semantics by shaping semantic-discriminative feature spaces. Comparisons with previous methods demonstrate our state-of-the-art performance. Code and pretrained models are available at https://github.com/palmdong/SMMCL.

J. Song, H. Chen, and N. Yokoya, “SyntheWorld: A large-scale synthetic dataset for land cover mapping and building change detection,” Proc. WACV, 2024.
PDF Data Quick Abstract
Abstract: Synthetic datasets, recognized for their cost-effectiveness, play a pivotal role in advancing computer vision tasks and techniques. However, when it comes to remote sensing image processing, the creation of synthetic datasets becomes challenging due to the demand for larger-scale and more diverse 3D models. This complexity is compounded by the difficulties associated with real remote sensing datasets, including limited data acquisition and high annotation costs, which amplify the need for high-quality synthetic alternatives. To address this, we present SyntheWorld, a synthetic dataset unparalleled in quality, diversity, and scale. It includes 40,000 images with submeter-level pixels and fine-grained land cover annotations of eight categories, and it also provides 40,000 bitemporal image pairs with building change annotations for the building change detection task. We conduct experiments on multiple benchmark remote sensing datasets to verify the effectiveness of SyntheWorld and to investigate the conditions under which our synthetic data yield advantages. We will release SyntheWorld to facilitate remote sensing image processing research.

J. Xia, N. Yokoya, B. Adriano, and C. Broni-Bediako, “OpenEarthMap: A benchmark dataset for global high-resolution land cover mapping,” Proc. WACV, 2023.
Project Page PDF Quick Abstract
Abstract: We introduce OpenEarthMap, a benchmark dataset for global high-resolution land cover mapping. OpenEarthMap consists of 2.2 million segments of 5000 aerial and satellite images covering 97 regions from 44 countries across 6 continents, with manually annotated 8-class land cover labels at a 0.25–0.5 m ground sampling distance. Semantic segmentation models trained on OpenEarthMap generalize worldwide and can be used as off-the-shelf models in a variety of applications. We evaluate the performance of state-of-the-art methods for unsupervised domain adaptation and present challenging problem settings suitable for further technical development. We also investigate lightweight models using automated neural architecture search for limited computational resources and fast mapping. The dataset will be made publicly available.

X. Dong, N. Yokoya, L. Wang, and T. Uezato, “Learning mutual modulation for self-supervised cross-modal super-resolution,” Proc. ECCV, 2022.
PDF Quick Abstract
Abstract: Self-supervised cross-modal super-resolution (SR) can overcome the difficulty of acquiring paired training data, but is challenging because only low-resolution (LR) source and high-resolution (HR) guide images from different modalities are available. Existing methods utilize pseudo or weak supervision in LR space and thus deliver results that are blurry or not faithful to the source modality. To address this issue, we present a mutual modulation SR (MMSR) model, which tackles the task by a mutual modulation strategy, including a source-to-guide modulation and a guide-to-source modulation. In these modulations, we develop cross-domain adaptive filters to fully exploit cross-modal spatial dependency and help induce the source to emulate the resolution of the guide and induce the guide to mimic the modality characteristics of the source. Moreover, we adopt a cycle consistency constraint to train MMSR in a fully self-supervised manner. Experiments on various tasks demonstrate the state-of-the-art performance of our MMSR.

W. He, Q. Yao, N. Yokoya, T. Uezato, H. Zhang, and L. Zhang, “Spectrum-aware and transferable architecture search for hyperspectral image restoration,” Proc. ECCV, 2022.
PDF Quick Abstract
Abstract: Convolutional neural networks have been widely developed for hyperspectral image (HSI) restoration. However, making full use of the spatial-spectral information of HSIs remains a challenge. In this work, we disentangle the 3D convolution into lightweight 2D spatial and spectral convolutions and build a spectrum-aware search space for HSI restoration. Subsequently, we utilize a neural architecture search strategy to automatically learn the most efficient architecture with proper convolutions and connections in order to fully exploit the spatial-spectral information. We also find that the super-net with global and local skip connections can further boost HSI restoration performance. The architecture searched on the CAVE dataset has been applied to denoising on various datasets and to imaging reconstruction tasks, achieving remarkable performance. On the basis of extensive experiments, we conclude that the transferability of the searched architecture depends on the spectral information and is independent of the noise levels.

N. Mo, W. Gan, N. Yokoya, and S. Chen, “ES6D: A computation efficient and symmetry-aware 6D pose regression framework,” Proc. CVPR, 2022.
PDF Quick Abstract
Abstract:

T. Uezato, D. Hong, N. Yokoya, and W. He, “Guided deep decoder: Unsupervised image pair fusion,” Proc. ECCV (spotlight), August 23-28, 2020.
PDF Quick Abstract
Abstract: The fusion of input and guidance images that have a tradeoff in their information (e.g., hyperspectral and RGB image fusion or pansharpening) can be interpreted as one general problem. However, previous studies applied a task-specific handcrafted prior and did not address the problems with a unified approach. To address this limitation, in this study, we propose a guided deep decoder network as a general prior. The proposed network is composed of an encoder-decoder network that exploits multi-scale features of a guidance image and a deep decoder network that generates an output image. The two networks are connected by feature refinement units to embed the multi-scale features of the guidance image into the deep decoder network. The proposed network allows the network parameters to be optimized in an unsupervised way without training data. Our results show that the proposed network can achieve state-of-the-art performance in various image fusion problems.

W. He, Q. Yao, C. Li, N. Yokoya, and Q. Zhao, “Non-local meets global: An integrated paradigm for hyperspectral denoising,” Proc. CVPR, Long Beach, CA, US, June 16-20, 2019.
PDF Quick Abstract
Abstract: Non-local low-rank tensor approximation has been developed as a state-of-the-art method for hyperspectral image (HSI) denoising. Unfortunately, while the denoising performance of these methods benefits little from more spectral bands, their running time increases significantly. In this paper, we claim that the HSI lies in a global spectral low-rank subspace, and the spectral subspaces of each full-band patch group should lie in this global low-rank subspace. This motivates us to propose a unified spatial-spectral paradigm for HSI denoising. As the new model is hard to optimize, an efficient algorithm motivated by alternating minimization is developed. This is done by first learning a low-dimensional orthogonal basis and the related reduced image from the noisy HSI. Then, the non-local low-rank denoising and iterative regularization are developed to refine the reduced image and orthogonal basis, respectively. Finally, experiments on synthetic and real datasets demonstrate the superiority of the method over state-of-the-art HSI denoising methods.
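
The first step described above, learning an orthogonal spectral basis and a reduced image, can be sketched with a truncated SVD of the unfolded cube. This is an assumption for illustration: it shows only the global low-rank projection, not the paper's estimator, and omits the non-local denoising and iterative regularization that follow. `global_spectral_subspace` and `reconstruct` are hypothetical names.

```python
import numpy as np


def global_spectral_subspace(hsi: np.ndarray, k: int):
    """Estimate a k-dimensional orthonormal spectral basis for an H x W x B cube.

    Returns the basis E (B x k) and the reduced image (H x W x k) of per-pixel
    coefficients, so that hsi is approximated by reduced @ E.T along the band axis.
    """
    h, w, b = hsi.shape
    y = hsi.reshape(h * w, b)                     # unfold: pixels x bands
    # The leading right singular vectors span the dominant spectral directions.
    _, _, vt = np.linalg.svd(y, full_matrices=False)
    basis = vt[:k].T                              # B x k, orthonormal columns
    reduced = (y @ basis).reshape(h, w, k)        # project every pixel spectrum
    return basis, reduced


def reconstruct(basis: np.ndarray, reduced: np.ndarray) -> np.ndarray:
    """Map the reduced image back to the full band space."""
    h, w, k = reduced.shape
    return (reduced.reshape(h * w, k) @ basis.T).reshape(h, w, basis.shape[0])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic cube whose spectra lie in a 3-dimensional subspace of 50 bands.
    clean = rng.random((32, 32, 3)) @ rng.random((3, 50))
    noisy = clean + 0.05 * rng.standard_normal(clean.shape)
    basis, reduced = global_spectral_subspace(noisy, k=3)
    denoised = reconstruct(basis, reduced)
    print(np.linalg.norm(noisy - clean), np.linalg.norm(denoised - clean))
```

Because the clean spectra occupy only 3 of 50 dimensions, the projection discards most of the noise, and any later processing works on the much smaller `reduced` array instead of the full cube.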

Other peer-reviewed conferences and workshops

C. Broni-Bediako, J. Xia, and N. Yokoya, “Unsupervised domain adaptation architecture search with self-training for land cover mapping,” Proc. EarthVision CVPRW, 2024.
PDF Quick Abstract
Abstract: Unsupervised domain adaptation (UDA) is a challenging open problem in land cover mapping. Previous studies show encouraging progress in addressing cross-domain distribution shifts on remote sensing benchmarks for land cover mapping. The existing works are mainly built on large neural network architectures, which makes them resource-hungry and limits their practical impact in many real-world applications in resource-constrained environments. Thus, we propose a simple yet effective framework that automatically searches for lightweight neural networks for land cover mapping tasks under domain shifts. This is achieved by integrating Markov random field neural architecture search (MRF-NAS) into a self-training UDA framework to search for efficient and effective networks under a limited computation budget. This is the first attempt to combine NAS with self-training UDA as a single framework for land cover mapping. We also investigate two different pseudo-labelling approaches (confidence-based and energy-based) in the self-training scheme. Experimental results on two recent datasets (OpenEarthMap & FLAIR #1) for remote sensing UDA demonstrate satisfactory performance. With fewer than 2M parameters and 30.16 GFLOPs, the best-discovered lightweight network reaches state-of-the-art performance on the regional target domain of OpenEarthMap (59.38% mIoU) and the considered target domain of FLAIR #1 (51.19% mIoU). The code is at https://github.com/cliffbb/UDA-NAS.
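
The two pseudo-labelling criteria mentioned above can be sketched as per-pixel filters on a segmentation network's logits. This is an illustration under assumptions, not the paper's implementation: the confidence rule keeps pixels whose maximum softmax probability reaches a threshold, the energy rule uses the common free-energy score E(x) = -T·logsumexp(f(x)/T) (lower means more confident), and the thresholds and the ignore index 255 are hypothetical choices.

```python
import numpy as np

IGNORE_INDEX = 255  # hypothetical label for pixels excluded from self-training


def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def energy(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Free energy -T * logsumexp(logits / T) over the class axis."""
    scaled = logits / temperature
    peak = scaled.max(axis=-1, keepdims=True)
    lse = np.log(np.exp(scaled - peak).sum(axis=-1)) + peak[..., 0]
    return -temperature * lse


def pseudo_labels(logits: np.ndarray, mode: str, threshold: float) -> np.ndarray:
    """Arg-max labels with unreliable pixels set to IGNORE_INDEX.

    logits has shape (..., num_classes), e.g. (H, W, C) for one image.
    """
    labels = logits.argmax(axis=-1)
    if mode == "confidence":
        keep = softmax(logits).max(axis=-1) >= threshold
    elif mode == "energy":
        keep = energy(logits) <= threshold
    else:
        raise ValueError(f"unknown mode: {mode}")
    return np.where(keep, labels, IGNORE_INDEX)
```

For example, with logits `[[5.0, 0.0, 0.0], [0.1, 0.0, 0.0]]`, a confidence threshold of 0.9 and an energy threshold of -3.0 both keep the first (peaked) pixel and ignore the second (nearly uniform) one.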

J. Song, B. Adriano, and N. Yokoya, “Disaster detection from SAR images with different off-nadir angles using unsupervised image translation,” Proceedings of the 2nd CDCEO Workshop, IJCAI-ECAI, 2022, pp. 14-20.
PDF Quick Abstract
Abstract: Synthetic aperture radar (SAR) images observed at different off-nadir angles have different intensities, so change detection methods based on difference images do not work well. This problem hinders emergency response when no archived data share the off-nadir angle of the emergency SAR observation. In this paper, we investigate unsupervised image translation methods based on generative adversarial networks and autoencoders to detect flood and landslide areas using SAR images observed at different off-nadir angles. Comprehensive experiments of disaster detection using ALOS-2 PALSAR-2 images for three floods and two landslides show that the developed methods can significantly improve the accuracy of disaster detection using pre- and post-disaster images observed at different off-nadir angles.

B. Adriano, N. Yokoya, K. Yamanoi, and S. Oishi, “Predicting flood inundation depth based-on machine learning and numerical simulation,” Proceedings of the 2nd CDCEO Workshop, IJCAI-ECAI, 2022, pp. 58-64.
PDF Quick Abstract
Abstract: Recent advances in earth observation and machine learning have enabled rapid estimation of flooded areas following catastrophic events such as torrential rains and riverbank overflows. However, estimating the actual inundation depth remains a challenge since it often requires detailed numerical simulation. This paper presents a methodology for predicting inundation depth from remote-sensing-derived information by coupling deep learning and numerical simulation. We generate a large dataset of flood inundation depths considering several heavy rain conditions in four independent target areas, and we propose a CNN-based regression framework. Our experiment demonstrates that the methodology can predict inundation depth in a separate target area not included during training, indicating strong generalization ability.

J. Xia, N. Yokoya, and B. Adriano, “Building damage mapping with self-positive unlabeled learning,” NeurIPS Workshop, 2021.
PDF Quick Abstract
Abstract: Humanitarian organizations must have fast and reliable data to respond to disasters. Deep learning approaches are difficult to implement in real-world disasters because it can be challenging to collect ground truth data on the damage situation (training data) soon after the event. This work demonstrates recent self-paced positive-unlabeled (PU) learning by successfully applying it to building damage assessment with very limited labeled data and a large amount of unlabeled data. Self-PU learning is compared with supervised baselines and traditional PU learning using datasets collected from the 2011 Tohoku earthquake, the 2018 Palu tsunami, and 2018 Hurricane Michael. By utilizing only a portion of the labeled damaged samples, we show that models trained with self-PU techniques can achieve performance comparable to supervised learning.
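
PU-learning methods of this kind are commonly trained with a non-negative PU risk estimator. The sketch below shows that estimator (nnPU, Kiryo et al., 2017) with a sigmoid loss. It is a swapped-in illustration of the general PU objective, not necessarily the exact self-PU training loop used in the paper, and `prior` (the assumed fraction of damaged buildings among unlabeled samples) must be supplied or estimated separately.

```python
import math
from typing import Sequence


def sigmoid_loss(score: float, label: int) -> float:
    """Sigmoid loss l(z, y) = 1 / (1 + exp(y * z)) for a label y in {+1, -1}.

    Written via tanh, 1 / (1 + e^x) = 0.5 * (1 - tanh(x / 2)), to avoid overflow.
    """
    return 0.5 * (1.0 - math.tanh(label * score / 2.0))


def mean_loss(scores: Sequence[float], label: int) -> float:
    return sum(sigmoid_loss(s, label) for s in scores) / len(scores)


def nnpu_risk(pos_scores: Sequence[float], unl_scores: Sequence[float], prior: float) -> float:
    """Non-negative PU risk: pi * R_p^+ + max(0, R_u^- - pi * R_p^-).

    pos_scores: classifier outputs on labeled positive (damaged) samples.
    unl_scores: classifier outputs on unlabeled samples.
    prior: assumed class prior pi of positives among the unlabeled data.
    """
    risk_pos = prior * mean_loss(pos_scores, +1)
    # Estimated risk on the negatives hidden in the unlabeled set; it can turn
    # negative with flexible models, so nnPU clips it at zero.
    risk_neg = mean_loss(unl_scores, -1) - prior * mean_loss(pos_scores, -1)
    return risk_pos + max(0.0, risk_neg)
```

The clipping is the design choice that distinguishes nnPU from the unbiased PU estimator: without it, a deep model can drive the negative-class term below zero and overfit the few labeled positives.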

W. He, L. Yuan, and N. Yokoya, “Total-variation-regularized tensor ring completion for remote sensing image reconstruction,” Proc. ICASSP, Brighton, UK, May 12-17, 2019.
PDF Quick Abstract
Abstract: In recent studies, tensor ring (TR) decomposition has been shown to be effective in data compression and representation. However, existing TR-based completion methods only exploit the global low-rank property of the visual data. When they are applied to remote sensing (RS) image processing, the spatial information in the RS image is ignored. In this paper, we introduce TR decomposition to RS image processing and propose a tensor completion method for RS image reconstruction. We incorporate total-variation regularization into the TR completion model to exploit the low-rank property and spatial continuity of the RS image simultaneously. The proposed model is solved by the augmented Lagrange multiplier method and shows superior performance against state-of-the-art algorithms in hyperspectral image reconstruction and multi-temporal RS image cloud removal.
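
To make the role of the regularizer concrete, the sketch below evaluates a TV-regularized completion objective: squared error on the observed entries plus a weight λ (`lam`) times the anisotropic total variation of each band. This is an assumption-level illustration: it uses the anisotropic (ℓ1) TV form, leaves out the tensor-ring factorization and the augmented Lagrange multiplier solver, and the function names are hypothetical.

```python
import numpy as np


def anisotropic_tv(band: np.ndarray) -> float:
    """Sum of absolute horizontal and vertical differences of a 2D band."""
    horizontal = np.abs(np.diff(band, axis=1)).sum()
    vertical = np.abs(np.diff(band, axis=0)).sum()
    return float(horizontal + vertical)


def completion_objective(x: np.ndarray, observed: np.ndarray,
                         mask: np.ndarray, lam: float) -> float:
    """||mask * (x - observed)||_F^2 + lam * sum_b TV(x[:, :, b]) for an H x W x B cube.

    mask is 1 where a pixel was observed (e.g. cloud-free) and 0 where it is missing.
    """
    fidelity = np.sum((mask * (x - observed)) ** 2)
    tv = sum(anisotropic_tv(x[:, :, b]) for b in range(x.shape[2]))
    return float(fidelity + lam * tv)
```

Among candidate fillings that agree on the observed pixels, the TV term assigns lower cost to piecewise-smooth ones, which is how spatial continuity enters alongside the low-rank model.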

D. Hong, N. Yokoya, J. Xu, and X. X. Zhu, “Joint & progressive learning from high-dimensional data for multi-label classification,” Proc. ECCV, Munich, Germany, September 8-14, 2018.
PDF Quick Abstract
Abstract: Despite the fact that nonlinear subspace learning techniques (e.g. manifold learning) have been successfully applied to data representation, there is still room for improvement in explainability (explicit mapping), generalization (out-of-samples), and cost-effectiveness (linearization). To this end, a novel linearized subspace learning technique is developed in a joint and progressive way, called joint and progressive learning strategy (J-Play), with its application to multi-label classification. The J-Play learns high-level and semantically meaningful feature representation from high-dimensional data by 1) jointly performing multiple subspace learning and classification to find a latent subspace where samples are expected to be better classified; 2) progressively learning multi-coupled projections to linearly approach the optimal mapping bridging the original space with the most discriminative subspace; 3) locally embedding manifold structure in each learnable latent subspace. Extensive experiments are performed to demonstrate the superiority and effectiveness of the proposed method in comparison with previous state-of-the-art methods.

J. Xia, N. Yokoya, and A. Iwasaki, “Boosting for domain adaptation extreme learning machines for hyperspectral image classification,” Proc. IGARSS, Valencia, Spain, July 22-27, 2018.
PDF Quick Abstract
Abstract: Domain adaptation and transfer learning adapt prior information from the source domain to train a classifier that predicts labels in the target domain. Parameter and instance transfer methods have shown excellent performance. The former adjusts the parameters of transitional classifiers, and the latter re-weights training samples from a different training set, similar to AdaBoost. To further improve performance, we propose combining the two techniques. More specifically, we select Transfer Boosting and the domain adaptation extreme learning machine (DAELM) as the instance and parameter transfer methods, respectively. We refer to the proposed method as boosting for DAELM (BDAELM). We compare the proposed method with DAELM and other methods on real cross-domain hyperspectral remote sensing images acquired over a Japanese mixed forest, showing improved classification accuracies.

V. Ferraris, N. Yokoya, N. Dobigeon, and M. Chabert, “A comparative study of fusion-based change detection methods for multi-band images with different spectral and spatial resolutions,” Proc. IGARSS, Valencia, Spain, July 22-27, 2018.
PDF Quick Abstract
Abstract: This paper deals with a fusion-based change detection (CD) framework for multi-band images with different spatial and spectral resolutions. The first step of the considered CD framework consists of fusing the two observed images. The resulting fused image is subsequently spatially or spectrally degraded to produce two pseudo-observed images with the same resolutions as the two observed images. Finally, CD can be performed through a pixel-wise comparison of the pseudo-observed and observed images since they share the same resolutions. Fusion is therefore a key step in this framework, and this paper proposes to quantitatively and qualitatively compare state-of-the-art fusion methods, gathered into four main families, namely component substitution, multi-resolution analysis, unmixing and Bayesian, with respect to the performance of the whole CD framework evaluated on simulated and real images.
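
The fuse, degrade, and compare steps of this framework can be sketched as follows. The fused image is taken as given, since the fusion method is exactly what the paper compares. Spatial degradation is modeled as block averaging and spectral degradation as a linear spectral response matrix, both simplifying assumptions; the threshold, matrix values, and function names are hypothetical.

```python
import numpy as np


def spatial_degrade(fused: np.ndarray, factor: int) -> np.ndarray:
    """Average non-overlapping factor x factor blocks of an H x W x B image.

    Assumes H and W are divisible by factor.
    """
    h, w, b = fused.shape
    blocks = fused.reshape(h // factor, factor, w // factor, factor, b)
    return blocks.mean(axis=(1, 3))


def spectral_degrade(fused: np.ndarray, srf: np.ndarray) -> np.ndarray:
    """Apply a B_hs x B_ms spectral response matrix to every pixel."""
    return fused @ srf


def change_map(observed: np.ndarray, pseudo: np.ndarray, threshold: float) -> np.ndarray:
    """Flag pixels whose spectral Euclidean distance exceeds the threshold."""
    distance = np.linalg.norm(observed - pseudo, axis=-1)
    return distance > threshold


if __name__ == "__main__":
    fused = np.ones((4, 4, 3))                # stand-in for the fusion output
    srf = np.full((3, 2), 1.0 / 3.0)          # hypothetical 3-band -> 2-band response
    observed_ms = np.ones((4, 4, 2))
    observed_ms[0, 0] += 1.0                  # simulate a change at one pixel
    print(change_map(observed_ms, spectral_degrade(fused, srf), threshold=0.5).astype(int))
```

Comparing the spatially degraded fused image with the low-spatial-resolution observation in the same way yields a second, coarser change map.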

D. Hong, N. Yokoya, J. Chanussot, and X. X. Zhu, “Learning a low-coherence dictionary to address spectral variability for hyperspectral unmixing,” Proc. ICIP, Beijing, China, September 17-20, 2017.
PDF Quick Abstract
Abstract: This paper presents a novel spectral mixture model to address spectral variability in inverse problems of hyperspectral unmixing. Based on the linear mixture model (LMM), our model introduces a spectral variability dictionary to account for any residuals that cannot be explained by the LMM. Atoms in the dictionary are assumed to be low-coherent with spectral signatures of endmembers. A dictionary learning technique is proposed to learn the spectral variability dictionary while solving unmixing problems simultaneously. Experimental results on synthetic and real datasets demonstrate that the performance of the proposed method is superior to state-of-the-art methods.

N. Yokoya, P. Ghamisi, and J. Xia, “Multimodal, multitemporal, and multisource global data fusion for local climate zones classification based on ensemble learning,” Proc. IGARSS, Texas, USA, July 23-28, 2017.
PDF Quick Abstract
Abstract: This paper presents a new methodology for classification of local climate zones based on ensemble learning techniques. Landsat-8 data and OpenStreetMap data are used to extract spectral-spatial features, including spectral reflectance, spectral indices, and morphological profiles, which are fed to subsequent classification methods as inputs. Canonical correlation forests and rotation forests are used for the classification step. The final classification map is generated by majority voting on different classification maps obtained by the two classifiers using multiple training subsets. The proposed method achieved an overall accuracy of 74.94% and a kappa coefficient of 0.71 in the 2017 IEEE GRSS Data Fusion Contest.
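
The final step, majority voting over classification maps from several classifier and training-subset runs, can be sketched per pixel as below. It is a generic illustration rather than the contest submission's code: label maps are plain nested lists of class IDs, and ties are broken in favor of the map listed first, a choice this sketch makes that the abstract does not state.

```python
from collections import Counter
from typing import List, Sequence

LabelMap = List[List[int]]  # rows of integer class IDs


def majority_vote(maps: Sequence[LabelMap]) -> LabelMap:
    """Fuse equally sized label maps by per-pixel majority voting."""
    height, width = len(maps[0]), len(maps[0][0])
    fused: LabelMap = []
    for i in range(height):
        row = []
        for j in range(width):
            votes = Counter(m[i][j] for m in maps)
            # most_common orders equal counts by first occurrence,
            # so ties go to the earliest map in `maps`.
            row.append(votes.most_common(1)[0][0])
        fused.append(row)
    return fused
```

For instance, three one-row maps `[[1, 2]]`, `[[1, 3]]`, and `[[4, 3]]` fuse to `[[1, 3]]`.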

J. Xia, N. Yokoya, and A. Iwasaki, “Ensemble of transfer component analysis for domain adaptation in hyperspectral remote sensing image classification,” Proc. IGARSS, Texas, USA, July 23-28, 2017.
PDF Quick Abstract
Abstract: In this work, we address the problem of unsupervised domain transfer learning via an ensemble strategy in the context of classification between multiple hyperspectral images. The objective of domain adaptation is to assign labels to an image of interest (the target image) using the labeled samples in the source image. The proposed method is based on the rotation-based ensemble and transfer component analysis (TCA). In this method, the feature space of both the source and target images is divided into several disjoint feature subsets. Then, the features induced by the TCA technique in the source domain are used as the input space of a random forest (RF) classifier. Finally, the results achieved at each step are fused by a majority vote. We compare the proposed method, ensemble of TCA (E-TCA), with a regular RF and an RF using the TCA-reduced features. Experiments on a real hyperspectral image acquired over a Japanese mixed forest show remarkable cross-image classification performance.

J. Xia, N. Yokoya, and A. Iwasaki, “Hyperspectral image classification with partial least square forest,” Proc. IGARSS, Texas, USA, July 23-28, 2017.
PDF Quick Abstract
Abstract: In the hyperspectral remote sensing community, decision forests combine the predictions of multiple decision trees (DTs) to achieve better prediction performance. Two well-known and powerful decision forests are Random Forest (RF) and Rotation Forest (RoF). In this work, a novel decision forest, called Partial Least Square Forest (PLSF), is proposed. In the PLSF, we adapt PLS to obtain the components for the hyperplane splitting. Moreover, the projection bootstrap technique is used to retain the full spectral bands for the selection of split in the projected space. Experimental results on three hyperspectral datasets indicated the effectiveness of the proposed PLSF because it enhances the diversity and accuracy within the ensemble when compared to RF and RoF.

J. Xia, N. Yokoya, and A. Iwasaki, “Tree species classification in Japanese mixed forest with hyperspectral and LiDAR data using rotation forest algorithm,” Proc. EARSeL IS, Zurich, Switzerland, April 19-21, 2017.
PDF Quick Abstract
Abstract:

J. Xia, N. Yokoya, and A. Iwasaki, “A novel ensemble classifier of hyperspectral and LiDAR data using morphological features,” Proc. ICASSP, New Orleans, US, March 5-9, 2017.
PDF Quick Abstract
Abstract:

J. Xia, N. Yokoya, and A. Iwasaki, “Mapping of large size hyperspectral imagery using fast machine learning algorithms,” Proc. ACRS, Colombo, Sri Lanka, October 17-21, 2016.
PDF Quick Abstract
Abstract:

N. Yokoya, X. X. Zhu, and A. Plaza, “Graph-regularized coupled spectral unmixing for multisensor time-series analysis,” Proc. WHISPERS, LA, US, August 21-24, 2016.
PDF Quick Abstract
Abstract: A new methodology that solves unmixing problems involving a set of multisensor time-series spectral images is proposed in order to understand dynamic changes of the surface at a subpixel scale. The proposed methodology couples multiple unmixing problems via regularization on graphs between the multisensor time-series data to obtain robust and stable unmixing solutions beyond data modalities owing to different sensor characteristics and the effects of non-optimal atmospheric correction. A synthetic dataset that includes seasonal and trend changes on the surface and the residuals of non-optimal atmospheric correction is used for numerical validation. Experimental results demonstrate the effectiveness of the proposed methodology.

N. Yokoya and P. Ghamisi, “Land-cover monitoring using time-series hyperspectral data via fractional-order Darwinian particle swarm optimization segmentation,” Proc. WHISPERS, LA, US, August 21-24, 2016.
PDF Quick Abstract
Abstract: This paper presents a new method for unsupervised detection of multiple changes using time-series hyperspectral data. The proposed method is based on fractional-order Darwinian particle swarm optimization (FODPSO) segmentation. The proposed method is applied to monitor land-cover changes following the Fukushima Daiichi nuclear disaster using multitemporal Hyperion images. Experimental results indicate that the integration of segmentation and a time-series of hyperspectral images has great potential for unsupervised detection of multiple changes.

J. C.-W. Chan and N. Yokoya, “Mapping land covers of Brussels capital region using spatially enhanced hyperspectral images,” Proc. WHISPERS, LA, US, August 21-24, 2016.
Quick Abstract
Abstract:

D. Hong, N. Yokoya, and X. X. Zhu, “The K-LLE algorithm for nonlinear dimensionality reduction of large-scale hyperspectral data,” Proc. WHISPERS, LA, US, August 21-24, 2016.
Quick Abstract
Abstract:

D. Hong, N. Yokoya, and X. X. Zhu, “Local manifold learning with robust neighbors selection for hyperspectral dimensionality reduction,” Proc. IGARSS, Beijing, China, July 10-15, 2016.
Quick Abstract
Abstract:

N. Yokoya and A. Iwasaki, “Generalized-Hough-transform object detection using class-specific sparse representation for local-feature detection,” Proc. IGARSS, Milan, Italy, July 26-31, 2015.
Quick Abstract
Abstract:

D. Niina, N. Yokoya, and A. Iwasaki, “Detector anomaly detection and stripe correction of hyperspectral data,” Proc. IGARSS, Milan, Italy, July 26-31, 2015.
Quick Abstract
Abstract:

L. Loncan, L. B. Almeida, J. Bioucas Dias, X. Briottet, J. Chanussot, N. Dobigeon, S. Fabre, W. Liao, G. A. Licciardi, M. Simoes, J. Y. Tourneret, M. A. Veganzones, G. Vivone, Q. Wei, and N. Yokoya, “Comparison of nine hyperspectral pansharpening methods,” Proc. IGARSS, Milan, Italy, July 26-31, 2015.
Quick Abstract
Abstract:

N. Yokoya and X. X. Zhu, “Graph regularized coupled spectral unmixing for change detection,” Proc. WHISPERS, Tokyo, Japan, June 2-5, 2015.
PDF Quick Abstract
Abstract: This paper presents a methodology of coupled spectral unmixing for multitemporal hyperspectral data analysis. Coupled spectral unmixing simultaneously extracts the sets of spectral signatures of endmembers and respective abundance maps from multiple spectral images with differences in observation conditions and sensor characteristics. The problem is formulated in the framework of coupled nonnegative matrix factorization. A graph regularization that reflects spectral correlation between two images on abundance fractions is introduced into the optimization of coupled spectral unmixing to consider temporal changes of the earth’s surface. An alternating optimization algorithm is investigated using the method of Lagrange multipliers to guarantee a stable convergence. The proposed method was applied to dual-temporal Hyperion images taken over the Fukushima Daiichi nuclear power plant. Experimental results showed that the proposed method can extract essential information on the earth’s surface in a data-driven manner beyond multitemporal data modality.

T. Takayama, N. Yokoya, and A. Iwasaki, “Optimal hyperspectral classification for paddy field with semisupervised self-learning,” Proc. WHISPERS, Tokyo, Japan, June 2-5, 2015.
Abstract: Monitoring and management of paddy fields are key elements not only for stable production but also for ensuring national food security. Classification of growth stages with remote sensing data is expected to be a highly effective solution because it can cover a large area in a single observation. In general, pixel-based classification is one of the most attractive choices. However, acquiring a sufficient number of field survey plots for classification is not easy in terms of time and cost, and this shortage can negatively affect the accuracy of the classification map. In this paper, we propose a semisupervised classification method that considers the characteristics of paddy fields in order to provide an optimal classification map with hyperspectral data.
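The general self-learning idea, in which a classifier trained on a few labeled pixels repeatedly adds its most confident predictions on unlabeled pixels to its training set, can be sketched with scikit-learn's SelfTrainingClassifier. This is a generic sketch on synthetic spectra, not the paper's method: it does not model paddy-field characteristics, and the class means, noise level, number of labeled pixels, and 0.8 confidence threshold are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Synthetic pixels: three hypothetical growth-stage classes with distinct mean spectra.
rng = np.random.default_rng(0)
n_bands, n_per_class = 30, 200
centers = rng.random((3, n_bands))
X = np.vstack([c + rng.normal(0.0, 0.05, (n_per_class, n_bands)) for c in centers])
y_true = np.repeat(np.arange(3), n_per_class)

# Only five "field survey" pixels per class are labeled; -1 marks unlabeled pixels.
y_train = np.full_like(y_true, -1)
labeled = np.concatenate([np.flatnonzero(y_true == k)[:5] for k in range(3)])
y_train[labeled] = y_true[labeled]

# Self-training: pseudo-labels with predicted probability >= threshold are added each round.
model = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.8)
model.fit(X, y_train)
print("accuracy on all pixels:", model.score(X, y_true))
```

The base classifier here is a logistic regression only because it provides calibrated-ish probabilities for the confidence threshold; any estimator with predict_proba could be substituted.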

N. Yokoya, M. Kokawa, and J. Sugiyama, “Spectral unmixing of fluorescence fingerprint imagery for visualization of constituents in pie pastry,” Proc. ICIP, Paris, France, October 27-30, 2014.
Abstract: In this work, we present a new method that combines fluorescence fingerprint (FF) imaging and spectral unmixing to visualize microstructures in food. The method is applied to visualization of three constituents, gluten, starch, and butter, in two types of pie pastry. It is challenging to discriminate between starch and butter because both of them can be represented by similar FFs of low intensities. Two optimization approaches of FF unmixing that consider qualitative knowledge are presented and validated by comparison to the conventional staining method. Although starch and butter were represented by very similar FFs, a constrained-least-squares method with abundance quantization successfully visualized the distributions of constituents in pie pastry.
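The constrained-least-squares part of this approach can be illustrated with per-pixel nonnegative least squares (NNLS) under a linear mixing assumption, where each pixel's flattened fingerprint is modeled as a nonnegative combination of reference fingerprints of the constituents. This is a minimal sketch with hypothetical random signatures; the abundance quantization and the qualitative constraints used in the paper are not reproduced.

```python
import numpy as np
from scipy.optimize import nnls


def unmix_nnls(pixels, endmembers):
    """Per-pixel nonnegative least-squares unmixing.

    pixels:     (n_pixels, n_features) flattened fingerprints
    endmembers: (n_endmembers, n_features) reference fingerprints
    returns:    (n_pixels, n_endmembers) nonnegative abundance estimates
    """
    E = endmembers.T  # (n_features, n_endmembers): one column per constituent
    return np.array([nnls(E, p)[0] for p in pixels])


# Hypothetical data: three constituents (e.g. gluten, starch, butter) with random signatures.
rng = np.random.default_rng(0)
endmembers = rng.random((3, 50))
true_abundances = rng.dirichlet(np.ones(3), size=10)  # rows are nonnegative and sum to 1
pixels = true_abundances @ endmembers                 # noiseless linear mixtures
estimated = unmix_nnls(pixels, endmembers)
print(np.round(estimated[:2], 3))
```

With real fingerprints that are very similar (the starch/butter case), the columns of E become nearly collinear and NNLS alone becomes unstable, which is where additional prior knowledge such as the quantization mentioned above would come in.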

C. F. Liew, N. Yokoya, and T. Yairi, “Facial alignment by using sparse initialization and random forest,” Proc. ICIP, Paris, France, October 27-30, 2014.

N. Yokoya and A. Iwasaki, “Object localization based on sparse representation for remote sensing imagery,” Proc. IGARSS, Québec, Canada, July 13-18, 2014.
Abstract: In this paper, we propose a new object localization method named sparse representation based object localization (SROL), which builds on the generalized-Hough-transform approach and uses sparse representations for part detection. The proposed method was applied to car and ship detection in remote sensing images, and its performance was compared with those of state-of-the-art methods. Experimental results showed that the SROL algorithm can accurately localize categorical objects or a specific object using a small amount of training data.

N. Yokoya and A. Iwasaki, “Airborne unmixing-based hyperspectral super-resolution using RGB imagery,” Proc. IGARSS, Québec, Canada, July 13-18, 2014.
Abstract: This paper presents an airborne experiment on unmixing-based hyperspectral super-resolution using RGB imagery. Preprocessing is described to ensure spatial and spectral consistency between hyperspectral and RGB images. An extended version of coupled nonnegative matrix factorization (CNMF) is introduced for multisensor hyperspectral super-resolution to deal with a challenging problem setting, i.e., only three spectral channels carrying the higher spatial information and a 10-fold difference in ground sampling distance. The proposed method successfully estimated the high-spatial-resolution red-edge image. Numerical evaluation comparing the high-spatial-resolution hyperspectral image with ground-measured spectra demonstrated that the proposed method recovers pure-pixel spectra.

N. Yokoya and A. Iwasaki, “Effect of unmixing-based hyperspectral super-resolution on target detection,” Proc. WHISPERS, Lausanne, Switzerland, June 24-27, 2014.
Abstract: We present an airborne experiment on unmixing-based hyperspectral super-resolution using RGB imagery to examine the restoration of pure spectra by comparison with ground-measured spectra, and we demonstrate its impact on target detection. An extended version of coupled nonnegative matrix factorization (CNMF) is used for hyperspectral super-resolution to deal with a challenging problem setting. Our experiment showed that the extended CNMF can restore pure spectra, which contributes to accurate target detection.

T. Matsuki, N. Yokoya, and A. Iwasaki, “Hyperspectral tree species classification with an aid of LiDAR data,” Proc. WHISPERS, Lausanne, Switzerland, June 24-27, 2014.
Abstract: Classification of tree species is one of the most important applications in remote sensing. A methodology to classify tree species using hyperspectral and LiDAR data is proposed. The data processing consists of shadow correction, individual tree crown delineation, classification by support vector machine (SVM), and postprocessing by a smoothing filter. The authors applied this procedure to data taken over Tama Forest Science Garden in Tokyo, Japan, and classified the data into 16 tree species classes. As a result, the authors achieved a classification accuracy of 79 % with 10 % training data, which is 17 % higher than that obtained using hyperspectral data only. Shadow correction and morphological processing derived from LiDAR data increase the accuracy by 3 % and 14 %, respectively.

T. Matsuki, N. Yokoya, and A. Iwasaki, “Fusion of hyperspectral and LiDAR data for tree species classification,” Proc. 34th ACRS, Bali, Indonesia, Oct. 20-24, 2013.
Abstract: Tree species classification is one of the most important applications in remote sensing. In this study, the authors propose a methodology to classify tree species using hyperspectral and LiDAR data. The method consists of shadow correction, individual tree crown delineation, and classification by support vector machine (SVM). Shadows in hyperspectral data are corrected by unmixing. Individual tree crown delineation is achieved by a local-maxima and region-growing method applied to a LiDAR-derived canopy height model (CHM). The input variables of the SVM classifiers are principal components of the hyperspectral data and the canopy form (height and size). The authors applied this method to the hyperspectral and LiDAR dataset taken over Tama Forest Science Garden in Tokyo and classified the data into 19 classes. As a result, we achieved a classification accuracy of 68 %, which is 20 % higher than that obtained using hyperspectral data only.
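The classification stage described in this abstract, with principal components of the crown spectra plus canopy height and size as SVM inputs, can be sketched with scikit-learn. This is a minimal sketch on synthetic per-crown features: shadow correction and crown delineation are assumed to have been done already, and the band count, number of principal components, SVM settings, and class statistics are hypothetical.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

N_BANDS = 60  # hypothetical number of spectral bands per crown


def make_classifier(n_bands=N_BANDS, n_components=10):
    # Spectral columns -> PCA; canopy-form columns (height, size) are passed through.
    features = ColumnTransformer([
        ("spectral_pca", PCA(n_components=n_components), list(range(n_bands))),
        ("canopy_form", "passthrough", [n_bands, n_bands + 1]),
    ])
    # PCA is fitted inside the pipeline, so it only sees the training crowns.
    return make_pipeline(features, StandardScaler(), SVC(kernel="rbf", C=10.0))


# Synthetic crowns: three hypothetical species differing in mean spectrum and height.
rng = np.random.default_rng(0)
n_per_class, n_classes = 40, 3
X_parts, y_parts = [], []
for k in range(n_classes):
    spectra = rng.normal(0.2 + 0.1 * k, 0.02, (n_per_class, N_BANDS))
    height = rng.normal(10.0 + 5.0 * k, 1.0, (n_per_class, 1))
    size = rng.normal(3.0, 0.5, (n_per_class, 1))
    X_parts.append(np.hstack([spectra, height, size]))
    y_parts.append(np.full(n_per_class, k))
X, y = np.vstack(X_parts), np.concatenate(y_parts)

# Every other crown for training, the rest held out.
clf = make_classifier().fit(X[::2], y[::2])
print("held-out accuracy:", clf.score(X[1::2], y[1::2]))
```

Putting PCA inside the pipeline rather than transforming the whole dataset first avoids leaking held-out crowns into the feature extraction.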

N. Yokoya and A. Iwasaki, “Hyperspectral and multispectral data fusion mission on hyperspectral imager suite (HISUI),” Proc. IGARSS, Melbourne, Australia, Jul. 21-26, 2013.
Abstract: Hyperspectral imager suite (HISUI) is the Japanese next-generation earth-observing sensor composed of hyperspectral and multispectral imagers. Unmixing-based fusion of hyperspectral and multispectral data enables the production of high-spatial-resolution hyperspectral data. An imaging system simulating HISUI by combining two imagers was developed for verification experiments to investigate the feasibility and clarify the whole procedure of the hyperspectral and multispectral data fusion mission on HISUI. Airborne experiments are planned as simulation tests of HISUI higher-order products. The experimental results of the ground-based observation showed the importance of preprocessing and cross-calibration for the final quality of the fused data, which contributes to the practical use of hyperspectral and multispectral data fusion.

N. Yokoya and A. Iwasaki, “Design of combined optical imagers using unmixing-based hyperspectral data fusion,” Proc. WHISPERS, Florida, USA, Jun. 25-28, 2013.
Abstract: Unmixing-based hyperspectral and multispectral data fusion enables the production of high-spatial-resolution and hyperspectral imagery with small spectral errors. In this work, we present sensor design of combined optical imagers using unmixing-based data fusion, which aims to fuse hyperspectral and multispectral sensors and improve the performance of the final fused data. Owing to the degeneracy of the data cloud and additive noise, there is an optimal range in the relationship of spatial resolutions between two imagers.

N. Yokoya and A. Iwasaki, “Optimal design of hyperspectral imager suite (HISUI) for hyperspectral and multispectral data fusion,” Proc. ISRS, Tokyo, Japan, May 15-17, 2013.
Abstract: The spatial resolution of hyperspectral sensors is lower than that of multispectral and panchromatic imagers in order to maintain better signal-to-noise ratios (SNRs). Hyperspectral imager suite (HISUI) is a Japanese next-generation earth-observing imager consisting of hyperspectral and multispectral cameras that have 30 and 5 m ground sampling distances (GSDs), respectively. Unmixing-based hyperspectral and multispectral data fusion enables the production of high-spatial-resolution hyperspectral imagery with small spectral errors by alternately unmixing the two datasets to obtain endmember spectra and high-spatial-resolution abundance maps. In this work, we present a sensor design of combined optical imagers using unmixing-based data fusion, which aims to fuse hyperspectral and multispectral sensors and improve the performance of the final fused data. Assuming HISUI/VNIR specifications, such as spectral bandwidth and SNR, we investigate the optimal relationship of spatial resolutions from the perspective of data fusion. The final performance of the fused data is determined by the accuracies of the two unmixing steps. Owing to the degeneracy of the data cloud, there is an optimal range in the relationship of spatial resolutions between the two imagers. When the spatial resolution of the multispectral imager is fixed at 5 m, 20-30 m is the optimal range for the spatial resolution of the hyperspectral imager, which includes the actual design point of HISUI. Hyperspectral and multispectral data fusion can usher in a new concept for satellite sensor design that aims to obtain high-spatial-resolution and high-spectral-resolution data by observing spectral information with a hyperspectral camera and spatial information with a multispectral camera. HISUI is a promising sensor that enables hyperspectral and multispectral data fusion and the production of high-spatial-resolution hyperspectral data, which will bring a major breakthrough in hyperspectral remote sensing applications.

N. Yokoya, J. Chanussot, and A. Iwasaki, “Generalized bilinear model based nonlinear unmixing using semi-nonnegative matrix factorization,” Proc. IGARSS, Munich, Germany, Jul. 22-27, 2012.
Abstract: Nonlinear spectral mixing models have recently been receiving attention in hyperspectral image processing. This paper presents a novel optimization method for nonlinear unmixing based on a generalized bilinear model (GBM), which considers second-order scattering effects. Semi-nonnegative matrix factorization is used for the optimization so that a whole image can be processed in matrix form. The proposed method is applied to an airborne hyperspectral image with many endmembers and shows good performance in both unmixing quality and computational cost with a simple implementation. The effect of endmember extraction on nonlinear unmixing is investigated, and the impacts of the nonlinearity on abundance maps are demonstrated.
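For reference, the GBM is commonly written in the unmixing literature as a linear mixture of R endmember spectra e_i with abundances a_i, plus pairwise interaction terms weighted by coefficients gamma_ij. The notation below is the standard formulation and is assumed here rather than copied from this paper:

```latex
\mathbf{y} = \sum_{i=1}^{R} a_i \mathbf{e}_i
  + \sum_{i=1}^{R-1} \sum_{j=i+1}^{R} \gamma_{ij}\, a_i a_j\, \mathbf{e}_i \odot \mathbf{e}_j
  + \mathbf{n},
\qquad a_i \ge 0, \quad \sum_{i=1}^{R} a_i = 1, \quad 0 \le \gamma_{ij} \le 1
```

Here the symbol \odot denotes the element-wise product and \mathbf{n} is additive noise. Setting every gamma_ij to 0 recovers the linear mixing model, so the bilinear terms capture only the second-order scattering between pairs of distinct materials.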

A. Iwasaki, N. Yokoya, T. Arai, Y. Itoh, and N. Miyamura, “Similarity measure for spatial-spectral registration in hyperspectral era,” Proc. IGARSS, Munich, Germany, Jul. 22-27, 2012.
Abstract: In the hyperspectral era, data registration in the spectral domain, in addition to the spatial domain, has become an important issue. Detection of smile and keystone phenomena, which are caused by aberrations in the spectrometer, is closely related to registration, which is crucial for data fusion research. Hyperspectral Imager Suite (HISUI) is a next-generation Japanese optical sensor composed of a hyperspectral imager and a multispectral imager, which will be launched on Advanced Land Observing Satellite 3 (ALOS-3). Three similarity measures for spatial-spectral registration of hyperspectral data, normalized cross-correlation (NCC), phase correlation (PC), and mutual information (MI), are discussed for Level-1 data processing of HISUI.
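Two of the similarity measures named in this abstract, NCC and MI, can be written compactly in NumPy. This is a minimal sketch assuming two equally sized patches (for instance a band image and a misregistered copy) and a histogram-based MI estimate with a hypothetical default of 32 bins; phase correlation is omitted here.

```python
import numpy as np


def ncc(a, b):
    """Normalized cross-correlation of two equally sized arrays, in [-1, 1]."""
    a = np.ravel(a).astype(float)  # astype copies, so the caller's arrays are untouched
    b = np.ravel(b).astype(float)
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def mutual_information(a, b, bins=32):
    """Mutual information (in nats) estimated from a joint intensity histogram."""
    joint, _, _ = np.histogram2d(np.ravel(a), np.ravel(b), bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)  # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)  # marginal of b, shape (1, bins)
    nz = p_ab > 0                          # empty cells contribute 0 * log 0 = 0
    return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a * p_b)[nz])))


# Hypothetical example: a random band and a copy shifted by 3 pixels.
rng = np.random.default_rng(0)
band = rng.random((64, 64))
shifted = np.roll(band, 3, axis=1)  # simulated misregistration
print(ncc(band, band), ncc(band, shifted))
print(mutual_information(band, band), mutual_information(band, shifted))
```

NCC is invariant to linear gain and offset between the two inputs, whereas MI only assumes a statistical dependence between intensities, which makes it the more flexible (and more expensive) choice when the two bands respond very differently.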

N. Yokoya, J. Chanussot, and A. Iwasaki, “Hyperspectral and multispectral data fusion based on nonlinear unmixing,” Proc. WHISPERS, Shanghai, China, Jun. 4-7, 2012.
Abstract: Data fusion of low-spatial-resolution hyperspectral (HS) and high-spatial-resolution multispectral (MS) images based on a linear mixing model (LMM) enables the production of high-spatial-resolution HS data with small spectral distortion. This paper extends LMM-based HS-MS data fusion to a nonlinear mixing model using a bilinear mixing model (BMM), which considers second-order scattering of photons between two distinct materials. A generalized bilinear model (GBM) can deal with the underlying assumptions of the BMM. The GBM is applied to HS-MS data fusion to produce high-quality fused data that account for multiple-scattering effects. Semi-nonnegative matrix factorization (Semi-NMF), which can easily be incorporated into the existing LMM-based fusion method, is introduced as a new optimization method for GBM unmixing. Compared with LMM-based HS-MS data fusion, the proposed method showed better results on synthetic datasets.

N. Yokoya, T. Yairi, and A. Iwasaki, “Coupled non-negative matrix factorization for hyperspectral and multispectral data fusion: application for pasture classification,” Proc. IGARSS, Vancouver, Canada, Jul. 24-29, 2011.
Abstract: Coupled non-negative matrix factorization (CNMF) is introduced for hyperspectral and multispectral data fusion. Owing to its unmixing-based algorithm, CNMF produces fused data with little spectral distortion while enhancing the spatial resolution of all hyperspectral band images. CNMF is applied to a synthetic dataset generated from real airborne hyperspectral data taken over a pasture area. The spectral quality of the fused data is evaluated by the classification accuracy of pasture types. The experimental results show that CNMF enables accurate identification and classification of observed materials at fine spatial resolution.
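The coupled structure can be illustrated with a simplified alternating scheme: an endmember matrix E is fitted to the hyperspectral image and a high-resolution abundance matrix A to the multispectral image, linked through a spectral response matrix R and a spatial degradation matrix S, using Lee-Seung multiplicative updates. This is a minimal sketch, not the published CNMF algorithm: it uses random initialization instead of endmember extraction, keeps R @ E and A @ S fixed within each half-step, assumes R and S are known, and all names, sizes, and synthetic data are hypothetical.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero in the multiplicative updates


def update_right(V, W, H, iters):
    """Lee-Seung multiplicative updates of H in V ~ W @ H (W fixed)."""
    for _ in range(iters):
        H = H * (W.T @ V) / (W.T @ W @ H + EPS)
    return H


def update_left(V, W, H, iters):
    """Lee-Seung multiplicative updates of W in V ~ W @ H (H fixed)."""
    for _ in range(iters):
        W = W * (V @ H.T) / (W @ H @ H.T + EPS)
    return W


def cnmf_sketch(Y_h, Y_m, R, S, n_end, outer=10, inner=100, seed=0):
    """Simplified coupled NMF fusion.

    Y_h: (L, N_h) low-resolution hyperspectral pixels
    Y_m: (M, N_m) high-resolution multispectral pixels
    R:   (M, L) spectral response; S: (N_m, N_h) spatial degradation
    Returns the (L, N_m) fused image E @ A.
    """
    rng = np.random.default_rng(seed)
    E = rng.random((Y_h.shape[0], n_end)) + EPS
    A_h = rng.random((n_end, Y_h.shape[1])) + EPS
    A = rng.random((n_end, Y_m.shape[1])) + EPS
    # Initial unmixing of the hyperspectral image alone.
    for _ in range(inner):
        A_h = update_right(Y_h, E, A_h, 1)
        E = update_left(Y_h, E, A_h, 1)
    for _ in range(outer):
        # Multispectral step: endmembers fixed to R @ E, refine high-resolution abundances.
        A = update_right(Y_m, R @ E, A, inner)
        # Hyperspectral step: abundances fixed to A @ S, refine endmember spectra.
        E = update_left(Y_h, E, A @ S, inner)
    return E @ A


def block_average(h, w, f):
    """(h*w, (h//f)*(w//f)) matrix averaging f x f pixel blocks (row-major order)."""
    S = np.zeros((h * w, (h // f) * (w // f)))
    for r in range(h):
        for c in range(w):
            S[r * w + c, (r // f) * (w // f) + c // f] = 1.0 / f**2
    return S


# Hypothetical synthetic scene: 30 bands, 4 MS bands, 3 endmembers, 16x16 -> 4x4 pixels.
rng = np.random.default_rng(1)
L, M, n_end = 30, 4, 3
E_true = rng.random((L, n_end))
A_true = rng.dirichlet(np.ones(n_end), size=16 * 16).T  # (3, 256) high-res abundances
S = block_average(16, 16, 4)
R = np.zeros((M, L))
for k, idx in enumerate(np.array_split(np.arange(L), M)):  # box-shaped spectral responses
    R[k, idx] = 1.0 / len(idx)
Z_true = E_true @ A_true
Y_h, Y_m = Z_true @ S, R @ Z_true
Z = cnmf_sketch(Y_h, Y_m, R, S, n_end)
print("relative error:", np.linalg.norm(Z - Z_true) / np.linalg.norm(Z_true))
```

Because the multiplicative updates only multiply by nonnegative ratios, E and A stay nonnegative throughout, which is what keeps the fused spectra physically interpretable as mixtures of endmembers.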

N. Yokoya, T. Yairi, and A. Iwasaki, “Hyperspectral, multispectral, and panchromatic data fusion based on non-negative matrix factorization,” Proc. WHISPERS, Lisbon, Portugal, Jun. 6-9, 2011.
Abstract: Coupled non-negative matrix factorization (CNMF) is applied to hyperspectral, multispectral, and panchromatic data fusion. This unmixing-based method extracts and fuses hyperspectral endmember spectra and high-spatial-resolution abundance maps using these three data sources. An experiment with synthetic data simulating an ALOS-3 (Advanced Land Observing Satellite 3) dataset shows that the CNMF method can produce fused data with both high spatial and high spectral resolution and smaller spectral distortion.

N. Yokoya and A. Iwasaki, “A maximum noise fraction transform based on a sensor noise model for hyperspectral data,” Proc. 31st ACRS, Hanoi, Vietnam, Nov. 1-5, 2010.
Abstract: The maximum noise fraction (MNF) transform, which orders components by signal-to-noise ratio (SNR), has been commonly used for spectral feature extraction from hyperspectral remote sensing data before image classification. When hyperspectral data contain a spectral distortion, known as the “smile” property, the first MNF component, which should have high image quality, suffers from a noisy brightness-gradient pattern, which reduces classification accuracy. This is probably because the classic noise estimation of the MNF differs from the real noise model. Noise estimation is the most important step because the noise covariance matrix determines the characteristics of the MNF transform. An improved method of estimating noise from a single image, based on a noise model of a charge-coupled device (CCD) sensor, is introduced to enhance the feature extraction performance of the MNF. This method is applied to both airborne and spaceborne hyperspectral data, acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) and EO-1/Hyperion, respectively. The experiment on the Hyperion data demonstrates that the proposed MNF is resistant to the spectral distortion of hyperspectral data. Furthermore, the image classification experiment on the AVIRIS Indian Pines data, using the MNF as a preprocessing step to extract spectral features, shows that the proposed method extracts higher-SNR components in lower-order MNF components than the existing feature extraction methods.
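For context, the classic MNF mentioned in this abstract can be sketched as a generalized eigenproblem between the data covariance and a noise covariance estimated from shift differences of neighboring pixels. This sketch shows only that conventional shift-difference estimate; the paper's CCD-noise-model estimator is not reproduced, and the synthetic cube is hypothetical.

```python
import numpy as np


def mnf(cube):
    """Classic MNF with a shift-difference noise estimate.

    cube: (rows, cols, bands) array.
    Returns (components, snr); components[..., 0] has the highest estimated SNR.
    """
    cube = np.asarray(cube, dtype=float)
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands)
    # Noise estimate: differences of horizontally adjacent pixels; dividing by sqrt(2)
    # gives variance s^2 for white noise of variance s^2 (a difference has 2 s^2).
    diff = (cube[:, 1:, :] - cube[:, :-1, :]).reshape(-1, bands) / np.sqrt(2.0)
    cov_noise = np.cov(diff, rowvar=False)
    cov_data = np.cov(pixels, rowvar=False)
    # Noise whitening: with cov_noise = L L^T, W = L^{-1} gives W cov_noise W^T = I.
    W = np.linalg.inv(np.linalg.cholesky(cov_noise))
    evals, evecs = np.linalg.eigh(W @ cov_data @ W.T)
    order = np.argsort(evals)[::-1]             # descending SNR
    transform = W.T @ evecs[:, order]           # solves cov_data v = lam * cov_noise v
    comps = (pixels - pixels.mean(axis=0)) @ transform
    return comps.reshape(rows, cols, bands), evals[order] - 1.0  # lam = SNR + 1


# Hypothetical cube: one spectrum modulated along rows only, plus white noise, so the
# horizontal differences contain noise only and the toy noise estimate is unbiased.
rng = np.random.default_rng(0)
rows, cols, bands = 32, 32, 20
spectrum = rng.uniform(0.5, 1.5, bands)
pattern = np.sin(np.arange(rows) / 4.0)[:, None] * np.ones((1, cols))
cube = pattern[:, :, None] * spectrum + rng.normal(0.0, 0.05, (rows, cols, bands))
comps, snr = mnf(cube)
print(np.round(snr[:3], 1))
```

If real signal structure leaks into the shift differences (as the abstract describes for smile-distorted data), cov_noise is misestimated and the component ordering degrades, which is why the choice of noise estimator matters.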

N. Yokoya, N. Miyamura, and A. Iwasaki, “Preprocessing of hyperspectral imagery with consideration of smile and keystone properties,” Proc. SPIE 7857-10, Incheon, Korea, Oct. 11-15, 2010.
Abstract: Satellite hyperspectral imaging sensors suffer from “smile” and “keystone” properties, which appear as distortions of the spectrum image. The smile property is the center-wavelength shift, and the keystone property is the band-to-band misregistration. These distortions degrade the spectral information and reduce classification accuracy. Furthermore, these properties may change after launch. Therefore, in the preprocessing of satellite hyperspectral images, onboard correction of the smile and keystone properties using only the observed images is an important issue, as are radiometric and geometric corrections. The main objective of this work is to build a prototype of hyperspectral image preprocessing that takes the smile and keystone properties into account. Image registration based on phase correlation is proposed to detect these properties: the smile property is detected by estimating the distortion of an atmospheric absorption line, and the keystone property is detected by estimating the band-to-band misregistration. Cubic spline interpolation is adopted to correct the spectrum because of its good trade-off between smoothness and shape preservation. The smile and keystone correction is built into the preprocessing together with the radiometric and geometric corrections. Hyperion visible near-infrared (VNIR) data are used as simulation data. Analysis with the maximum noise fraction (MNF) transformation confirms that the smile and keystone distortions are corrected. The precise detection and correction of the smile and keystone properties make it possible to maximize the spectral performance of hyperspectral imagery. The proposed method is a prototype for the preprocessing of future satellite hyperspectral sensors.
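The detection and correction steps can be illustrated in one dimension: phase correlation estimates the shift between two profiles, and cubic spline interpolation resamples a spectrum onto corrected wavelengths. This is a minimal sketch assuming a circular shift model with parabolic sub-sample peak refinement, a random hypothetical detector profile, and a hypothetical Gaussian absorption line; it is not the paper's exact procedure.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def phase_correlation_shift(ref, moved):
    """Estimate d such that moved[n] ~ ref[n - d] for 1-D signals (circular model)."""
    cross = np.fft.fft(moved) * np.conj(np.fft.fft(ref))
    cross /= np.abs(cross) + 1e-12        # keep only the phase difference
    corr = np.real(np.fft.ifft(cross))    # peaks at n = d (mod length)
    n = len(corr)
    k = int(np.argmax(corr))
    # Parabolic fit through the peak and its neighbours for a sub-sample estimate.
    y0, y1, y2 = corr[k - 1], corr[k], corr[(k + 1) % n]
    denom = y0 - 2.0 * y1 + y2
    shift = k + (0.5 * (y0 - y2) / denom if denom != 0 else 0.0)
    return shift - n if shift > n / 2 else shift  # map to a signed shift


def correct_spectrum(wavelengths, spectrum, wavelength_shift):
    """Resample a spectrum whose bands actually sit at wavelengths + shift onto the nominal grid."""
    return CubicSpline(wavelengths + wavelength_shift, spectrum)(wavelengths)


def line(x):
    """Hypothetical absorption line centred at 760 nm."""
    return 1.0 - 0.5 * np.exp(-0.5 * ((x - 760.0) / 10.0) ** 2)


rng = np.random.default_rng(0)
profile = rng.random(128)                      # hypothetical detector-column profile
print(phase_correlation_shift(profile, np.roll(profile, 5)))

wl = np.arange(400.0, 1001.0, 5.0)             # hypothetical nominal band centres (nm)
measured = line(wl + 2.0)                      # every band shifted by a 2 nm smile
corrected = correct_spectrum(wl, measured, 2.0)
print(np.abs(measured - line(wl)).max(), np.abs(corrected - line(wl)).max())
```

Normalizing the cross-power spectrum to unit magnitude makes the correlation peak sharp regardless of signal contrast, but bins with near-zero energy contribute only noise, so very smooth profiles benefit from windowing or band-limiting before this step.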

N. Yokoya, N. Miyamura, and A. Iwasaki, “Detection and correction of spectral and spatial misregistration for hyperspectral data,” Proc. IGARSS, Honolulu, HI, Jul. 25-30, 2010.
Abstract: Hyperspectral imaging sensors suffer from spectral and spatial misregistrations. These artifacts prevent the accurate acquisition of the spectra and thus reduce classification accuracy. The main objective of this work is to detect and correct spectral and spatial misregistrations of hyperspectral images. The Hyperion visible near-infrared (VNIR) subsystem is used as an example. An image registration method using normalized cross-correlation for characteristic lines in the spectrum image demonstrates its effectiveness for detecting the spectral and spatial misregistrations. Cubic spline interpolation using the estimated properties makes it possible to correct the spectral signatures. The accuracy of the proposed postlaunch estimation of the Hyperion properties has been proven to be comparable to that of the prelaunch measurements, which enables precise onboard calibration of hyperspectral sensors.

A. Iwasaki, M. Koga, H. Kanno, N. Yokoya, T. Okuda, and K. Saito, “Challenge of ASTER digital elevation model,” Proc. IGARSS, Honolulu, HI, Jul. 25-30, 2010.
Abstract: The accuracy of the digital elevation model (DEM) obtained by the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER), which has along-track stereo vision, is investigated. The pointing offset and stability of the radiometer are one cause of the geometric deviation of the ASTER DEM attached to the orthorectified image. A correction methodology to be implemented in the data processing is suggested. Fine-tuning of the image matching procedure leads to better reproduction of the topography. A comparison with a reference DEM is described.

Technical report

N. Yokoya and A. Iwasaki, “Airborne hyperspectral data over Chikusei,” Space Appl. Lab., Univ. Tokyo, Japan, Tech. Rep. SAL-2016-05-27, May 2016.
Abstract: Airborne hyperspectral datasets were acquired by Hyperspec-VNIR-C (Headwall Photonics Inc.) over agricultural and urban areas in Chikusei, Ibaraki, Japan, on July 29, 2014, as one of the flight campaigns supported by KAKENHI 24360347. This technical report summarizes the experiment. The hyperspectral data and ground truth were made available to the scientific community.
Downloads: ENVI (1.0 GB), MATLAB (1.4 GB), Readme
The airborne hyperspectral dataset was taken by the Headwall Hyperspec-VNIR-C imaging sensor over agricultural and urban areas in Chikusei, Ibaraki, Japan, on July 29, 2014, between 9:56 and 10:53 (UTC+9). The center of the scene is located at 36.294946°N, 140.008380°E. The hyperspectral dataset has 128 bands in the spectral range from 363 nm to 1018 nm. The scene consists of 2517x2335 pixels, and the ground sampling distance is 2.5 m. Ground truth of 19 classes was collected via a field survey and visual inspection using high-resolution color images obtained by a Canon EOS 5D Mark II together with the hyperspectral data. The hyperspectral data and ground truth were made available to the scientific community in the ENVI and MATLAB formats at http://park.itc.u-tokyo.ac.jp/sal/hyperdata. More details of the experiment are presented in the technical report cited below.
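As a quick sanity check, the scene footprint, the average band spacing, and the raw cube size follow directly from the stated dimensions: roughly 6.3 km by 5.8 km (about 36.7 km²) and a spacing of about 5.16 nm. This sketch assumes that 2517 and 2335 are the row and column counts, that the 128 band centers are evenly spaced with the first and last at 363 nm and 1018 nm, and that samples take 2 bytes; none of these is specified above, and the sketch does not read the distributed files.

```python
# Stated dataset dimensions
rows, cols, bands = 2517, 2335, 128     # assumed order: rows x columns
gsd_m = 2.5                             # ground sampling distance (m)
lambda_min_nm, lambda_max_nm = 363.0, 1018.0

n_pixels = rows * cols
footprint_km = (rows * gsd_m / 1000, cols * gsd_m / 1000)
area_km2 = n_pixels * gsd_m**2 / 1e6
# Average band spacing, assuming evenly spaced band centers at the two range ends
step_nm = (lambda_max_nm - lambda_min_nm) / (bands - 1)
# Raw cube size, assuming 2 bytes per sample (hypothetical; check the data files)
raw_gb = n_pixels * bands * 2 / 1e9

print(n_pixels, footprint_km, round(area_km2, 2), round(step_nm, 2), round(raw_gb, 2))
```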

In order to use the datasets, please fulfill the following three requirements:

1) Giving an acknowledgement as follows:

The authors gratefully acknowledge Space Application Laboratory, Department of Advanced Interdisciplinary Studies, the University of Tokyo for providing the hyperspectral data.

2) Using the following license for hyperspectral data:

http://creativecommons.org/licenses/by/3.0/

3) This dataset was made public by Dr. Naoto Yokoya and Prof. Akira Iwasaki from the University of Tokyo. Please cite:

In WORD:

N. Yokoya and A. Iwasaki, "Airborne hyperspectral data over Chikusei," Space Appl. Lab., Univ. Tokyo, Japan, Tech. Rep. SAL-2016-05-27, May 2016.

In LaTeX (BibTeX):

@techreport{NYokoya2016,
author = {N. Yokoya and A. Iwasaki},
title = {Airborne hyperspectral data over Chikusei},
institution = {Space Application Laboratory, University of Tokyo},
number = {SAL-2016-05-27},
address = {Japan},
month = {May},
year = 2016,
}
