Naoto Yokoya (横矢直人)

University of Tokyo (東京大学)

Keywords: hyperspectral, remote sensing, pattern recognition, data fusion

My research interests include image processing for computational imaging, and image analysis based on computer vision and machine learning. In particular, my research focuses on intelligent information processing that automatically extracts map information, such as land cover labels and elevation models, from remote sensing images acquired by spaceborne and airborne sensors. My work is motivated by Earth observation applications that help respond to global challenges by supporting decision making in disaster management and environmental assessment.

Image Processing for Computational Imaging

Computational imaging, which integrates sensing and computation, allows us to acquire information that cannot be obtained by hardware alone and to overcome hardware limitations such as limited resolution and noise. Based on machine learning, optimization, and signal processing, we build mathematical models and develop algorithms to recover unknown original signals from incomplete observation data.
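
As a minimal sketch of recovering a signal from incomplete observations, assume a 1D signal, a known binary sampling mask, and a simple quadratic smoothness prior (a deliberately basic stand-in for the learned or low-rank priors used in practice). The hypothetical `recover` helper below minimizes a data-fidelity term on the observed samples plus a penalty on neighboring differences, using plain gradient descent:

```python
# Illustrative sketch: `recover` is a hypothetical helper, not code from the
# publications below. It fills unobserved samples of a 1D signal.


def recover(y, mask, lam=0.1, step=0.1, iters=5000):
    """Estimate a full signal x from partial observations.

    y:    observed values (entries where mask is 0 are ignored)
    mask: 1 where the sample was observed, 0 where it is missing
    Minimizes  sum_i mask[i] * (x[i] - y[i])**2 + lam * sum_i (x[i+1] - x[i])**2
    by gradient descent; convergence requires step < 2 / (2 + 8 * lam).
    """
    n = len(y)
    # Start from the observations, with missing samples set to zero.
    x = [float(yi) if m else 0.0 for yi, m in zip(y, mask)]
    for _ in range(iters):
        grad = []
        for i in range(n):
            # Gradient of the data-fidelity term (zero for missing samples).
            g = 2.0 * mask[i] * (x[i] - y[i])
            # Gradient of the smoothness penalty on neighboring differences.
            if i > 0:
                g += 2.0 * lam * (x[i] - x[i - 1])
            if i < n - 1:
                g -= 2.0 * lam * (x[i + 1] - x[i])
            grad.append(g)
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x


# Toy ramp with samples 2 and 5 missing (their y values are placeholders).
y = [0.0, 1.0, 0.0, 3.0, 4.0, 0.0, 6.0, 7.0]
mask = [1, 1, 0, 1, 1, 0, 1, 1]
x = recover(y, mask)
```

At the optimum of this objective, each isolated missing sample equals the average of its two neighbors, so gaps are filled by local interpolation; richer priors (low-rank structure, learned networks) replace the smoothness term in real methods.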

Related publications

G. Baier, A. Deschemps, M. Schmitt, and N. Yokoya, “Synthesizing optical and SAR imagery from land cover maps and auxiliary raster data,” IEEE Transactions on Geoscience and Remote Sensing, (in press), 2021.

Abstract: We synthesize both optical RGB and SAR remote sensing images from land cover maps and auxiliary raster data using GANs. In remote sensing, many types of data, such as digital elevation models or precipitation maps, are often not reflected in land cover maps but still influence image content or structure. Including such data in the synthesis process increases the quality of the generated images and gives more control over their characteristics. Spatially adaptive normalization layers fuse both inputs and are applied to a complete generator architecture consisting of an encoder and a decoder, to take full advantage of the information content in the auxiliary raster data. Our method successfully synthesizes medium (10 m) and high (1 m) resolution images when trained with the corresponding dataset. We show the advantage of fusing land cover maps and auxiliary information using mean intersection over union and pixel accuracy, computed with pre-trained U-Net segmentation models, and Fréchet inception distance. Handpicked images exemplify how fusing information avoids ambiguities in the synthesized images. By slightly editing the input, our method can be used to synthesize realistic changes, e.g., raising water levels. The source code and the newly created high-resolution dataset are publicly available.
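
The spatially adaptive normalization mentioned above can be sketched in plain Python under several assumptions: a single feature channel, normalization over spatial positions only, and a per-pixel scale and bias produced by a hypothetical 1x1 linear mapping of the conditioning vector (land cover one-hot followed by auxiliary raster values). In the published layer this mapping is a learned convolutional network, and the `(1 + gamma)` modulation is a common implementation choice; the weights below are placeholders.

```python
# Illustrative sketch of spatially adaptive normalization; `spade` and its
# weights are hypothetical, not the paper's implementation.
import math


def spade(feature, cond, w_gamma, w_beta, eps=1e-5):
    """Normalize one feature channel, then modulate it per pixel.

    feature: H x W list of activations (one channel of the generator).
    cond:    H x W list of conditioning vectors (e.g. land-cover one-hot
             followed by auxiliary raster values such as elevation).
    w_gamma, w_beta: hypothetical 1x1 weights mapping a conditioning
             vector to a per-pixel scale and bias (learned in practice).
    """
    values = [v for row in feature for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var + eps)
    out = []
    for frow, crow in zip(feature, cond):
        out_row = []
        for f, c in zip(frow, crow):
            gamma = sum(wi * ci for wi, ci in zip(w_gamma, c))
            beta = sum(wi * ci for wi, ci in zip(w_beta, c))
            # Parameter-free normalization, then condition-dependent modulation.
            out_row.append((f - mean) / std * (1.0 + gamma) + beta)
        out.append(out_row)
    return out
```

Because scale and bias vary per pixel with the land cover class and the auxiliary raster, the conditioning information is not washed out by the normalization step.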

T. Uezato, D. Hong, N. Yokoya, and W. He, “Guided deep decoder: Unsupervised image pair fusion,” Proc. ECCV (spotlight), 2020.

Abstract: The fusion of input and guidance images that have a tradeoff in their information (e.g., hyperspectral and RGB image fusion or pansharpening) can be interpreted as one general problem. However, previous studies applied a task-specific handcrafted prior and did not address the problems with a unified approach. To address this limitation, in this study, we propose a guided deep decoder network as a general prior. The proposed network is composed of an encoder-decoder network that exploits multi-scale features of a guidance image and a deep decoder network that generates an output image. The two networks are connected by feature refinement units to embed the multi-scale features of the guidance image into the deep decoder network. The proposed network allows the network parameters to be optimized in an unsupervised way without training data. Our results show that the proposed network can achieve state-of-the-art performance in various image fusion problems.
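
To illustrate why such fusion can be optimized without external training data, the sketch below evaluates the kind of loss an unsupervised method can minimize: the fused image is compared only with the two inputs after passing through assumed observation operators. The spatial operator (block averaging), the spectral response matrix, and all function names are hypothetical choices, and the paper's exact loss may differ.

```python
# Illustrative unsupervised fusion loss; operators and names are hypothetical.
# Images are nested lists indexed as [band][row][col].


def spatial_downsample(img, factor):
    """Average non-overlapping factor x factor blocks in every band."""
    bands, h, w = len(img), len(img[0]), len(img[0][0])
    out = []
    for b in range(bands):
        band = []
        for i in range(0, h, factor):
            row = []
            for j in range(0, w, factor):
                block = [img[b][i + di][j + dj]
                         for di in range(factor) for dj in range(factor)]
                row.append(sum(block) / len(block))
            band.append(row)
        out.append(band)
    return out


def spectral_project(img, srf):
    """Map all bands to len(srf) channels with a spectral response matrix."""
    h, w = len(img[0]), len(img[0][0])
    return [[[sum(srf[c][b] * img[b][i][j] for b in range(len(img)))
              for j in range(w)] for i in range(h)] for c in range(len(srf))]


def sq_error(a, b):
    """Sum of squared differences between two [band][row][col] images."""
    return sum((x - y) ** 2
               for ab, bb in zip(a, b)
               for ra, rb in zip(ab, bb)
               for x, y in zip(ra, rb))


def fusion_loss(x, low_res_input, guidance, srf, factor):
    """The fused image x must explain both the input and the guidance image."""
    return (sq_error(spatial_downsample(x, factor), low_res_input)
            + sq_error(spectral_project(x, srf), guidance))
```

In a deep-decoder style approach, `x` would be the network output and this loss would be minimized with respect to the network parameters, so only the observed image pair is needed.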

W. He, N. Yokoya, L. Yuan, and Q. Zhao, “Remote sensing image reconstruction using tensor ring completion and total-variation,” IEEE Trans. Geosci. Remote Sens., (accepted for publication), 2019.

Abstract: Time-series remote sensing (RS) images are often corrupted by various types of missing information, such as dead pixels, clouds, and cloud shadows, which significantly affect subsequent applications. In this paper, we introduce a new low-rank tensor decomposition model, termed tensor ring (TR) decomposition, to the analysis of RS datasets and propose a TR completion method for missing information reconstruction. The proposed TR completion model can exploit the low-rank property of time-series RS images along different dimensions. To further exploit the smoothness of the spatial information in RS images, total-variation regularization is incorporated into the TR completion model. The proposed model is efficiently solved using two algorithms: the augmented Lagrange multiplier (ALM) and alternating least squares (ALS) methods. Experiments on simulated and real data show superior performance compared with other state-of-the-art low-rank algorithms.
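
In tensor ring decomposition, a d-way tensor is represented by d third-order cores G_k of size r_k × n_k × r_{k+1} with r_{d+1} = r_1, and each entry is the trace of a product of core slices: T(i_1, ..., i_d) = Tr(G_1(:, i_1, :) G_2(:, i_2, :) ... G_d(:, i_d, :)). The sketch below evaluates that formula and an anisotropic total-variation term, and combines a masked data term with TV on a single band. All helper names and the toy sizes are illustrative assumptions; the ALM and ALS solvers are not reproduced.

```python
# Illustrative helpers for TR completion with TV; names are hypothetical.


def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def tr_entry(cores, index):
    """Tensor-ring entry: trace of the product of the selected core slices.

    cores[k] is nested as [r_k][n_k][r_{k+1}], i.e. cores[k][a][i][b] = G_k(a, i, b).
    """
    prod = None
    for core, i in zip(cores, index):
        slice_ = [[core[a][i][b] for b in range(len(core[a][i]))]
                  for a in range(len(core))]
        prod = slice_ if prod is None else matmul(prod, slice_)
    return sum(prod[k][k] for k in range(len(prod)))


def total_variation(img):
    """Anisotropic TV: sum of absolute vertical and horizontal differences."""
    h, w = len(img), len(img[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            if i + 1 < h:
                tv += abs(img[i + 1][j] - img[i][j])
            if j + 1 < w:
                tv += abs(img[i][j + 1] - img[i][j])
    return tv


def masked_error(estimate, observed, mask):
    """Squared error on observed pixels only (mask value 1 = observed)."""
    return sum((e - o) ** 2
               for er, orow, mr in zip(estimate, observed, mask)
               for e, o, m in zip(er, orow, mr) if m)


def completion_objective(estimate, observed, mask, lam):
    """Data fit on observed pixels plus a TV smoothness penalty."""
    return masked_error(estimate, observed, mask) + lam * total_variation(estimate)
```

Only observed pixels contribute to the data term, so values under clouds or dead pixels are determined by the low-rank TR structure and the TV smoothness prior.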

Remote Sensing Image Analysis

Remote sensing enables us to observe places that are inaccessible to humans; however, collecting enough training data is difficult because of the limitations of field surveys and visual interpretation. We work on mapping and 3D reconstruction that use, as training data, synthetic data generated by simulations and inaccurate labels that are inexpensive to collect. We also work on deep learning-based data fusion to handle multimodal data from different spaceborne sensors in an integrated manner.

Related publications

C. Robinson, K. Malkin, N. Jojic, H. Chen, R. Qin, C. Xiao, M. Schmitt, P. Ghamisi, R. Haensch, and N. Yokoya, “Global land cover mapping with weak supervision: Outcome of the 2020 IEEE GRSS Data Fusion Contest,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 3185-3199, 2021.

Abstract: This paper presents the scientific outcomes of the 2020 Data Fusion Contest organized by the Image Analysis and Data Fusion Technical Committee of the IEEE Geoscience and Remote Sensing Society. The 2020 Contest addressed the problem of automatic global land-cover mapping with weak supervision, i.e., estimating high-resolution semantic maps while only low-resolution reference data are available during training. Two separate competitions were organized to assess two different scenarios: 1) high-resolution labels are not available at all, and 2) a small number of high-resolution labels are available in addition to low-resolution reference data. In this paper, we describe the DFC2020 dataset, which remains available for further evaluation of corresponding approaches, and report the results of the best-performing methods during the contest.
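
One simple way to train a high-resolution classifier from low-resolution reference labels, shown here as an illustrative assumption rather than the method of any contest entry, is to average the predicted class probabilities over each low-resolution cell and apply cross-entropy against that cell's label. The function names below are hypothetical.

```python
# Illustrative weak-supervision loss; helper names are hypothetical.
import math


def block_average(probs, factor):
    """Average per-pixel class probabilities over factor x factor cells.

    probs: H x W list of probability vectors (one per high-resolution pixel).
    Returns an (H / factor) x (W / factor) grid of averaged vectors.
    """
    h, w = len(probs), len(probs[0])
    n_classes = len(probs[0][0])
    grid = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            acc = [0.0] * n_classes
            for di in range(factor):
                for dj in range(factor):
                    for c, p in enumerate(probs[i + di][j + dj]):
                        acc[c] += p
            row.append([a / (factor * factor) for a in acc])
        grid.append(row)
    return grid


def weak_label_loss(probs, low_res_labels, factor, eps=1e-12):
    """Mean cross-entropy between cell-averaged predictions and low-res labels."""
    grid = block_average(probs, factor)
    losses = [-math.log(max(cell[label], eps))
              for grow, lrow in zip(grid, low_res_labels)
              for cell, label in zip(grow, lrow)]
    return sum(losses) / len(losses)
```

This loss only constrains the class mixture within each cell, so the model remains free to place fine boundaries inside a cell; additional cues are needed to make those boundaries accurate.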

D. Hong, L. Gao, N. Yokoya, J. Yao, J. Chanussot, Q. Du, and B. Zhang, “More diverse means better: Multimodal deep learning meets remote-sensing imagery classification,” IEEE Transactions on Geoscience and Remote Sensing, (early access), 2020.

Abstract: Classification and identification of the materials lying over or beneath the Earth's surface have long been a fundamental but challenging research topic in geoscience and remote sensing (RS), and have attracted growing attention owing to recent advances in deep learning. Although deep networks have been successfully applied to single-modality-dominated classification tasks, their performance inevitably hits a bottleneck in complex scenes that need to be finely classified, due to the limited diversity of information. In this work, we provide a baseline solution to this difficulty by developing a general multimodal deep learning (MDL) framework. In particular, we also investigate a special case of multimodality learning (MML), cross-modality learning (CML), which arises widely in RS image classification applications. By focusing on “what,” “where,” and “how” to fuse, we show different fusion strategies as well as how to train deep networks and build the network architecture. Specifically, five fusion architectures are introduced, developed, and unified in our MDL framework. More significantly, our framework is not limited to pixel-wise classification tasks but is also applicable to spatial information modeling with convolutional neural networks (CNNs). To validate the effectiveness and superiority of the MDL framework, extensive experiments related to the settings of MML and CML are conducted on two different multimodal RS datasets. Furthermore, the code and datasets will be available at https://github.com/danfenghong/IEEE_TGRS_MDL-RS, contributing to the RS community.
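
The difference between fusing modalities early (concatenating inputs before a joint classifier) and late (combining per-modality decisions) can be sketched with toy linear classifiers. The weights, feature sizes, and function names below are hypothetical, and the sketch does not reproduce the five architectures of the MDL framework.

```python
# Illustrative early vs. late fusion; all weights and names are hypothetical.
import math


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def linear(w, x):
    """Class scores w @ x for a weight matrix w given as classes x features."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def early_fusion(x_a, x_b, w_joint):
    """Concatenate both modalities' features and classify them jointly."""
    return softmax(linear(w_joint, x_a + x_b))


def late_fusion(x_a, x_b, w_a, w_b):
    """Classify each modality separately and average the class probabilities."""
    p_a = softmax(linear(w_a, x_a))
    p_b = softmax(linear(w_b, x_b))
    return [(pa + pb) / 2 for pa, pb in zip(p_a, p_b)]


# Toy pixel: two features from modality A, one from modality B, two classes.
p_early = early_fusion([1.0, 0.0], [0.5], [[1, 0, 1], [0, 1, 0]])
p_late = late_fusion([1.0, 0.0], [0.5], [[1, 0], [0, 1]], [[1], [0]])
```

Early fusion lets the joint model learn interactions between modalities, while late fusion keeps per-modality models independent, which can be convenient when one modality is missing at test time.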

S. Kunwar, H. Chen, M. Lin, H. Zhang, P. D'Angelo, D. Cerra, S. M. Azimi, M. Brown, G. Hager, N. Yokoya, R. Haensch, and B. Le Saux, “Large-Scale Semantic 3D Reconstruction: Outcome of the 2019 IEEE GRSS Data Fusion Contest - Part A,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 922-935, 2020.

Abstract: In this paper, we present the scientific outcomes of the 2019 Data Fusion Contest organized by the Image Analysis and Data Fusion Technical Committee of the IEEE Geoscience and Remote Sensing Society. The 2019 Contest addressed the problem of 3D reconstruction and 3D semantic understanding on a large scale. Several competitions were organized to assess specific issues, such as elevation estimation and semantic mapping from a single view, two views, or multiple views. In this Part A, we report the results of the best-performing approaches for semantic 3D reconstruction according to these various set-ups, while 3D point cloud semantic mapping is discussed in Part B.
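
A semantic 3D result can be scored jointly on labels and heights. The sketch below computes a per-class IoU in which a pixel counts toward the intersection only if both the predicted and reference labels equal the class and the height error is within a tolerance, while the union is the usual label union. This is an illustrative metric in the spirit of such evaluations; the 1 m tolerance and the function name are assumptions, not the official contest definition.

```python
# Illustrative height-aware mean IoU; the name and tolerance are hypothetical.


def height_aware_miou(pred_cls, pred_h, gt_cls, gt_h, classes, tol=1.0):
    """Mean over classes of (label-and-height agreement) / (label union).

    All inputs are flat lists of equal length, one entry per pixel;
    heights are in meters.
    """
    ious = []
    for c in classes:
        inter = union = 0
        for pc, ph, gc, gh in zip(pred_cls, pred_h, gt_cls, gt_h):
            if pc == c or gc == c:
                union += 1
                # A hit needs the right label AND an accurate height.
                if pc == c and gc == c and abs(ph - gh) <= tol:
                    inter += 1
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

Coupling the two criteria penalizes results that get the class right but place it at the wrong elevation, which a purely 2D semantic metric would miss.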

Towards a Sustainable Future

We promote projects that address global issues, including environmental problems, climate change, large-scale natural disasters, and food security. Our goal is to contribute to the realization of the Sustainable Development Goals (SDGs) worldwide by solving real-world problems, such as assessing building damage during disasters, estimating forest biomass and carbon stocks, and mapping crop types, in collaboration with related institutions and researchers in Japan and overseas.

Related publications

B. Adriano, N. Yokoya, J. Xia, H. Miura, W. Liu, M. Matsuoka, and S. Koshimura, “Learning from multimodal and multitemporal earth observation data for building damage mapping,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 175, pp. 132-143, 2021.

Abstract: Earth observation (EO) technologies, such as optical imaging and synthetic aperture radar (SAR), provide excellent means to continuously monitor ever-growing urban environments. Notably, in the case of large-scale disasters (e.g., tsunamis and earthquakes), in which a response is highly time-critical, images from both data modalities can complement each other to accurately convey the full damage condition in the disaster's aftermath. However, due to several factors, such as weather and satellite coverage, it is often uncertain which data modality will be available first for rapid disaster response efforts. Hence, novel methodologies that can utilize all accessible EO datasets are essential for disaster management. In this study, we developed a global multisensor and multitemporal dataset for building damage mapping. We included building damage characteristics from three disaster types, namely, earthquakes, tsunamis, and typhoons, and considered three building damage categories. The global dataset contains high-resolution optical imagery and high-to-moderate-resolution multiband SAR data acquired before and after each disaster. Using this comprehensive dataset, we analyzed five data modality scenarios for damage mapping: single-mode (optical and SAR datasets), cross-modal (pre-disaster optical and post-disaster SAR datasets), and mode fusion scenarios. We defined a damage mapping framework for the semantic segmentation of damaged buildings based on a deep convolutional neural network algorithm. We compared our approach with another state-of-the-art baseline model for damage mapping. The results indicated that our dataset, together with a deep learning network, enabled acceptable predictions for all the data modality scenarios.
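
Semantic segmentation of damaged buildings is usually evaluated per class. As an illustrative sketch, assume label 0 is background and labels 1 to 3 stand for the three damage categories (their names are not reproduced here); per-class F1 scores can then be computed from a confusion matrix. The choice of F1 and the helper names are assumptions, not necessarily the paper's evaluation protocol.

```python
# Illustrative per-class evaluation; helper names are hypothetical.


def confusion_matrix(pred, truth, n_classes):
    """cm[t][p] counts pixels of true class t predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for p, t in zip(pred, truth):
        cm[t][p] += 1
    return cm


def per_class_f1(cm):
    """F1 = 2TP / (2TP + FP + FN) per class; 0.0 if the class never appears."""
    n = len(cm)
    scores = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[t][c] for t in range(n)) - tp
        fn = sum(cm[c]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores
```

Reporting scores per damage category, rather than overall pixel accuracy, keeps the large background class from masking errors on the rarer damage classes.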

N. Yokoya, K. Yamanoi, W. He, G. Baier, B. Adriano, H. Miura, and S. Oishi, “Breaking limits of remote sensing by deep learning from simulated data for flood and debris flow mapping,” IEEE Transactions on Geoscience and Remote Sensing, (early access), 2020.

Abstract: We propose a framework that estimates inundation depth (maximum water level) and debris-flow-induced topographic deformation from remote sensing imagery by integrating deep learning and numerical simulation. A water and debris flow simulator generates training data for various artificial disaster scenarios. We show that regression models based on Attention U-Net and LinkNet architectures trained on such synthetic data can predict the maximum water level and topographic deformation from a remote sensing-derived change detection map and a digital elevation model. The proposed framework has an inpainting capability, which mitigates the false negatives that are inevitable in remote sensing image analysis. Our framework breaks the limits of remote sensing and enables rapid estimation of inundation depth and topographic deformation, which is essential information for emergency response, including rescue and relief activities. We conduct experiments with both synthetic and real data for two disaster events that caused simultaneous flooding and debris flows and demonstrate the effectiveness of our approach quantitatively and qualitatively. Our code and datasets are available at https://github.com/nyokoya/dlsim.
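
To make the target quantity concrete, the sketch below implements a classical non-learning baseline, not the paper's method: given a flood extent map and a DEM, it estimates one flat water surface from the DEM elevations along the flood boundary and sets depth = max(0, water level - ground elevation) inside the flooded area. The single flat water surface is a strong simplifying assumption, and the function names are hypothetical. Pixels missed by the flood map (false negatives) keep zero depth here, which is the kind of gap the learned framework's inpainting capability is meant to address.

```python
# Illustrative DEM-based depth baseline; names and the flat-surface
# assumption are hypothetical simplifications, not the paper's model.


def boundary_pixels(flood):
    """Flooded pixels with at least one 4-connected non-flooded neighbor."""
    h, w = len(flood), len(flood[0])
    out = []
    for i in range(h):
        for j in range(w):
            if not flood[i][j]:
                continue
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and not flood[ni][nj]:
                    out.append((i, j))
                    break
    return out


def flat_water_depth(flood, dem):
    """Depth map assuming one flat water surface at the mean boundary elevation."""
    edge = boundary_pixels(flood)
    if not edge:
        # No shoreline found inside the image: no estimate is possible.
        return [[0.0] * len(row) for row in dem]
    level = sum(dem[i][j] for i, j in edge) / len(edge)
    return [[max(0.0, level - z) if f else 0.0 for f, z in zip(frow, zrow)]
            for frow, zrow in zip(flood, dem)]
```

A learned model trained on simulated floods can go beyond this baseline by capturing a non-flat water surface and filling in flooded pixels the change map missed.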

T. D. Pham, N. Yokoya, T. T. T. Nguyen, N. N. Le, N. T. Ha, J. Xia, W. Takeuchi, and T. D. Pham, “Improvement of mangrove soil carbon stocks estimation in North Vietnam using Sentinel-2 data and machine learning approach,” GIScience & Remote Sensing, vol. 58, no. 1, pp. 68-87, 2021.

Abstract: Quantifying total carbon (TC) stocks in soil across various mangrove ecosystems is key to understanding the global carbon cycle, which informs efforts to reduce greenhouse gas emissions. Estimating mangrove TC at a large scale remains challenging due to the difficulty and high cost of soil carbon measurements when the number of samples is high. In the present study, we investigated the capability of Sentinel-2 multispectral data together with a state-of-the-art machine learning (ML) technique, a combination of CatBoost regression (CBR) and a genetic algorithm (GA) for feature selection and optimization (the CBR-GA model), to estimate mangrove soil C stocks across the mangrove ecosystems in North Vietnam. We used field survey data collected from 177 soil cores. We compared the performance of the proposed model with those of four other ML algorithms, i.e., extreme gradient boosting regression (XGBR), light gradient boosting machine regression (LGBMR), support vector regression (SVR), and random forest regression (RFR). Our proposed model estimated the TC level in the soil as 35.06–166.83 Mg ha⁻¹ (average = 92.27 Mg ha⁻¹) with satisfactory accuracy (R² = 0.665, RMSE = 18.41 Mg ha⁻¹) and yielded the best prediction performance among all the ML techniques. We conclude that Sentinel-2 data combined with the CBR-GA model can improve estimates of mangrove TC at 10 m spatial resolution in tropical areas. The effectiveness of the proposed approach should be further evaluated for the soils of other mangrove ecosystems in tropical and semi-tropical regions.
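
The accuracy figures quoted above can be computed with the standard definitions, assuming R² here is the coefficient of determination, R² = 1 - Σ(y_i - ŷ_i)² / Σ(y_i - ȳ)², and RMSE = sqrt(Σ(y_i - ŷ_i)² / n). A minimal sketch with made-up numbers (not data from the study):

```python
# Illustrative regression metrics; the sample values are hypothetical.
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error, in the unit of the inputs (e.g. Mg/ha)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical soil carbon stocks (Mg/ha) at four cores and model predictions.
observed = [80.0, 100.0, 120.0, 60.0]
predicted = [90.0, 95.0, 110.0, 65.0]
print(r2_score(observed, predicted))          # 0.875
print(round(rmse(observed, predicted), 3))    # 7.906
```

RMSE keeps the unit of the target (Mg ha⁻¹), which makes it directly comparable with the reported stock range, while R² is unitless.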