The initial retrieval and pre-processing of satellite data will be carried out on Google Earth Engine (GEE), a cloud computing platform designed for large-scale geospatial data analysis that enables efficient processing of petabytes of remote sensing data.
C-band SAR data from the Sentinel-1 satellites will be used as the primary input features for training the deep learning model. The basics of Sentinel-1 SAR data were explained in the previous lecture sections. The Sentinel-1 data are extracted from the GEE image collection ‘COPERNICUS/S1_GRD_FLOAT’. The collection provides two co-polarization bands (VV, HH), two cross-polarization bands (HV, VH), and an incidence angle band. In practice, especially for training machine learning models, the VV and VH bands have been widely used in the literature (Nikaein et al., 2023). The Sentinel-1 data in the GEE archive are already pre-processed with thermal noise removal, radiometric calibration, and terrain correction. However, as SAR backscatter is highly noisy, speckle filtering remains an essential correction.
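As a minimal sketch of how the VV, VH, and angle bands could be selected from this collection with the GEE Python API: the function name load_sentinel1, the bounding-box argument, and the restriction to the IW instrument mode are assumptions made for illustration and are not taken from the download scripts. The sketch also assumes that the earthengine-api package is installed and that ee.Initialize() has already been called.

```python
S1_COLLECTION = "COPERNICUS/S1_GRD_FLOAT"
S1_BANDS = ["VV", "VH", "angle"]


def load_sentinel1(bbox, start, end):
    """Return Sentinel-1 scenes over bbox = (xmin, ymin, xmax, ymax).

    start and end are date strings such as "2023-01-01".
    Hypothetical helper, not part of the course scripts.
    """
    import ee  # earthengine-api; imported here so the constants load without it

    region = ee.Geometry.Rectangle(list(bbox))
    return (
        ee.ImageCollection(S1_COLLECTION)
        .filterBounds(region)
        .filterDate(start, end)
        # Assumption: keep Interferometric Wide swath scenes only.
        .filter(ee.Filter.eq("instrumentMode", "IW"))
        # Keep scenes that were acquired with both VV and VH.
        .filter(ee.Filter.listContains("transmitterReceiverPolarisation", "VV"))
        .filter(ee.Filter.listContains("transmitterReceiverPolarisation", "VH"))
        .select(S1_BANDS)
    )
```

Filtering on transmitterReceiverPolarisation is needed because not every scene in the collection carries all polarization bands.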
Speckle Filtering
Speckle is a salt-and-pepper-like granular noise inherent to SAR images, resulting from the coherent nature of radar waves and the constructive and destructive interference among backscattered signals from many elementary scatterers within a single resolution cell. This noise reduces the visual quality and makes image interpretation and automated classification difficult. Speckle filters (e.g., Lee, Gamma MAP, or refined Lee filters) are applied to reduce this noise while attempting to preserve the integrity of edges and subtle features, thereby enhancing the image quality and improving the accuracy of subsequent analysis.
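To make the idea concrete, the following is a minimal sketch of a basic Lee filter in plain Python. It is not the refined Lee variant and not necessarily the filter used in the adapted pipeline. The window size and the equivalent number of looks (enl=5.0) are assumed values chosen for illustration, and the input is assumed to be linear-scale intensity rather than dB.

```python
from statistics import fmean, pvariance


def lee_filter(image, window=3, enl=5.0):
    """Apply a basic Lee filter to a 2D list of linear-scale intensities."""
    half = window // 2
    rows, cols = len(image), len(image[0])
    cu2 = 1.0 / enl  # squared coefficient of variation of the speckle
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the local window, clipped at the image borders.
            vals = [
                image[i][j]
                for i in range(max(0, r - half), min(rows, r + half + 1))
                for j in range(max(0, c - half), min(cols, c + half + 1))
            ]
            mean = fmean(vals)
            var = pvariance(vals, mean)
            ci2 = var / (mean * mean) if mean > 0 else 0.0
            # k is 0 in homogeneous areas (output is the local mean) and
            # approaches 1 where local variation is high (pixel is kept).
            k = max(0.0, 1.0 - cu2 / ci2) if ci2 > 0 else 0.0
            out[r][c] = mean + k * (image[r][c] - mean)
    return out
```

In homogeneous areas the filter returns the local mean, while pixels in highly variable neighbourhoods such as edges stay close to their original value, which is how the filter trades noise reduction against edge preservation.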
Terrain Correction
Terrain correction, specifically Radiometric Terrain Correction (RTC), is applied to Sentinel-1 SAR data to account for radiometric distortions caused by topography; it goes beyond the geometric terrain correction (orthorectification) already applied to the data in the GEE archive. The intensity of the backscattered radar signal is significantly affected by the slope and aspect of the terrain relative to the radar beam. For instance, slopes facing the sensor appear brighter (foreshortening), while those facing away appear darker (shadowing), regardless of the actual land cover. RTC uses a Digital Elevation Model (DEM) to normalize the backscatter intensity, reducing the distortions induced by topography. This correction is vital for ensuring that the processed SAR data accurately reflects the physical properties of the surface and is consistent across different scenes and times.
To address the need for the above preprocessing steps, Mullissa et al. (2021) proposed a comprehensive SAR pre-processing framework and a library that automates the preprocessing directly through the GEE Python API. This pipeline is adapted for the preprocessing of the Sentinel-1 data.
Cloud Masking (Sentinel-2)
Cloud masking is a crucial preprocessing step for optical satellite imagery, such as that from Sentinel-2. Clouds and their shadows obscure the ground, leading to incorrect spectral signatures and misleading analysis results. The cloud masking techniques in the literature fall into two major categories: multi-temporal and mono-temporal approaches. Multi-temporal methods such as MAJA and Tmask compare the current image with previous, cloud-free scenes of the same location. Although highly effective, their requirement for auxiliary historical data limits their applicability in real-time scenarios. Mono-temporal approaches, on the other hand, require only a single image, which reduces complexity, and are subdivided into:
Physical Rule-Based Methods: Tools and products such as Sen2Cor, Fmask, and the QA60 band use fixed thresholds on spectral bands and indices for classification. While computationally efficient, they are typically less accurate and struggle to differentiate between thick and thin clouds.
Machine Learning (ML) Approaches: These treat masking as a pixel classification problem, learning relationships from training data. An example is s2cloudless, a computationally efficient pixel-based ML method that, in its common form, does not identify cloud shadows or distinguish between thick and thin clouds (Skakun et al., 2022; Wright et al., 2024). GEE also provides quality assurance bands with the imagery, such as the Scene Classification Layer (SCL), which identify cloud shadows and can be combined with s2cloudless masking to remove both clouds and cloud shadows efficiently (Sentinel-2 Cloud Masking with S2cloudless | Google Earth Engine, n.d.).
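The combination of s2cloudless and SCL can be sketched as a per-pixel rule. The plain-Python example below only illustrates the logic on small grids; on GEE the same rule would be expressed with image operations. The function name clear_sky_mask and the probability threshold of 50 are assumptions for illustration. The SCL code 3 is the cloud shadow class.

```python
SCL_CLOUD_SHADOW = 3  # Scene Classification Layer class code for cloud shadows


def clear_sky_mask(cloud_prob, scl, prob_threshold=50):
    """Return a 2D mask that is True where a pixel is usable.

    cloud_prob: 2D list of s2cloudless cloud probabilities (0-100).
    scl: 2D list of SCL class codes with the same shape.
    prob_threshold: assumed cut-off; pixels at or above it count as cloud.
    """
    return [
        [p < prob_threshold and s != SCL_CLOUD_SHADOW for p, s in zip(prob_row, scl_row)]
        for prob_row, scl_row in zip(cloud_prob, scl)
    ]


# Example: one cloudy pixel (probability 80) and one shadow pixel (SCL 3).
mask = clear_sky_mask([[10, 80], [20, 5]], [[4, 9], [3, 6]])
print(mask)  # [[True, False], [False, True]]
```

A lower threshold removes more thin cloud at the cost of discarding some clear pixels, so the value is usually tuned per study area.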
Given the need for thousands of images to train the deep learning model, GEE's capability for handling massive datasets is essential. Once the Sentinel-1 and Sentinel-2 bands have been pre-processed, they will be downloaded to local storage and processed further on the local infrastructure. To do so, run the scripts 0_0_2Band_TrainData_Download-S1.py and 0_1_2Band_TrainData_Download-S2.py, both located in the folder ./1_Preparation/
Before running the Python scripts, follow the instructions below to set the variables.
To use Google Earth Engine, every user needs to authenticate with their Gmail account. For simultaneous downloading, this authentication is automated, which requires a Google service account and a private key for that service account. To learn how to create a service account and generate a key for it, follow the steps in the official GEE documentation: Service Accounts | Google Earth Engine | Google for Developers.
After creating the service account one should have the following:
i. A service account as ‘ ’
ii. A private key in .json format
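With these two items, automated authentication could look like the following minimal sketch. It assumes the earthengine-api package is installed. The function name init_earth_engine is hypothetical, and the download scripts may organise this step differently.

```python
from pathlib import Path


def init_earth_engine(service_account, key_file):
    """Initialise the Earth Engine client with service-account credentials.

    service_account: e-mail address of the service account.
    key_file: path to the private key in .json format.
    """
    key_path = Path(key_file)
    if not key_path.is_file():
        raise FileNotFoundError(f"Private key not found: {key_path}")

    import ee  # earthengine-api; imported lazily so the check above runs first

    credentials = ee.ServiceAccountCredentials(service_account, str(key_path))
    ee.Initialize(credentials)
    return ee
```

Checking for the key file before contacting Earth Engine gives a clear error message when the path is wrong, instead of a less obvious authentication failure.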
Change the variables: Change the following variables in 0_0_2Band_TrainData_Download-S1.py:
Run the script: Run the script with the following command:
python 0_0_2Band_TrainData_Download-S1.py
This will download the pre-processed Sentinel-1 imagery.
Open the file 0_1_2Band_TrainData_Download-S2.py and change the following parameters accordingly.
Run the script: Run the script with the following command:
python 0_1_2Band_TrainData_Download-S2.py
This will download the pre-processed Sentinel-2 imagery.