Module 4
DSM Refinement with Machine Learning
Objective: Increase DSM accuracy by using a Random Forest model to correct the systematic biases of the linear model.
1. Concept
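The random forest (RF) model is trained on the residual errors of the linear model and learns how they vary with the covariates. The predicted residuals are then added back to the linear predictions, which corrects their systematic biases and improves the DSM.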
RANDOM FOREST THEORY
The Random Forest (RF) algorithm is an ensemble learning method that builds many decorrelated decision trees and combines their predictions by averaging (regression) or majority voting (classification).
It is particularly effective for DSM because it handles nonlinear relationships and variable interactions without requiring explicit model specification.
Core Principles
- Bootstrap aggregation (bagging): each tree is trained on a bootstrap sample, i.e. rows drawn at random with replacement from the training data.
- Feature randomness: at each split, a random subset of predictors is tested.
- Ensemble averaging: the final prediction is the mean of all trees’ outputs.
This process reduces variance and limits overfitting while keeping bias low.
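The first two principles can be illustrated with a minimal sketch using only the Python standard library. The function names `bootstrap_indices` and `feature_subset`, the sample size, and the number of predictors are illustrative assumptions, not part of the module's workflow.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def feature_subset(n_features, max_features, rng):
    """Pick the random subset of predictors tested at a single split."""
    return sorted(rng.sample(range(n_features), max_features))


rng = random.Random(42)
n_samples = 1000  # assumed number of training rows

in_bag = bootstrap_indices(n_samples, rng)
out_of_bag = set(range(n_samples)) - set(in_bag)
oob_fraction = len(out_of_bag) / n_samples

print(f"unique in-bag rows: {len(set(in_bag))}")
print(f"out-of-bag fraction: {oob_fraction:.2f}")

# Assumed: 6 candidate predictors, 2 of them tested at this split.
split_features = feature_subset(6, 2, rng)
print("predictors tested at one split:", split_features)
```

Because rows are drawn with replacement, each tree sees some rows more than once and misses others. For large samples the out-of-bag share approaches 1/e, roughly 37%.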
Mathematical formulation
For \(n\) trees \(T_1, T_2, \ldots, T_n\):
\[
\hat{y}(x) = \frac{1}{n} \sum_{i=1}^{n} T_i(x)
\]
Each tree \(T_i\) is trained on its own bootstrap sample, so the trees differ from one another and their individual errors partly cancel when averaged.
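The averaging formula can be checked numerically. The sketch below assumes scikit-learn's `RandomForestRegressor` and uses synthetic data; it averages the predictions of the individual trees by hand and compares the result with the forest's own prediction.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0.0, 0.1, size=200)

rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Stack T_i(x) for every fitted tree: shape (n_trees, n_samples).
tree_preds = np.stack([tree.predict(X) for tree in rf.estimators_])

# y_hat(x) = (1/n) * sum_i T_i(x)
manual_mean = tree_preds.mean(axis=0)
forest_pred = rf.predict(X)

print("max abs difference:", np.abs(manual_mean - forest_pred).max())
```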
Advantages of RF for DSM
- Handles nonlinear and interaction effects between spectral, topographic, and model-derived covariates.
- Provides variable importance, allowing interpretation of which features (e.g., NDWI, transpiration) explain most of the residual variance.
- Suits residual correction: it can model the structured errors that the coarse linear DSM leaves behind.
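As a rough illustration of variable importance, the sketch below fits an RF to synthetic residuals and prints scikit-learn's impurity-based `feature_importances_`. The data, the relationships between covariates and residual, and the extra `noise_covariate` column are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n = 400

# Synthetic covariates (assumed value ranges, for illustration only).
ndwi_p95 = rng.uniform(0.0, 1.0, n)
transpiration_sum = rng.uniform(100.0, 400.0, n)
noise_covariate = rng.normal(0.0, 1.0, n)  # hypothetical irrelevant predictor

# Assumed residual structure: strongly nonlinear in NDWI, weakly linear in transpiration.
residual = (
    4.0 * np.sin(6.0 * ndwi_p95)
    + 0.01 * (transpiration_sum - 250.0)
    + rng.normal(0.0, 0.3, n)
)

names = ["ndwi_p95", "transpiration_sum", "noise_covariate"]
X = np.column_stack([ndwi_p95, transpiration_sum, noise_covariate])
rf = RandomForestRegressor(n_estimators=200, random_state=2).fit(X, residual)

# Impurity-based importances; they sum to 1 across predictors.
importance = dict(zip(names, rf.feature_importances_))
for name, score in sorted(importance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Impurity-based importances are computed on the training data and can favour predictors with many distinct values, so they are best read as a ranking rather than as exact shares of explained variance.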
2. Implementation Steps
- Compute residuals (errors) between linear model predictions and ground truth.
- Use residuals as the target variable for RF training.
- Use as predictors:
  - NDWI 95th percentile (p95)
  - Transpiration sum (from MONICA simulations)
- Add the RF-predicted residuals to the linear model predictions to obtain the corrected DSM.
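The steps above can be sketched end to end with scikit-learn. The sketch makes several assumptions: the data are synthetic, `coarse_cov` stands in for whatever covariate the linear model uses, residuals are defined as observed minus predicted, and the hyperparameters are arbitrary. It is not the module's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 500

# Synthetic stand-ins (assumptions): one coarse covariate for the linear
# model and the two RF predictors named in the steps above.
coarse_cov = rng.uniform(0.0, 100.0, n)
ndwi_p95 = rng.uniform(-0.2, 0.6, n)
transpiration_sum = rng.uniform(100.0, 400.0, n)
clay = (
    15.0
    + 0.1 * coarse_cov
    + 8.0 * np.sin(5.0 * ndwi_p95)
    + 0.02 * (transpiration_sum - 250.0)
    + rng.normal(0.0, 1.0, n)
)

X_lin = coarse_cov.reshape(-1, 1)
X_rf = np.column_stack([ndwi_p95, transpiration_sum])
idx_train, idx_test = train_test_split(np.arange(n), test_size=0.3, random_state=1)

# Step 1: residuals of the linear model (observed minus predicted).
linear = LinearRegression().fit(X_lin[idx_train], clay[idx_train])
residuals_train = clay[idx_train] - linear.predict(X_lin[idx_train])

# Steps 2 and 3: train the RF on the residuals with the two predictors.
rf = RandomForestRegressor(n_estimators=200, random_state=1)
rf.fit(X_rf[idx_train], residuals_train)

# Correction: linear prediction plus predicted residual, on held-out points.
linear_pred = linear.predict(X_lin[idx_test])
corrected_pred = linear_pred + rf.predict(X_rf[idx_test])

rmse_linear = np.sqrt(mean_squared_error(clay[idx_test], linear_pred))
rmse_corrected = np.sqrt(mean_squared_error(clay[idx_test], corrected_pred))
print(f"RMSE linear: {rmse_linear:.2f}, RMSE corrected: {rmse_corrected:.2f}")
```

Evaluating on held-out points matters here: an RF can reproduce training residuals almost exactly, so training-set metrics would overstate the improvement.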
3. Output
- Corrected DSMs for sand, clay, and SOC at all depths.
- Accuracy metrics (R², RMSE) and residual maps.
- Demonstration of improvement relative to coarse linear DSM.
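The accuracy metrics listed above can be computed as in the following sketch, which uses only the standard library. The clay values are made up for illustration and are not results from this module.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up clay contents (%) at five validation points, for illustration only.
observed = [12.0, 18.0, 25.0, 31.0, 40.0]
linear = [15.0, 17.0, 21.0, 35.0, 36.0]
corrected = [13.0, 18.5, 24.0, 32.0, 39.0]

for label, pred in [("linear", linear), ("corrected", corrected)]:
    print(f"{label}: R2 = {r_squared(observed, pred):.3f}, RMSE = {rmse(observed, pred):.3f}")

# Per-point residuals (observed minus predicted); mapped at the sample
# locations, these are the values a residual map would display.
residuals_corrected = [o - p for o, p in zip(observed, corrected)]
print("residuals after correction:", residuals_corrected)
```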