Module 4

DSM Refinement with Machine Learning

Objective: Increase DSM accuracy by using a Random Forest model to correct the systematic biases of the linear model.

1. Concept

The random forest (RF) model is trained on the residual errors of the linear model. Its predicted residuals are then added to the linear predictions, which corrects them and improves the DSM.

RANDOM FOREST THEORY

The Random Forest algorithm is an ensemble learning method that builds many decision trees, each trained independently of the others, and combines their predictions by averaging (regression) or voting (classification).

It is particularly effective for DSM because it handles nonlinear relationships and variable interactions without requiring explicit model specification.

Core Principles

Bootstrap aggregation (bagging): each tree is trained on a bootstrap sample, i.e. a random sample of the training data drawn with replacement.
Feature randomness: at each split, a random subset of predictors is tested.
Ensemble averaging: the final prediction is the mean of all trees’ outputs.

Averaging many decorrelated trees reduces variance and limits overfitting, while the individual trees keep bias low.
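The three principles above can be illustrated with a minimal sketch that uses only the Python standard library. The sample size, the predictor names (`elevation` and `slope` in particular), the value of `m_try`, and the three tree outputs are made up for illustration and are not taken from the module data.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

n_samples = 1000  # hypothetical number of soil observations
predictors = ["ndwi_p95", "transpiration_sum", "elevation", "slope"]  # hypothetical names
m_try = 2  # number of predictors tested per split (assumed value)

# Bagging: one tree receives n_samples row indices drawn WITH replacement.
bootstrap_idx = random.choices(range(n_samples), k=n_samples)
in_bag = set(bootstrap_idx)
out_of_bag = set(range(n_samples)) - in_bag
in_bag_fraction = len(in_bag) / n_samples  # close to 0.63 for large samples

# Feature randomness: candidate predictors for a single split.
split_candidates = random.sample(predictors, m_try)

# Ensemble averaging: the forest output is the mean of the tree outputs.
tree_outputs = [21.0, 19.5, 20.5]  # made-up predictions of three trees
forest_prediction = sum(tree_outputs) / len(tree_outputs)

print(f"unique in-bag share: {in_bag_fraction:.2f}")
print(f"out-of-bag rows: {len(out_of_bag)}")
print("split candidates:", split_candidates)
print(f"forest prediction: {forest_prediction:.2f}")
```

Rows that a tree never sees (the out-of-bag rows) differ from tree to tree, which is one reason the trees disagree and their average is more stable than any single tree.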

Mathematical formulation

For \(n\) trees \(T_1, T_2, \ldots, T_n\):

\[ \hat{y}\left( x \right) = \frac{1}{n}\sum\limits_{i = 1}^n {T_i\left( x \right)}\]

Each tree \(T_i\) learns from its own bootstrapped sample, so the trees are trained independently and their errors are only partly correlated.
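The averaging formula can be checked numerically. The sketch below assumes scikit-learn as the RF implementation (the module does not name a library) and uses synthetic data; it compares the mean of the individual tree predictions with the forest prediction.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic data: two covariates and a nonlinear target (illustration only).
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = np.sin(4.0 * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0.0, 0.1, 200)

rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

x_new = rng.uniform(0.0, 1.0, size=(5, 2))

# T_i(x) for every tree i, stacked into an array of shape (n_trees, n_points).
per_tree = np.stack([tree.predict(x_new) for tree in rf.estimators_])

# (1/n) * sum_i T_i(x), computed by hand.
manual_mean = per_tree.mean(axis=0)

print("matches rf.predict:", np.allclose(manual_mean, rf.predict(x_new)))
```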

Advantages of RF for DSM

Handles nonlinear and interaction effects between spectral, topographic, and model-derived covariates.
Provides variable importance, allowing interpretation of which features (e.g., NDWI, transpiration) explain most of the residual variance.
Works well as a second-stage correction model, making it well suited to reducing the residual errors of the coarse linear DSM.
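As an illustration of variable importance, the following sketch fits an RF to synthetic residuals that depend on an interaction between the two covariates. It assumes scikit-learn and its impurity-based `feature_importances_`; the data, value ranges, and column names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 300

# Hypothetical covariates with made-up value ranges.
ndwi_p95 = rng.uniform(-0.2, 0.6, n)
transpiration_sum = rng.uniform(100.0, 400.0, n)

# Synthetic residuals: NDWI only matters where transpiration is high (interaction).
residual = 5.0 * ndwi_p95 * (transpiration_sum > 250.0) + rng.normal(0.0, 0.3, n)

X = np.column_stack([ndwi_p95, transpiration_sum])
names = ["ndwi_p95", "transpiration_sum"]

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, residual)

# Impurity-based importances are non-negative and sum to 1.
importances = dict(zip(names, rf.feature_importances_))
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.2f}")
```

Impurity-based importances can be skewed by correlated or high-cardinality predictors, so permutation importance on held-out data may be worth comparing where interpretation matters.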

2. Implementation Steps

Compute residuals (errors) between linear model predictions and ground truth.
Use residuals as the target variable for RF training.
Use as predictors:
- NDWI 95th percentile (p95)
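- Transpiration sum (from MONICA simulations)
Add the RF-predicted residuals to the linear model predictions to obtain the corrected DSM (with residuals defined as observed minus predicted).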
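The steps above can be sketched end to end on synthetic data. This is a minimal illustration under stated assumptions, not the module's actual pipeline: scikit-learn is assumed as the library, `coarse_covariate` is a hypothetical input of the linear model, the clay values are simulated, and residuals are defined as observed minus predicted.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 600

# Hypothetical covariates (synthetic values, not MONICA or satellite output).
ndwi_p95 = rng.uniform(-0.2, 0.6, n)
transpiration_sum = rng.uniform(100.0, 400.0, n)
coarse_covariate = rng.uniform(0.0, 1.0, n)  # assumed input of the linear model

# Simulated "ground truth" clay content (%): a linear part plus a nonlinear
# part that the linear model cannot capture.
clay = (20.0 + 15.0 * coarse_covariate
        + 8.0 * np.sin(6.0 * ndwi_p95) * (transpiration_sum / 400.0)
        + rng.normal(0.0, 1.0, n))

X_lin = coarse_covariate.reshape(-1, 1)
X_rf = np.column_stack([ndwi_p95, transpiration_sum])
idx_train, idx_test = train_test_split(np.arange(n), test_size=0.3, random_state=0)

# Coarse linear model (stands in for the linear DSM of the earlier module).
linear = LinearRegression().fit(X_lin[idx_train], clay[idx_train])
lin_pred_train = linear.predict(X_lin[idx_train])
lin_pred_test = linear.predict(X_lin[idx_test])

# Step 1: residuals = observed minus linear prediction.
residual_train = clay[idx_train] - lin_pred_train

# Steps 2 and 3: train the RF on the residuals with the two predictors.
rf = RandomForestRegressor(n_estimators=300, min_samples_leaf=5, random_state=0)
rf.fit(X_rf[idx_train], residual_train)

# Correction: linear prediction plus predicted residual.
corrected_test = lin_pred_test + rf.predict(X_rf[idx_test])

rmse_linear = np.sqrt(np.mean((clay[idx_test] - lin_pred_test) ** 2))
rmse_corrected = np.sqrt(np.mean((clay[idx_test] - corrected_test) ** 2))
print(f"RMSE linear:    {rmse_linear:.2f}")
print(f"RMSE corrected: {rmse_corrected:.2f}")
```

Evaluating on held-out points, as done here, keeps the comparison honest because the RF never sees the test residuals during training.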

3. Output

Corrected DSMs for sand, clay, and SOC at all depths.
Accuracy metrics (R², RMSE) and residual maps.
Demonstration of improvement relative to coarse linear DSM.
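The accuracy comparison can be sketched with plain Python. The values below are made up, and R² is computed here as 1 minus the ratio of residual to total sum of squares, which is an assumption about the definition the module intends (the squared correlation is another common choice).

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up clay contents (%) at five validation points.
observed = [12.0, 18.0, 25.0, 31.0, 40.0]
linear_pred = [15.0, 20.0, 24.0, 27.0, 35.0]
corrected_pred = [13.0, 18.5, 25.5, 30.0, 38.5]

for label, pred in [("linear", linear_pred), ("corrected", corrected_pred)]:
    print(f"{label}: RMSE={rmse(observed, pred):.2f}, R2={r_squared(observed, pred):.3f}")

# Per-point residuals (observed minus predicted); mapped by location,
# these values would form a residual map.
residuals_corrected = [o - p for o, p in zip(observed, corrected_pred)]
print("residuals after correction:", residuals_corrected)
```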