In this section, we calculate the difference between the predicted and observed data and use a random forest (RF) model to improve the prediction.
Although this step is shorter than the others, its script is the most complex. The first part of the script, 4.1_fit_error_field.py, calculates the error between the true observed soil data and the predicted soil values, ε = x - x̂, where x is the observed value and x̂ is the predicted value.
The MONICA output, the transpiration, is accumulated (cumulative sum), and the top 95% of the NDWI values are averaged. Both quantities act as independent variables, and the error becomes the response variable. Since the errors for sand and clay are small, with root mean squared errors (RMSE) of 6.8 and 2.4, respectively, we focus on the SOC.
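The two independent variables described above could be built along the following lines. This is only a minimal sketch, not the code of 4.1_fit_error_field.py: the function names, the array layout (time along the first axis, locations along the second), and the reading of "top 95%" as "values at or above the 95th percentile" are all assumptions made for illustration.

```python
import numpy as np


def cumulative_transpiration(daily_transpiration):
    """Sum the simulated transpiration over time for each location.

    daily_transpiration: array of shape (n_days, n_locations).
    Hypothetical layout; NaN entries are ignored in the sum.
    """
    values = np.asarray(daily_transpiration, dtype=float)
    return np.nansum(values, axis=0)


def mean_top_ndwi(ndwi_series, percentile=95.0):
    """Average the highest NDWI values of each location.

    ndwi_series: array of shape (n_dates, n_locations).
    Assumption: "top" means values at or above the given percentile.
    """
    ndwi = np.asarray(ndwi_series, dtype=float)
    threshold = np.nanpercentile(ndwi, percentile, axis=0)
    # Keep only the values that reach the per-location threshold.
    top_values = np.where(ndwi >= threshold, ndwi, np.nan)
    return np.nanmean(top_values, axis=0)


def build_features(daily_transpiration, ndwi_series):
    """Stack both predictors into an (n_locations, 2) feature matrix."""
    return np.column_stack(
        [cumulative_transpiration(daily_transpiration), mean_top_ndwi(ndwi_series)]
    )
```

The resulting matrix has one row per location and one column per predictor, which is the shape most regression libraries expect.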
After the RF is fitted, the model is applied to the whole field to estimate the error, so that the spatial distribution of the residual errors is modelled explicitly. The error predicted from the two independent variables is then added to the coarse prediction, resulting in a finer prediction. The more locations selected initially, the higher the expected accuracy of this refinement step. When only a few data points are available (circa 40 locations or fewer), this step can be skipped.
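The refinement step can be sketched as follows, assuming scikit-learn's RandomForestRegressor as the RF implementation. The function name, argument names, and hyperparameters are hypothetical and chosen for illustration; the actual script may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def refine_with_rf(features_obs, observed, coarse_obs,
                   features_field, coarse_field,
                   n_estimators=200, random_state=0):
    """Fit an RF on the residuals and correct the coarse prediction.

    features_obs:   (n_obs, n_features) predictors at the sampled locations
    observed:       (n_obs,) observed soil values
    coarse_obs:     (n_obs,) coarse prediction at the sampled locations
    features_field: (n_cells, n_features) predictors for every field cell
    coarse_field:   (n_cells,) coarse prediction for every field cell
    """
    # Residual error at the sampled locations: observed minus predicted.
    residual = np.asarray(observed, dtype=float) - np.asarray(coarse_obs, dtype=float)

    # Assumed hyperparameters; tune them for real data.
    model = RandomForestRegressor(n_estimators=n_estimators,
                                  random_state=random_state)
    model.fit(features_obs, residual)

    # Estimate the error over the whole field and add it to the coarse map.
    predicted_error = model.predict(features_field)
    fine_field = np.asarray(coarse_field, dtype=float) + predicted_error
    return fine_field, predicted_error
```

With few sampled locations the forest has little to learn from, which is consistent with skipping this step for small samples. Accuracy is best judged on locations held out from the fit, since an RF reproduces its own training data closely.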
In the second script, 4.2_fit_error_field_Layer_1.py, the correction is applied to the other layers.
An analysis of the script output shows an RMSE of 0.24 for the coarse (linear) prediction and 0.19 for the fine SOC prediction at the surface. The additional data and the use of machine learning helped, but only a small increase in accuracy was observed.
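For reference, an RMSE comparison of this kind can be computed with a small helper such as the one below. It is a sketch only: the function name is hypothetical, and the arrays in the usage example are made-up illustrative numbers, not data from this study.

```python
import numpy as np


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


# Illustrative values only (not from the study).
observed_soc = np.array([1.2, 1.5, 0.9, 1.1])
coarse_soc = np.array([1.0, 1.2, 1.1, 1.3])
fine_soc = np.array([1.1, 1.4, 1.0, 1.2])

print("coarse RMSE:", rmse(observed_soc, coarse_soc))
print("fine RMSE:", rmse(observed_soc, fine_soc))
```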
This final step concludes the DSM workflow, producing high-resolution soil property maps that are both interpretable and scalable for precision agriculture applications.