Section 1

Overview of project setup and data sourcing

Estimated Time: ~20 minutes

This practical component provides hands-on experience in developing and validating a method for large-scale, long-term prediction of groundwater levels, with a particular focus on heterogeneous terrain. To this end, we will enhance decision-tree-based machine-learning models, such as Random Forests (RF), with information on surrounding conditions (e.g., spatial context, and the observations at and distances to nearby monitoring points for each unmonitored point). The approach uses groundwater time series from the monitoring network, MODIS vegetation and land-surface products, and meteorological data from the German Weather Service (DWD).
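
To make "surrounding conditions" concrete, here is a minimal base-R sketch of how neighbour information could be turned into covariates. It is an illustration only, not the reference implementation of the method: the helper add_neighbor_features(), its arguments and the toy data are hypothetical, and it assumes projected coordinates in metres with plain Euclidean distance.

```r
# Hypothetical helper: for each target location, return the observed values at
# and the distances to its n nearest monitoring points.
add_neighbor_features <- function(obs_xy, obs_value, target_xy, n_neighbors = 3,
                                  exclude_self = FALSE) {
  obs_xy <- as.matrix(obs_xy)
  target_xy <- as.matrix(target_xy)
  stopifnot(nrow(obs_xy) == length(obs_value),
            n_neighbors < nrow(obs_xy))

  features <- matrix(NA_real_, nrow = nrow(target_xy), ncol = 2 * n_neighbors)
  colnames(features) <- c(paste0("obs", seq_len(n_neighbors)),
                          paste0("dist", seq_len(n_neighbors)))

  for (i in seq_len(nrow(target_xy))) {
    # Euclidean distance from target i to every monitoring point
    d <- sqrt((obs_xy[, 1] - target_xy[i, 1])^2 +
              (obs_xy[, 2] - target_xy[i, 2])^2)
    ord <- order(d)
    # When the targets are the monitoring points themselves (training data),
    # drop zero-distance matches so a well never predicts from its own value.
    if (exclude_self) ord <- ord[d[ord] > 0]
    nearest <- ord[seq_len(n_neighbors)]
    features[i, ] <- c(obs_value[nearest], d[nearest])
  }
  as.data.frame(features)
}

# Toy data (assumed): four wells on a projected grid and one unmonitored point.
wells <- data.frame(x = c(0, 1000, 0, 1000), y = c(0, 0, 1000, 1000),
                    gwl = c(30.0, 31.5, 29.0, 32.0))
target <- data.frame(x = 100, y = 0)

nb <- add_neighbor_features(wells[, c("x", "y")], wells$gwl, target,
                            n_neighbors = 2)
print(nb)
```

The resulting columns (obs1, dist1, ...) can be bound to the other covariates before fitting a forest, for example with ranger.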

This section walks through the required R packages and the input data sets.

Software and computational setup
  • R-4.2.3
  • Data handling and spatial: sf, terra, stars, data.table.
  • Modeling: ranger (RF), e1071 (SVM), caret or tidymodels for tuning.
  • Geostatistics: gstat, automap.
  • Spatial CV: blockCV, sperrorest, CAST.
  • Spatially explicit RF: CAST::rfDist or spatialRF; the RFSI pattern is implemented by adding neighbor values and distances as covariates (the meteo package, listed among the libraries below, provides an RFSI implementation).
  • Trend and time series: trend, Kendall, zoo, tsibble.
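
The spatial cross-validation packages listed above all build on the same idea: nearby points are kept in the same fold, so that validation points are not direct neighbours of training points. The following base-R sketch illustrates that idea with square blocks. The helper make_block_folds(), the block size and the random toy coordinates are assumptions for illustration; blockCV and CAST offer fuller implementations.

```r
# Hypothetical helper: assign points to k folds by square spatial blocks.
# Note: set.seed() inside the function resets the global RNG state.
make_block_folds <- function(x, y, block_size, k = 5, seed = 1) {
  # Block label from the integer grid cell each point falls into
  block_id <- paste(floor(x / block_size), floor(y / block_size), sep = "_")
  blocks <- unique(block_id)
  stopifnot(k <= length(blocks))
  set.seed(seed)
  # Shuffle the fold labels and deal them out to blocks as evenly as possible
  fold_of_block <- setNames(sample(rep_len(seq_len(k), length(blocks))), blocks)
  unname(fold_of_block[block_id])
}

# Toy data (assumed): 200 random points in a 50 km x 50 km area, 10 km blocks.
set.seed(42)
pts <- data.frame(x = runif(200, 0, 50000), y = runif(200, 0, 50000))
pts$fold <- make_block_folds(pts$x, pts$y, block_size = 10000, k = 5)
print(table(pts$fold))
```

The block size is a design choice: it should be at least as large as the range over which groundwater levels are spatially correlated, which can be estimated from a variogram (gstat, automap).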

Input data

Groundwater:
  • Piezometer groundwater level (GWL) observations (https://lfu.brandenburg.de/). Weekly series; the metadata include location name, ID number, time of export, period start, period end, measuring point type, height system, terrain height, lowest/mean/highest water level (cm below ground), lowest/mean/highest water level (m above sea level), and the reference period of the main values.
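
Because the water levels are reported both in cm below ground and in m above sea level, and the covariates are monthly, two preparation steps are likely: converting depth to head and aggregating the weekly series to months. The sketch below shows both in base R. The column names, the well ID and all values are invented for illustration, and the conversion assumes that depth is measured from the terrain height given in the metadata (check the height system and measuring point type before relying on it).

```r
# Toy weekly record for one piezometer; names and values are hypothetical.
weekly <- data.frame(
  id = "W001",
  date = as.Date("2020-01-06") + 7 * 0:7,
  depth_cm = c(250, 260, 255, 245, 240, 238, 242, 250)  # cm below ground
)
terrain_height_m <- 45.20  # assumed terrain height, m above sea level

# Head (m above sea level) = terrain height minus depth below ground (cm -> m)
weekly$head_m <- terrain_height_m - weekly$depth_cm / 100

# Monthly mean head, to match the monthly covariates
weekly$month <- format(weekly$date, "%Y-%m")
monthly <- aggregate(head_m ~ id + month, data = weekly, FUN = mean)
print(monthly)
```
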
Covariates (1 km):
  • Topography: SLOPE, ASP, DEM, LNF, TPI, TWI, and TRI
  • Crop-related parameters: NDVI, ET, and EVI
  • Climate and remote sensing: monthly precipitation, temperature, GRACE TWS anomalies (~ 1 km)

Table 1: Overview of the covariates used to predict groundwater head

| Group | Covariate | Spatial/temporal resolution | Source |
|---|---|---|---|
| Topographical attributes | Slope | 1 km × 1 km | Bundesamt für Kartographie und Geodäsie |
| | Aspect | 1 km × 1 km | |
| | Digital elevation model | 1 km × 1 km | |
| | Landform classification | 1 km × 1 km | |
| | Topographic position index | 1 km × 1 km | |
| | Topographic wetness index | 1 km × 1 km | |
| | Terrain ruggedness index | 1 km × 1 km | |
| Coordinates | Coordinates (x) | NA | |
| | Coordinates (y) | NA | |
| Hydrological parameter | Evapotranspiration | 1 km × 1 km / monthly | MOD16A2GF.061 (Running et al., 2021) |
| Meteorological parameter | Temperature | 1 km × 1 km / monthly | Deutscher Wetterdienst (the mean values for all usable stations are projected onto a 1 km × 1 km grid) |
| | Precipitation | 1 km × 1 km / monthly | |
| Vegetation parameters | Normalised difference vegetation index | 1 km × 1 km / monthly | MOD13A3.061 |
| | Enhanced vegetation index | 1 km × 1 km / monthly | |

Before running the project, make sure the following R packages are installed in your R/RStudio environment:


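### Install required packages ###
# "grid" ships with base R and cannot be installed from CRAN, so it is omitted.
# "meteo" is added because it appears in the list of libraries to load.
# Note: rgdal and ggsn may no longer be available from the current CRAN
# repository; if install.packages() cannot find them, install them from the
# CRAN archive for your R version.
install.packages(c(
  "raster", "rgdal", "sp", "sf", "dplyr", "tidyr", "stringr",
  "ggplot2", "nabor", "ranger", "doParallel", "gstat", "plyr",
  "caret", "CAST", "ggpubr", "automap", "hexbin",
  "ggsn", "mlr", "tuneRanger", "meteo"
))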

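### Load required libraries ###
# plyr is listed before dplyr so that plyr does not mask dplyr's verbs.
libraries <- c(
  "raster", "rgdal", "sp", "sf", "plyr", "dplyr", "tidyr", "stringr",
  "ggplot2", "nabor", "ranger", "doParallel", "gstat",
  "caret", "CAST", "ggpubr", "automap", "grid", "hexbin",
  "ggsn", "mlr", "tuneRanger", "meteo"
)
# Attach each package; this stops with an error if one is not installed.
invisible(lapply(libraries, library, character.only = TRUE))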
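
Re-running install.packages() on the full list reinstalls packages that are already present. As an optional alternative, the sketch below checks which packages are missing and installs only those. The helper missing_packages() is hypothetical, and the short pkgs vector is a stand-in used so that the example runs anywhere; in practice it would be replaced by the libraries vector.

```r
# Hypothetical helper: return the entries of `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

pkgs <- c("stats", "utils")  # stand-in; replace with the `libraries` vector

# Install only what is missing
to_install <- missing_packages(pkgs)
if (length(to_install) > 0) install.packages(to_install)
```

Note that requireNamespace() loads a package's namespace without attaching it, so the packages still need to be attached with library() afterwards.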