ABRHidro - Proceedings

Machine Learning and Groundwater forecast: an analysis on how models deal with anisotropy and how to explain a model prediction.

Code

I-EBHE0117

Authors

Gabriel Pelizari

Topic

WG 2.3: Near-term (annual to decadal) forecasts of water availability

Abstract

In recent years, machine learning (ML) algorithms have become increasingly prevalent in the study of hydrological and hydrogeological phenomena, owing to the ability of many ML algorithms to satisfactorily capture the complexity of nonlinear phenomena and to generate highly accurate predictions. However, several issues still need to be explored, such as how ML models handle heterogeneity (anisotropy) in an aquifer and how the results produced by the models can be better explained. This study evaluated an artificial neural network (ANN) for predicting the piezometric level under three dataset arrangements: I) data kept completely separate for each monitoring station, II) spring flows excluded, and III) data of the same nature aggregated by their total sum. The features are historical time series of precipitation (one monitoring station), well pumping rates (three wells), and discharge at six nearby springs; the target is the piezometric level at one well. All series come from the Cauê/Gandarela aquifer systems near Itabirito (MG), Brazil, and cover the period from September 2015 to February 2024. To homogenize the series, missing values at the beginning and end of each series were filled by backward and forward propagation, and gaps in the middle of the series were filled using an ARMA model. The model was an NBeats network with six blocks of four layers of 128 neurons each, a theta size and batch size of 16, ReLU activation, and the Adam optimizer with early stopping. Model I achieved an RMSE of 1.14 m, an MAE of 0.79 m, and an R² of 0.867; Model II achieved an RMSE of 2.17 m, an MAE of 1.45 m, and an R² of 0.520; and Model III achieved an RMSE of 2.89 m, an MAE of 1.86 m, and an R² of 0.146. To better understand how Model I captures the heterogeneity in the data, feature permutation importance analysis (FPIA) and SHapley Additive exPlanations (SHAP) value analysis were conducted.
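The preprocessing, dataset arrangements, and error metrics described in the preceding paragraph can be sketched in plain Python as follows. This is a minimal, hypothetical illustration rather than the study's code: series are lists with `None` marking missing values, only leading and trailing gaps are filled (the interior ARMA gap filling and the NBeats network itself are not shown), and the names `P01`, `P02`, and `rain` are placeholders for stations the abstract does not name.

```python
import math

# Hypothetical station names. The spring codes are the six cited in the SHAP
# results and P00 is the well cited by FPIA; "P01", "P02" and "rain" are
# placeholders for stations the abstract does not name.
RAIN = "rain"
WELLS = ["P00", "P01", "P02"]
SPRINGS = ["N2A", "N2B", "N5", "N6A", "N6B", "N6C"]


def fill_edges(values):
    """Backward-fill leading gaps and forward-fill trailing gaps.

    Missing values are None. Interior gaps are left as None because the
    study fills them with an ARMA model, which is not sketched here.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    out = list(values)
    if not known:
        return out
    first, last = known[0], known[-1]
    for i in range(first):
        out[i] = values[first]
    for i in range(last + 1, len(values)):
        out[i] = values[last]
    return out


def arrangement(series, kind):
    """Build the feature columns of dataset arrangement "I", "II" or "III".

    series maps each station name to a gap-free list of equal length.
    """
    if kind == "I":
        # Every station is kept as its own feature column.
        names = [RAIN] + WELLS + SPRINGS
    elif kind == "II":
        # Spring discharges are left out.
        names = [RAIN] + WELLS
    elif kind == "III":
        # Data of the same nature are aggregated by their total sum.
        return {
            RAIN: list(series[RAIN]),
            "pumping_total": [sum(v) for v in zip(*(series[w] for w in WELLS))],
            "spring_total": [sum(v) for v in zip(*(series[s] for s in SPRINGS))],
        }
    else:
        raise ValueError(f"unknown arrangement: {kind!r}")
    return {name: list(series[name]) for name in names}


def scores(observed, predicted):
    """Return (RMSE, MAE, R²); RMSE and MAE share the target's units (m)."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    return rmse, mae, 1.0 - ss_res / ss_tot


# Usage with toy values (not the study's data).
toy = {name: [1.0, 2.0] for name in [RAIN] + WELLS + SPRINGS}
print(arrangement(toy, "III")["spring_total"])      # [6.0, 12.0]
print(fill_edges([None, 1.0, None, 2.0, None]))     # [1.0, 1.0, None, 2.0, 2.0]
print(scores([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])[2])  # 0.5
```

Keeping each station as its own column (arrangement I) lets the network weigh wells and springs individually, whereas the sums in arrangement III discard where each contribution comes from.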
FPIA identified the pumping data at well P00, precipitation, and discharge at spring N2A as the most influential factors in the prediction. SHAP value analysis, in turn, revealed a strong correlation between the observed piezometric levels and the discharges at springs N2A, N2B, N5, N6A, N6B, and N6C. These analyses demonstrated that using a more complete database, containing individual flow data for each well and spring, can considerably improve the forecast. This can be explained by the NBeats model's ability to adequately capture the aquifer anisotropy and the nonlinear relationships among the monitored data. Moreover, the type of analysis performed in this research not only generates precise forecasts but also helps in understanding the dynamics that control an observed phenomenon and in exploring correlations and dependencies within a hydrogeological system. It is worth noting that using a machine learning model does not exempt the modeler from rigorous conceptual analysis, and that models incorporating positional information can be highly valuable for better understanding local aspects of the observed phenomena.
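The two explanation techniques can be sketched for any fitted model exposed as a `predict` function over rows of feature values. This is an illustrative assumption, not the study's implementation: the hypothetical `toy_model` stands in for the trained NBeats network, rows are treated as flat feature vectors rather than lagged time-series windows, permutation importance is measured as the mean increase in RMSE, and the Shapley values are computed exactly by enumerating feature coalitions with absent features set to a single baseline, rather than with the estimators of the SHAP library.

```python
import itertools
import math
import random


def rmse(observed, predicted):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in RMSE when each feature column is shuffled.

    predict maps a list of rows (lists of floats) to a list of predictions.
    A larger increase means the model relies more on that feature.
    """
    rng = random.Random(seed)
    base = rmse(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # breaks the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, predict(X_perm)) - base)
        importances.append(sum(increases) / n_repeats)
    return importances


def shapley_values(predict, x, baseline):
    """Exact Shapley values of one prediction.

    Features outside a coalition take their baseline value. All coalitions
    are enumerated, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        row = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict([row])[0]

    phi = [0.0] * n
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(rows):
    # Hypothetical stand-in for the trained network: y = 2*x0 - x1 (x2 unused).
    return [2.0 * r[0] - r[1] for r in rows]


X = [[float(i), float(i % 3), float(i % 5)] for i in range(20)]
y = toy_model(X)
print(permutation_importance(toy_model, X, y))  # last entry is 0.0: x2 is unused
print(shapley_values(toy_model, [3.0, 1.0, 4.0], [1.0, 1.0, 1.0]))  # ≈ [4.0, 0.0, 0.0]
```

Exact enumeration needs 2^(n-1) coalitions per feature, so it only suits a handful of inputs; SHAP estimators approximate these values for larger inputs such as lagged windows.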

9th International Symposium on Integrated Water Resources Management (IWRM) | 14th International Workshop on Statistical Hydrology (STAHY) | I EBHE - Encontro Brasileiro de Hidrologia Estatística