Date: 04/11/2024 to 07/11/2024
Location: Florianópolis-SC
More information: https://www.abrhidro.org.br/iebhe
Application of Random Forest Technique for Simulating Daily Runoff in the Santa Lucia Watershed, Uruguay
Code
I-EBHE0086
Authors
Claudio Federico Vilaseca Martínez, Alberto Castro, Christian Chreties, Angela Gorgoglione
Topic
WG 1.12: Development & application of river basin simulators
Abstract
The Santa Lucia watershed in Uruguay is a strategic region for the country, since it provides raw water for purification and supply to more than half of the population. This watershed faces challenges related to water availability and water quality. This study explores the application of the random forest technique to simulate daily runoff in the Santa Lucia watershed. The watershed was divided into seven subwatersheds, taking the streamflow monitoring stations as outlets, and an independent model was fitted to each of them. Two datasets were generated for modeling: Dataset A, which included both original variables (streamflow (Q), average air temperature) and artificial variables (mean areal precipitation (MAP), accumulated precipitation over the previous 7 days (MAPaccum), and flow series lagged by one day (Qt-1) and two days (Qt-2)), and Dataset B, which included all variables except Qt-1 and Qt-2. In total, 14 models were implemented, one for each dataset at each subwatershed. The models were trained and validated on historical runoff data using k-fold cross-validation to ensure their robustness. Hyperparameters were tuned with Optuna, using the Nash-Sutcliffe Efficiency (NSE), calculated through cross-validation on the training dataset, as the objective function. Additionally, a feature importance analysis was carried out using SHAP (SHapley Additive exPlanations). The random forest models performed satisfactorily, capturing the daily runoff dynamics with varying accuracy across subwatersheds. Specifically, NSE values ranged from 0.51 to 0.89 with Dataset A and from 0.3 to 0.62 with Dataset B. The feature importance analysis revealed that, for models trained with Dataset A, the most important variable was Qt-1, which showed a direct correlation with the output variable (Q). The second most important variable was either MAPaccum or Qt-2, depending on the subwatershed.
In models trained with Dataset B, the most important variable was MAPaccum, followed by average air temperature (T) or MAP, depending on the case. These results suggest that random forest, combined with k-fold cross-validation and automated hyperparameter optimization, is a viable tool for hydrological modeling in the Santa Lucia watershed, providing valuable insights for water resource management and planning in the region.
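As an illustration of the inputs and the skill score described in the abstract, the following minimal sketch builds the Dataset A and Dataset B feature matrices from daily series and computes the NSE. It is not the authors' code: the function names build_datasets and nse, the column order, and the choice to accumulate precipitation over days t-7 to t-1 (excluding day t) are assumptions made here for illustration only.

```python
import numpy as np


def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    residual = np.sum((observed - simulated) ** 2)
    variance = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / variance


def build_datasets(q, map_, temp, window=7):
    """Return (features_a, features_b, target) for one subwatershed.

    Columns of features_b: MAP, MAPaccum, T.
    Columns of features_a: MAP, MAPaccum, T, Q(t-1), Q(t-2).
    Assumption: MAPaccum on day t sums MAP over days t-window .. t-1.
    """
    q = np.asarray(q, dtype=float)
    map_ = np.asarray(map_, dtype=float)
    temp = np.asarray(temp, dtype=float)

    # Prefix sums so that csum[k] is the total of map_[0..k-1].
    csum = np.concatenate(([0.0], np.cumsum(map_)))

    # Keep only days that have a full accumulation window and both flow lags.
    start = max(window, 2)
    t = np.arange(start, len(q))

    map_accum = csum[t] - csum[t - window]
    features_b = np.column_stack([map_[t], map_accum, temp[t]])
    features_a = np.column_stack([features_b, q[t - 1], q[t - 2]])
    target = q[t]
    return features_a, features_b, target


if __name__ == "__main__":
    # Synthetic ten-day series, used only to show the shapes involved.
    flow = np.arange(1.0, 11.0)
    rain = np.ones(10)
    air_temp = np.full(10, 15.0)
    xa, xb, y = build_datasets(flow, rain, air_temp)
    print(xa.shape, xb.shape, y.shape)
```

The resulting arrays could then be passed to any random forest regressor (for example scikit-learn's RandomForestRegressor, although the abstract does not name the library used), with nse serving as the score inside a k-fold cross-validation loop.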