Predicting asset flooding using machine learning and random forest classification

By Ian Ronk

Climate change is making fluvial (river) flooding more frequent, with destructive consequences for both human life and assets. Some estimates put annual flood damages at 50 billion dollars worldwide by 2050. Given these consequences, it is essential to understand the risk this hazard poses to assets. A project conducted by Ian Ronk, supervised by Sander Van Splunter (UvA) and KR&A, aims to understand the local characteristics of assets that could flood within a 20-year return period. It achieves an accuracy of 97% on the public assets included in KR&A’s European Data Portal and 92% on a more general dataset.

Previous research has assessed fluvial flood risk by using hydrological simulations to generate flood maps. These maps, however, cannot adapt to changes in climate or landscape, such as the construction of new dikes. The underlying models are also not explainable: they do not reveal the intrinsic factors that cause a given area to flood. For investors, explainability is one of the most important qualities data can offer, both to gain better insight into their investments and to substantiate future decisions.

The project was conducted in three steps: literature review, data gathering and model training, and evaluation. These steps are explained below.

Step 1. Literature review

To predict fluvial flooding, it is important to understand which local features contribute to flooding and which are relevant for predicting flood risk. The literature review identified the factors associated with flooding, which were then translated into model features. These features fall into three categories according to the size of the area they describe: micro, meso and macro. The artificial imperviousness of a location is an example of a micro feature, the distance to the nearest river is a meso feature, and the GDP of a region is a macro feature. Each feature is substantiated by prior research or by logical reasoning.
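As a minimal sketch of how such a feature catalogue could be organised, the snippet below groups the three example features by scale. The class names, field names and units are assumptions made for illustration; the project's actual feature list and data model are not described in this post.

```python
from dataclasses import dataclass
from enum import Enum


class Scale(Enum):
    """Area size a feature describes (the three categories named in the text)."""
    MICRO = "micro"
    MESO = "meso"
    MACRO = "macro"


@dataclass(frozen=True)
class Feature:
    name: str
    scale: Scale
    unit: str  # units are illustrative assumptions, not taken from the project


# Only the three examples mentioned in the text; the full list is not given.
FEATURES = [
    Feature("artificial_imperviousness", Scale.MICRO, "percent"),
    Feature("distance_to_nearest_river", Scale.MESO, "metres"),
    Feature("regional_gdp", Scale.MACRO, "EUR"),
]


def by_scale(features):
    """Group feature names by the scale they belong to."""
    grouped = {scale: [] for scale in Scale}
    for feature in features:
        grouped[feature.scale].append(feature.name)
    return grouped


grouped = by_scale(FEATURES)
for scale, names in grouped.items():
    print(scale.value, names)
```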

Step 2. Data gathering and model training

All data comes from open-source datasets and consists mostly of geographic data. The data is then extracted for every location in the dataset, which comprises approximately 16,000 carefully selected points for a binary model and 32,000 points for a quaternary model. The binary model classifies whether a location will flood within a period of 20 years. The quaternary model indicates whether a location will not flood, or will flood within a return period of 10, 50 or 200 years. A flood simulation served as the ground truth for training. Although these flood return period maps are not 100% accurate, they are a good benchmark because they overlap significantly with historical flooding. Three machine learning algorithms were trained and compared: a logistic classifier, a random forest classifier and a neural network.
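The comparison of the three model families can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the project's code: the data is synthetic (generated with `make_classification`), and the hyperparameters, the train/test split and the feature scaling are placeholder choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the location features and binary flood labels.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The three model families named in the text; all settings are assumptions.
models = {
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "neural_network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0),
    ),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

The same loop would work for a four-class target, since all three estimators support multiclass labels. The scores printed here describe only the synthetic data and say nothing about the project's results.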

Ranking high flood risk of investment portfolios

This is not just a theoretical exercise: with portfolio aggregation methods, we can now pinpoint risk within portfolios. The visualisation above shows flooding risk concentrations in the sub-portfolios (by sector and country) of Commerz Hausinvest and Cofinimmo. Portfolios are ranked by their percentage of high-flood-risk assets, with all assets weighted equally.
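The equal-weighted ranking can be sketched in a few lines of Python. The sub-portfolio names and risk flags below are hypothetical placeholders, not data from the funds mentioned, and the helper names `high_risk_share` and `rank_portfolios` are made up for this illustration.

```python
from collections import defaultdict


def high_risk_share(assets):
    """Return {sub_portfolio: fraction of assets flagged high risk}.

    Every asset counts once, i.e. assets are equal-weighted.
    """
    totals = defaultdict(int)
    high = defaultdict(int)
    for sub_portfolio, is_high_risk in assets:
        totals[sub_portfolio] += 1
        high[sub_portfolio] += int(is_high_risk)
    return {p: high[p] / totals[p] for p in totals}


def rank_portfolios(assets):
    """Sort sub-portfolios from highest to lowest high-risk share."""
    shares = high_risk_share(assets)
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical (sub_portfolio, is_high_risk) pairs, one per asset.
assets = [
    ("Fund A / Offices / DE", True),
    ("Fund A / Offices / DE", False),
    ("Fund A / Retail / FR", False),
    ("Fund A / Retail / FR", False),
    ("Fund B / Healthcare / BE", True),
    ("Fund B / Healthcare / BE", True),
    ("Fund B / Healthcare / BE", False),
]

ranking = rank_portfolios(assets)
for sub_portfolio, share in ranking:
    print(f"{sub_portfolio}: {share:.0%} high flood risk")
```

Weighting by asset value instead would only require replacing the counts with sums of values. Equal weighting is used here because that is what the ranking in the text describes.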

Step 3. Evaluation

After training, the models were compared. The random forest model proved the most accurate, with an accuracy on the general test set of 97.51% for the binary model and 73.81% for the quaternary model. The contribution of each feature to the model’s predictive power was then investigated, which showed artificial imperviousness to be the biggest contributor.
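One common way to inspect per-feature contributions in a random forest is scikit-learn's impurity-based `feature_importances_` attribute, sketched below. The post does not say which importance measure the project used, so this choice is an assumption. The data is synthetic, and the labels are deliberately constructed to depend mainly on the first column, so the resulting ranking is built in by design and is not evidence for the project's finding.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_samples = 2000

# Placeholder feature names borrowed from the examples in the text.
feature_names = ["artificial_imperviousness", "distance_to_river", "regional_gdp"]
X = rng.random((n_samples, 3))

# Synthetic labels: driven mostly by column 0, weakly by column 1, plus noise.
noise = 0.05 * rng.standard_normal(n_samples)
y = (X[:, 0] - 0.2 * X[:, 1] + noise > 0.4).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

# Impurity-based importances; they are normalised to sum to 1.
importances = dict(zip(feature_names, model.feature_importances_))
ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)

for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances can be biased towards features with many distinct values. Permutation importance on a held-out set (`sklearn.inspection.permutation_importance`) is a frequently used cross-check.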

Although current prediction models in real estate are mostly linear, the project shows that non-linear models can also be used to make predictions. As ESG-driven decisions grow in significance, our project highlights the data that can be used to identify vulnerable assets. This vulnerability will only increase as the climate changes, and adaptable data systems therefore need to be created.

Next week, we will look at the five biggest factors that indicate whether an asset is prone to flooding. Flooding itself is hard to predict, but how prone a given area is to flooding can largely be explained by these five factors. Do not miss next week’s blog to see how you can determine the flood risk of your assets.
