Inside the Data Pipeline: How We Built a Framework for Identifying Emerging Locations

By Jonas Elzinga


European Gentrification Series II, Part 2

Overcoming the challenge of inconsistent urban data

The main challenge in modelling gentrification lies not in the theory but in the data. Neighbourhood boundaries shift, indicator definitions change, and hyperlocal dynamics remain largely invisible in traditional datasets. Investors who want to compare neighbourhoods over a decade quickly discover that official local statistics were not designed for long-term trend analysis. The core contribution of our research is therefore a unified data pipeline that reconstructs neighbourhood evolution from 2006 onward in a consistent, comparable way.

Harmonising datasets across spatial boundaries

The pipeline starts with regional socio-economic datasets that capture family income, education, property values, household composition, dwelling characteristics, and a range of related variables. These datasets often differ significantly from one year to the next, so direct comparison between years is invalid. To resolve this, we standardised all features, unified column structures, harmonised inconsistent data types, and handled missing values at scale. More importantly, we developed an elaborate mapping system that reconciles boundary changes across years. This step alone required manually reconstructing spatial correspondences so that the neighbourhoods of 2010 could be meaningfully compared with those of 2020.
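
A common way to implement this kind of boundary reconciliation is a crosswalk table that apportions each old neighbourhood to the new units it overlaps. The sketch below is a minimal illustration, not our production code: the crosswalk codes, shares, and function names are hypothetical, and the shares are assumed to come from area or population overlays prepared beforehand. It also shows why counts and averages need different treatment: household counts can be split proportionally, while a mean such as income has to be recombined as a household-weighted average.

```python
from collections import defaultdict

# Hypothetical crosswalk from 2010 to 2020 neighbourhood codes.
# Each row: (old code, new code, share of the old unit falling in the new one).
# Shares could be based on area or, better, on population; they must sum to 1
# for every old unit so that counts are neither lost nor double-counted.
CROSSWALK_2010_TO_2020 = [
    ("N01", "A", 1.0),  # boundary unchanged
    ("N02", "B", 0.6),  # N02 split between B and C
    ("N02", "C", 0.4),
    ("N03", "C", 1.0),  # N03 merged into C
]


def validate_crosswalk(crosswalk, tol=1e-9):
    """Raise if any old unit's shares do not sum to 1."""
    totals = defaultdict(float)
    for old, _new, share in crosswalk:
        totals[old] += share
    bad = {old: t for old, t in totals.items() if abs(t - 1.0) > tol}
    if bad:
        raise ValueError(f"shares do not sum to 1 for: {bad}")


def reallocate_counts(counts_by_old_unit, crosswalk):
    """Re-express additive counts (e.g. households) on the new boundaries."""
    validate_crosswalk(crosswalk)
    result = defaultdict(float)
    for old, new, share in crosswalk:
        if old in counts_by_old_unit:
            result[new] += counts_by_old_unit[old] * share
    return dict(result)


def reallocate_mean(means_by_old_unit, weights_by_old_unit, crosswalk):
    """Re-express a per-household mean (e.g. income) as a weighted average."""
    # Convert means to totals, move the totals, then divide by moved weights.
    totals = reallocate_counts(
        {k: means_by_old_unit[k] * weights_by_old_unit[k] for k in means_by_old_unit},
        crosswalk,
    )
    weights = reallocate_counts(weights_by_old_unit, crosswalk)
    return {new: totals[new] / weights[new] for new in totals if weights.get(new, 0) > 0}
```

For example, `reallocate_counts({"N01": 1200, "N02": 800, "N03": 500}, CROSSWALK_2010_TO_2020)` moves 2010 household counts onto the 2020 units, and `reallocate_mean` with the same household counts as weights does the same for mean income without simply adding averages together.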

Integrating hyperlocal 100-metre grid analysis

On top of this regional structure, we integrated hyperlocal 100-metre grid data. These micro-geographies reveal early-stage transformation patterns that are invisible at the neighbourhood level. Concentrations of amenities such as restaurants, cafés, and cultural venues within a 3-kilometre radius, together with proximity to transit nodes, serve as behavioural sensors: they indicate where residents and businesses are moving before conventional housing indicators respond. We believe this combination of spatial scales creates a multiresolution perspective on urban change that is rarely achieved in gentrification studies.
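
To make the amenity measure concrete, here is a minimal sketch, assuming each 100-metre grid cell is represented by its centroid in latitude/longitude and that amenities and transit stops are available as simple point lists (the function names and example coordinates are hypothetical). It counts amenities by category within 3 km using the haversine great-circle distance and measures the distance from each cell to its nearest transit stop.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def amenity_counts(cell_lat, cell_lon, amenities, radius_m=3_000.0):
    """Count amenities by category within radius_m of a grid-cell centroid.

    amenities: iterable of (lat, lon, category) tuples.
    """
    counts = Counter()
    for lat, lon, category in amenities:
        if haversine_m(cell_lat, cell_lon, lat, lon) <= radius_m:
            counts[category] += 1
    return counts


def nearest_transit_m(cell_lat, cell_lon, stops):
    """Distance in metres to the closest transit stop (inf if there are none)."""
    return min(
        (haversine_m(cell_lat, cell_lon, lat, lon) for lat, lon in stops),
        default=math.inf,
    )
```

This brute-force loop compares every cell with every amenity, which is fine for illustration; at city scale, a spatial index such as a k-d tree or ball tree, or a projected coordinate system with a grid lookup, is the usual way to keep the computation fast.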

Engineering features for trend detection

Once all data sources were aligned, we constructed a suite of features capturing both the level and the trajectory of neighbourhood indicators. Trends matter: three-year slopes for income, education, property values, and amenity density reveal whether a neighbourhood is accelerating, stagnating, or declining. All features were normalised and winsorised to limit the influence of outliers, and we included structural flags so that affluent areas would not be mislabelled as gentrifying simply because of their high baseline values.
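
The feature steps described above can be sketched in plain Python as follows. This is an illustration under assumptions, not the study's implementation: the trend is taken as an ordinary least-squares slope over three annual observations, and the 5th/95th-percentile winsorisation bounds, the 80th-percentile affluence cutoff, and all function names are hypothetical choices.

```python
import statistics


def ols_slope(years, values):
    """Least-squares trend of values against years, in units per year."""
    mean_x = statistics.fmean(years)
    mean_y = statistics.fmean(values)
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return sxy / sxx


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def winsorise(values, lower=0.05, upper=0.95):
    """Clip values to their lower/upper quantiles to limit the pull of outliers."""
    s = sorted(values)
    lo, hi = quantile(s, lower), quantile(s, upper)
    return [min(max(v, lo), hi) for v in values]


def zscore(values):
    """Standardise to mean 0 and (population) standard deviation 1."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def affluent_flags(baseline_values, q=0.8):
    """Flag units whose baseline level is already in the top (1 - q) share."""
    cutoff = quantile(sorted(baseline_values), q)
    return [v >= cutoff for v in baseline_values]
```

Applied per neighbourhood, `ols_slope([2016, 2017, 2018], incomes)` gives an income trend; the slopes across all neighbourhoods would then be winsorised and standardised before modelling, while `affluent_flags` stays a separate input so that a high level is not read as a rising trajectory.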

Data cleaning pipeline

Jonas Elzinga, KR&A, Barcelona (2025)

Validating trajectories through machine learning

The result is a model that can assign each neighbourhood a continuous gentrification score at any point in time. This score is then thresholded into three classes (stable, moderate, and high-certainty gentrification), each representing a distinct developmental trajectory. Finally, we built forecasting models, comparing random forest, logistic regression, and XGBoost. XGBoost performed best, particularly for predicting moderate and high-certainty gentrification three years ahead.
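
The thresholding step and the three-year-ahead forecasting target can be sketched as follows. This is an illustration under stated assumptions: the score is assumed to lie in [0, 1], the cut-offs of 0.4 and 0.7 are hypothetical rather than the values used in our study, and `forecast_pairs` simply pairs the features observed in year t with the class observed in year t + 3 for a single neighbourhood. Pairs like these could then be passed to any classifier, such as the random forest, logistic regression, and XGBoost models compared here; the model fitting itself is not shown.

```python
from bisect import bisect_right

# Hypothetical cut-offs on a score assumed to lie in [0, 1].
THRESHOLDS = (0.4, 0.7)
LABELS = ("stable", "moderate", "high-certainty")


def classify(score, thresholds=THRESHOLDS):
    """Map a continuous gentrification score to one of three ordered classes.

    Scores below the first cut-off are "stable", scores from the first up to
    (but not including) the second are "moderate", the rest "high-certainty".
    """
    return LABELS[bisect_right(thresholds, score)]


def forecast_pairs(scores_by_year, features_by_year, horizon=3):
    """Pair features observed in year t with the class observed in year t + horizon."""
    pairs = []
    for year, features in sorted(features_by_year.items()):
        target_year = year + horizon
        if target_year in scores_by_year:
            pairs.append((year, features, classify(scores_by_year[target_year])))
    return pairs
```

When evaluating such a model, splitting training and test data by year rather than at random avoids leaking future information into the training set.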

Forecasting result table

Jonas Elzinga, KR&A, Barcelona (2025)

Creating a platform for the early detection of emerging locations

This pipeline solves a problem that has long plagued investors: the inability to analyse cities at a consistent temporal resolution. By anchoring neighbourhood histories to consistent boundaries over time and linking them to their hyperlocal dynamics, we create a platform for detecting emerging locations well before they reach mainstream visibility. In the third post of this series, we extend this framework to tourism-driven gentrification, a fast-emerging driver of urban change across Europe.
