Data Quality

Jonas Elzinga and Arjan Knibbe


In the rapidly changing world of real estate, reliable data is crucial. As the industry evolves, accurate, high-quality data has become more essential than ever. In a digital age where information is abundant but not always reliable, clean data is vital for success. And as AI takes over a number of human tasks and the usual manual checks gradually disappear, data quality has become an even more important theme. In this blog we discuss why data quality is important, which features of data quality are the most interesting, and the different ways to assess the quality of data. At the end we also elaborate on our experience with our proprietary data, and show part of a matrix we made to see how our data scores on the different aspects.

WHY IS DATA QUALITY IMPORTANT

Data Features

The first thing that matters for data quality is the set of aspects we can look at when assessing it. These are the most important and interesting features for the real estate sector, each with a short explanation:

  • Accuracy: Whether the data correctly reflects the real-world values it describes.
  • Completeness: Whether all the data we need is present, with no missing records or fields.
  • Validity: Whether the data conforms to the required formats, ranges and rules.
  • Timeliness: How up to date the data is; in a fast-moving market this is an important aspect.
  • Uniqueness: Every real-world entity appears in the data only once; this is important to prevent duplicates.
  • Relevance: Whether the data is actually relevant to the question to be answered. This is where domain knowledge is important.
  • Consistency: Whether the data agrees across sources, which matters most when data from different projects is combined in another project.

There are many more data aspects, but these are the most important ones for assuring good-quality data. The full data quality report contains many examples of how these aspects can be used; the sketch below shows how a few of them can be turned into simple metrics.
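As an illustration, here is a minimal sketch in Python of how completeness and uniqueness could be measured. The DataFrame, column names and values are made up for the example; they are not our actual data.

```python
import pandas as pd

def completeness(df: pd.DataFrame) -> pd.Series:
    """Fraction of non-missing values per column (1.0 = fully complete)."""
    return df.notna().mean()

def uniqueness(df: pd.DataFrame, key_columns: list[str]) -> float:
    """Fraction of rows not flagged as duplicates of an earlier row."""
    return 1.0 - df.duplicated(subset=key_columns).mean()

# Tiny, made-up property dataset for illustration only.
df = pd.DataFrame({
    "address": ["Main St 1", "Main St 1", "Oak Ave 2", None],
    "price":   [350_000, 350_000, 420_000, 275_000],
})

print(completeness(df))            # address: 0.75, price: 1.0
print(uniqueness(df, ["address"])) # 0.75: one duplicate address
```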

Data assessment

Another important part is assessing the quality of the data. There are a few different methods: the first is to check every data point manually, or to test a few samples. A better way is to run automated checks. For example, to find duplicates in a dataset, we can run a script that compares all records to each other (possibly with some filters) and then either removes the duplicates automatically or shows them to a human for a final check.
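A minimal sketch of such an automated duplicate check, written in Python with pandas; the file name and key columns are hypothetical stand-ins, not our actual schema:

```python
import pandas as pd

# Hypothetical listings dataset; file and column names are illustrative.
listings = pd.read_csv("listings.csv")

# Two records with the same address and transaction date are
# treated as duplicate candidates.
key = ["address", "transaction_date"]
dupes = listings[listings.duplicated(subset=key, keep=False)]

if dupes.empty:
    print("No duplicates found.")
else:
    # Show the candidates to a human reviewer rather than deleting blindly.
    print(f"{len(dupes)} possible duplicates:")
    print(dupes.sort_values(key))
    # Or remove them automatically, keeping the first occurrence:
    # listings = listings.drop_duplicates(subset=key, keep="first")
```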

The different aspects of data quality above are all checked in different ways, and even for a single aspect there are different ways to assess quality, depending on how the data is stored. Timeliness, for example, can be assessed by looking at when the data was last updated and whether updates happen regularly (yearly or monthly, say). But that does not tell the whole story: old data that was updated today is still old data. That is why it is better to check it in more than one way, for example by also recording the date on which the data was created in the real world.
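Both views of timeliness can be checked together, as in the sketch below; the `last_updated` and `created_at` fields are hypothetical examples of the two dates described above:

```python
import pandas as pd

# Hypothetical dataset with both a maintenance date and a real-world date.
df = pd.read_csv("listings.csv", parse_dates=["last_updated", "created_at"])
today = pd.Timestamp.today()

# Recency of maintenance: how long since each record was last touched?
update_age_days = (today - df["last_updated"]).dt.days

# Age of the underlying fact: a record updated yesterday may still
# describe a transaction from years ago.
record_age_days = (today - df["created_at"]).dt.days

print("Median days since last update:", update_age_days.median())
print("Records describing facts older than a year:", (record_age_days > 365).sum())
```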

Data quality assessment differs greatly for every database, so a specific assessment for our data is only possible after looking at the data thoroughly and fully understanding what it represents. Only then can the aspects that matter for that data be identified, and specific assessments be designed and used to improve the data quality.

Our data quality

We also assessed our own data on these aspects. Some data quality aspects are particularly important for hyperlocal data, others for regional data.

Materiality

Savvy real asset investors want maximum signal from their data investments. The marginal signal of a data quality improvement is sometimes limited but very costly, making it an unwise investment. Weekly updates of demographic scenarios bring little additional signal to the fundamental investor, and spending 1% of a data budget to remove five duplicate data points from a set of one hundred million is a bad idea.

In the matrix below we assess the importance of the most relevant data quality features for each dataset that we share, create, normalise, cleanse or calculate to create intelligence. To check our materiality views on hyperlocal data, please get in touch for the full report.

Data Quality materiality matrix (scale: 0 = low to +++ = high)
