Recall

Recall, in the context of Places or POI data, refers to how completely a dataset captures the places that actually exist. It measures how well the data represents the richness and completeness of real-world places. Higher recall indicates a more comprehensive representation of the world's places within the dataset.

Recall Metric: Coverage

Coverage is the metric we use to measure the recall of our data. It is a measure of quantity that shows how much our dataset overlaps with an agreed-upon truth set or benchmark. The numerator is the number of truth-set places that appear in our dataset, and the denominator is the total number of places in the truth set, which stands in for the actual places in existence. You can read more on the way we calculate and monitor Coverage here: Accuracy Metric Methodologies
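As a rough illustration of the ratio described above, the sketch below computes coverage from two collections of place identifiers. The function name `coverage` and the idea that places can be compared by a shared identifier are assumptions made for this example; they are not taken from SafeGraph's actual methodology, which is described in Accuracy Metric Methodologies.

```python
def coverage(dataset_ids, truth_ids):
    """Return the share of truth-set places that also appear in the dataset.

    Both arguments are iterables of place identifiers. Comparing places by a
    shared identifier is an assumption of this sketch.
    """
    truth = set(truth_ids)
    if not truth:
        raise ValueError("truth set must not be empty")
    # Numerator: truth-set places found in the dataset.
    found = truth & set(dataset_ids)
    # Denominator: all places in the truth set.
    return len(found) / len(truth)
```

For example, if the truth set holds four places and the dataset contains two of them, plus one place that is not in the truth set, the coverage is 0.5. Places outside the truth set do not raise the score.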

Rationale and Approach to Recall

At SafeGraph, we aim for total recall to provide our users with a truly comprehensive dataset. We want as high a coverage rate as possible. Our focus isn't just limited to popular or well-known places. We strive to include all types of places, such as:

Branded establishments
Non-branded establishments
Businesses of all sizes and scopes
Destinations (both popular and obscure)

See a broader list of our in-scope places.

To accomplish this, we ingest data from thousands of sources, each with varying degrees of sophistication:

First-party store locators
Specialty aggregators
Government sources at all levels
Purchased datasets

This strategy allows us to cast a wide net, increasing the chances that we capture more places and ultimately enriching our dataset. Because these sources curate their data for different reasons, combining them makes us more likely to represent a wider array of places.

Mechanics of Improving Recall

Improving recall is an ongoing process at SafeGraph. Our techniques to improve recall include:

Constantly adding brands: We add new brands to the dataset every month, often hundreds at a time and from all over the world. You can see the counts of brands added in our Release Notes and Places Summary Statistics.
Constantly adding sources: We constantly seek new data sources that can contribute to our dataset. The more sources we have, the more comprehensive our data becomes.
Investing heavily in matching and merging: As we continue to add more sources, we inevitably encounter duplicates. We've invested in a robust system that matches and merges these duplicates. This process is crucial to adding new places, as it allows us to move quickly when ingesting new data while ensuring our dataset remains reliable.
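To make the matching-and-merging idea concrete, here is a minimal sketch that treats records from different sources as duplicates when their names and addresses agree after normalization. The record fields (`name`, `address`, `source`), the function names, and the exact-match rule are all hypothetical and chosen for this example. A production system would likely rely on far more robust matching than this.

```python
from collections import defaultdict


def match_key(record):
    """Build a crude match key: lowercased name and address with collapsed spaces."""
    name = " ".join(record["name"].lower().split())
    address = " ".join(record["address"].lower().split())
    return (name, address)


def match_and_merge(records):
    """Group records that share a match key and merge each group into one place."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)

    merged = []
    for group in groups.values():
        first = group[0]
        merged.append({
            # Keep the first record's spelling of the name and address.
            "name": first["name"],
            "address": first["address"],
            # Remember every source that reported this place.
            "sources": sorted({r["source"] for r in group}),
        })
    return merged
```

With this rule, two records for the same coffee shop that differ only in capitalization or spacing collapse into a single place that lists both sources, while a new source that reports an unseen place adds a new entry.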

At SafeGraph, our commitment is to provide a dataset that best mirrors the diversity, complexity, and ever-changing nature of the real world. We achieve this through persistent efforts to enhance the recall of our Places dataset.
