This document summarizes the recommended methods and results you should obtain when evaluating a points-of-interest (POI) data set (e.g. SafeGraph Places data). To see the quantitative results of this evaluation, please contact us and we are happy to provide results on these metrics (or give you access to data to confirm these metrics yourself).
This evaluation addresses the three major quality categories when evaluating point of interest (POI) data: precision, recall, and completeness. Methodology and detailed results for each metric can be found in their corresponding sections.
Good results should place POI coordinates within 0-10 meters of the truth set (Google Maps).
Good results should show that > 70% of tested polygons are true to the building footprint as represented by the truth set (Google Maps).
Good results should show that > 99.9% of POI attributes for top brands are accurate compared to the truth set (online store locators).
Good results should show that POI counts for brands/chains are within 0-2% of the tested truth set (store locators).
Good results should show high fill rates for important attributes:
Category: > 90%
Phone Number: > 70%
Open Hours: > 50%
These should be even higher for major brands (chains).
Are SafeGraph Places actually located where they purport to be?
Every POI in the SafeGraph Places dataset includes columns for the interpolated latitude and longitude values of the POI. A coordinate accuracy measurement compares the SafeGraph coordinate values to an accepted coordinate truth set (Google Maps).
To measure the distance between SafeGraph and Google POI coordinates, we recommend using the Google Places API to make Find Place requests for all POIs in the SafeGraph dataset. More specifically, provide the address of each SafeGraph POI and compare the returned Google coordinates to the associated SafeGraph POI coordinates. The distance between coordinates is measured in meters.
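The distance step can be sketched as follows. This is a minimal illustration, not SafeGraph's own implementation: it assumes the truth-set coordinates have already been retrieved, uses the haversine formula on a spherical Earth (an approximation that is adequate at the scale of meters to a few hundred meters), and the function names `haversine_m` and `median_offset_m` are hypothetical.

```python
import math
from statistics import median

# Assumption: mean Earth radius for a spherical approximation.
EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def median_offset_m(pairs):
    """Median distance over ((poi_lat, poi_lon), (truth_lat, truth_lon)) pairs."""
    return median(haversine_m(p[0], p[1], t[0], t[1]) for p, t in pairs)
```

Passing one pair per tested POI to `median_offset_m` yields a single median distance that can be compared against the 0-10 meter guideline.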
In aggregate, we find that the median distance between SafeGraph and Google Maps coordinates for all SafeGraph POIs is very small (usually 0-5m). The distribution of POI distance from Google Maps is presented below:
In contrast, we've found that other POI data providers show centroid precision ranging from 18 to 65 meters in median distance from Google Maps, with those median distances averaging 40 meters.
Do SafeGraph Places Polygons represent the exact shape of buildings?
The SafeGraph Places dataset includes two fields that describe POI geometry:
polygon_wkt: a polygon that represents the shape of the POI, formatted as Well-Known-Text (WKT).
polygon_class: a field that describes whether the polygon describes the POI itself (owned_polygon) or if the polygon is shared by more than one POI (shared_polygon).
To measure the accuracy of polygons, filter to polygons that represent a single POI by including only owned_polygon values for polygon_class. Select a random subset of POIs in the dataset (e.g. 1,000) for human verification. For each selected polygon, a tester can overlay the polygon on Google Maps and score, in a binary manner, whether the polygon accurately represents the shape of a building. A polygon can be considered accurate when:
- The polygon represents the associated POI in the dataset. Conversely, a polygon is inaccurate if it is the correct shape of a building but is associated with the wrong POI.
- The polygon accurately covers the building footprint of interest in both shape and size.
- If a POI is part of a larger structure (such as a strip mall), the polygon accurately represents the shape and size of the individual store.
- The polygon is within 2 meters of the Google Maps imagery. A discrepancy of this size can be accounted for by differing pitches of satellite imagery.
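Drawing the review sample described above might look like the sketch below. It assumes each POI is available as a dictionary keyed by column name and that polygon_class holds the value owned_polygon exactly as written in this document; the function name `sample_owned_polygons` and the fixed seed are illustrative choices.

```python
import random


def sample_owned_polygons(rows, k=1000, seed=0):
    """Return up to k randomly chosen rows whose polygon_class is 'owned_polygon'."""
    # Keep only polygons that represent a single POI.
    owned = [r for r in rows if r.get("polygon_class") == "owned_polygon"]
    # A seeded generator makes the review sample reproducible.
    rng = random.Random(seed)
    return rng.sample(owned, min(k, len(owned)))
```

Fixing the seed lets a second reviewer regenerate the same subset and score it independently.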
When inaccurate, polygons can be classified into the following categories of inaccuracy:
- Centroid: the tested data is not a building polygon but rather an approximated circular polygon derived from the POI centroid with a radius applied.
- Shape: the polygon is the wrong shape compared to the POI.
- Size: the polygon is either smaller or larger than the POI.
- Wrong Place: the polygon does not represent the correct POI, even if it is the correct shape and size of a building.
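Once testers have scored each polygon, the results can be tallied with something like the sketch below. The label strings (`accurate`, `centroid`, `shape`, `size`, `wrong_place`) and the function name `summarize_reviews` are assumptions made for illustration; any consistent encoding of the categories above would work.

```python
from collections import Counter

# Hypothetical label encoding for the inaccuracy categories listed above.
INACCURATE_CATEGORIES = {"centroid", "shape", "size", "wrong_place"}


def summarize_reviews(labels):
    """Return (share of accurate polygons, per-label counts) for tester labels."""
    counts = Counter(labels)
    unknown = set(counts) - INACCURATE_CATEGORIES - {"accurate"}
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = sum(counts.values())
    accuracy = counts["accurate"] / total if total else 0.0
    return accuracy, dict(counts)
```

The returned accuracy share can be compared against the > 70% guideline, and the per-label counts show which kind of inaccuracy dominates.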
Examples of correct and incorrect polygons are shown below:
Accurate location, shape and size.
Accurate location, shape and size within the context of a strip mall.
Wrong place as it represents a structure that doesn't exist.
Accurate shape and size for selected building (within 2m) but wrong POI (address is for the other building).
Correct location but inaccurate shape and size (includes more than one store in a strip mall).
Are POIs associated with accurate business information (address, phone number, open hours, etc.)?
Each SafeGraph place includes the following business information:
Most (see completeness results) SafeGraph Places also include:
To estimate the accuracy of this business metadata, you can create a randomized subset of POI that includes all attributes of interest:
- e.g., Select 50 random brands from the dataset where their store count is greater than 1,000 stores nationally. Select 10 random stores for each of those brands where all attributes were included.
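The sampling example above could be sketched as follows. This assumes each POI is a dictionary keyed by column name and that the brand identifier column holds a single value per row; the default key `safegraph_brand_ids`, the attribute names passed in `attrs`, and the function name `sample_branded_pois` are illustrative assumptions rather than a prescribed schema.

```python
import random
from collections import defaultdict


def sample_branded_pois(rows, attrs, n_brands=50, n_stores=10,
                        min_stores=1000, seed=0,
                        brand_key="safegraph_brand_ids"):
    """Pick random large brands, then random stores with all attrs filled."""
    by_brand = defaultdict(list)
    for r in rows:
        if r.get(brand_key):
            by_brand[r[brand_key]].append(r)
    # Brand eligibility is based on the full store count, before attribute filtering.
    eligible = sorted(b for b, rs in by_brand.items() if len(rs) > min_stores)
    rng = random.Random(seed)
    chosen = rng.sample(eligible, min(n_brands, len(eligible)))
    sample = []
    for brand in chosen:
        # Keep only stores where every attribute of interest is present.
        complete = [r for r in by_brand[brand]
                    if all(r.get(a) not in (None, "") for a in attrs)]
        sample.extend(rng.sample(complete, min(n_stores, len(complete))))
    return sample
```

With the defaults this yields up to 500 POIs (50 brands with 10 stores each) for human verification.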
This randomized subset of branded POI can be compared by human verifiers to the data provided on each brand's corporate website. For example, the Lowe’s brand can be tested against the truth set provided at https://www.lowes.com/store/.
The NAICS code for the 50 random brands selected can be verified by human judgment.
Does SafeGraph Places include all POI for selected brands?
To assess the accuracy of branded POI counts, generate a randomized sample of 20 safegraph_brand_ids where store counts are greater than 1,000 stores nationally and measure the total count of POI for each brand. For each brand, the SafeGraph Places count can be compared to the count of stores listed on the brand’s store locator site. Note that determining the number of stores listed on a brand’s store locator website may require building a custom web scraping solution.
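The count comparison can be expressed as a percent difference per brand, as in the sketch below. It assumes the store locator counts have already been collected; the function names and the choice to measure the gap relative to the store locator count are illustrative assumptions.

```python
def count_gap_pct(dataset_count, locator_count):
    """Absolute percent difference of the dataset count relative to the locator count."""
    if locator_count <= 0:
        raise ValueError("locator_count must be positive")
    return abs(dataset_count - locator_count) / locator_count * 100


def brands_within_tolerance(counts, tolerance_pct=2.0):
    """Map each brand to True when its count gap is within tolerance_pct.

    counts: {brand: (dataset_count, locator_count)}
    """
    return {brand: count_gap_pct(d, t) <= tolerance_pct
            for brand, (d, t) in counts.items()}
```

The default tolerance of 2% mirrors the 0-2% guideline for brand counts stated in the summary.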
What coverage does SafeGraph Places offer and what are the fill rates for POI attributes?
For example, you may want to examine the completeness of data coverage for high-value attributes like:
Fill rate is defined as the percentage of non-null values for the attribute of interest in the dataset, which can be computed with a simple query.
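As an illustration of that definition, the sketch below computes a fill rate over POIs held as dictionaries keyed by column name. Treating an empty string as null is an assumption added here, and the function name `fill_rate` and the attribute name in the usage line are hypothetical.

```python
def fill_rate(rows, attr):
    """Percentage of rows where attr is non-null.

    Assumption: None and the empty string both count as null.
    """
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(attr) not in (None, ""))
    return 100.0 * filled / len(rows)


# Hypothetical usage: four POIs, two of which have a phone number.
pois = [
    {"phone_number": "+15550100"},
    {"phone_number": None},
    {"phone_number": ""},
    {"phone_number": "+15550101"},
]
phone_fill = fill_rate(pois, "phone_number")
```

Running the same function once over all POIs and once over branded POIs only gives the overall and major-brand fill rates side by side.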
Please see Places Summary Statistics for a complete list of attribute counts and fill rates for the latest SafeGraph Places release. We recommend examining fill rates both overall and for high-value major retail chains (brands).