July-2021 Backfill Release Notes

3 months ago by [email protected]

The backfill is here! 🎉 Here's your guide on how to use the July 2021 Backfill Data and what to expect as you dive in 🏄!

As a reminder, the backfill is when we take our most recent version of Places (i.e., Core + Geometry) and run our visit attribution algorithm backward in time to generate a new history of “backfilled” Patterns. It happens no more than twice a year, most commonly in July ☀️ and in December ❄️ as updates to Places warrant.

This backfill covers Jan 2019 to present. A second delivery, covering Jan 2018 to Dec 2018 US Backfilled Patterns, was also completed at the end of July.

EDIT from Aug 3, 2021: The release notes below focus on data from 2019 onward, but the same guidance applies to the newly backfilled 2018 data for the U.S. as well.

Specific improvements to Places you will see in the July 2021 Backfill

  • New POIs added. We added new industrial POIs in 2021. Previously, visits to these POIs only appeared in Patterns data starting with the release in which the POIs were added. Of course, visits to these POIs occurred historically as well; hence we "backfill" visits to them prior to 2021 to produce full histories.
  • Improved Geometry info. As of June 2021, we fixed a number of geometry issues, both for ultra-prominent POIs like Disney World and more generally across SafeGraph Places. These fixes are reflected in more accurate foot traffic data in Patterns.

What to expect in the US Backfill 🇺🇸

In general, you should expect to see the same trends as in the previous backfill, but with slightly increased visit numbers overall based on the types of changes above. See below, for example, a time series comparison of the old (Dec 2020) and the new (July 2021) backfill, showing total visits to all POIs:

Daily total POI visits from the July 2021 and Dec 2020 backfills show similar trends overall.

Visits to the vast majority of POIs have remained stable 🧘. However, there may be more subtle changes (increasing or decreasing visits) when aggregating visits to POIs of specific brands or specific NAICS codes. For instance, we observed a large increase in visits to Nature Parks and Other Similar Institutions (naics_code = 712190) in this backfill due to improved Core and Geometry information.
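One way to check for these category-level shifts is to total visits per NAICS code in each backfill and compare them. The sketch below is only illustrative: it assumes each Patterns row has been loaded as a dict with naics_code and raw_visit_counts fields, and the sample rows and numbers are made up, not taken from either backfill.

```python
from collections import defaultdict


def visits_by_naics(rows):
    """Sum raw_visit_counts per naics_code for one backfill."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["naics_code"]] += row["raw_visit_counts"]
    return dict(totals)


def percent_change(old_totals, new_totals):
    """Percent change per NAICS code, for codes present in both backfills."""
    changes = {}
    for code in old_totals.keys() & new_totals.keys():
        if old_totals[code] > 0:
            delta = new_totals[code] - old_totals[code]
            changes[code] = 100.0 * delta / old_totals[code]
    return changes


# Hypothetical rows standing in for the Dec 2020 and July 2021 backfills.
old_backfill = [
    {"naics_code": 712190, "raw_visit_counts": 100},
    {"naics_code": 712190, "raw_visit_counts": 50},
    {"naics_code": 722515, "raw_visit_counts": 200},
]
new_backfill = [
    {"naics_code": 712190, "raw_visit_counts": 180},
    {"naics_code": 712190, "raw_visit_counts": 90},
    {"naics_code": 722515, "raw_visit_counts": 204},
]

changes = percent_change(
    visits_by_naics(old_backfill), visits_by_naics(new_backfill)
)
for code, pct in sorted(changes.items()):
    print(f"naics_code {code}: {pct:+.1f}%")
```

Restricting the comparison to codes present in both backfills keeps newly added categories from showing up as spurious changes; those are better inspected separately.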

What to expect in the Canada Backfill 🇨🇦

This is the first year we’ve added Canada to the backfill! We added Canada Weekly Patterns data starting in the May 2021 release, and have now backfilled data for Canadian POIs back to January 2019. If you are interested in Canada Weekly Patterns data, reach out to your customer success manager for a sample.

It is important to normalize historical Canada Patterns data when using 2020 data 🚨. This is because the size of our historical Canadian panel changed substantially in 2020, in ways that our historical US panel did not.

Relative to Jan 2019, our panel grew substantially around Jan 2020 and stayed elevated for most of 2020, before returning to a more consistent level in 2021:

Daily Total Devices Seen for Canada, indexed on Jan 2019, showing variation in panel size historically.

Per year on average, we saw the following number of devices daily:

Year    Average Daily Devices Seen
2019    328k
2020    542k
2021    362k

Because of these changes to the Canadian panel, we encourage users to experiment with various normalization techniques, as we do for the US. For inspiration, see our Data Science Resources, particularly our most recent Google Colab notebook on normalization.
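To get a feel for the size of the effect, the yearly averages above can be turned into a panel-size index relative to 2019. This is a rough sketch, not a recommended method: it uses only the three rounded yearly figures quoted above, whereas real normalization would typically use daily or weekly panel sizes. The function names are hypothetical.

```python
# Average daily devices seen in Canada, from the figures quoted above.
AVG_DAILY_DEVICES = {2019: 328_000, 2020: 542_000, 2021: 362_000}


def panel_index(avg_devices, base_year=2019):
    """Panel size of each year relative to the base year."""
    base = avg_devices[base_year]
    return {year: count / base for year, count in avg_devices.items()}


def scale_to_base_panel(raw_visits, year, avg_devices, base_year=2019):
    """Rescale a raw visit count as if the panel had its base-year size."""
    return raw_visits * avg_devices[base_year] / avg_devices[year]


index = panel_index(AVG_DAILY_DEVICES)
for year, value in sorted(index.items()):
    print(f"{year}: {value:.2f}x the 2019 panel")

# A hypothetical raw count of 1,000 visits observed in 2020.
print(round(scale_to_base_panel(1000, 2020, AVG_DAILY_DEVICES)))
```

On these averages the 2020 panel is roughly 1.65 times the 2019 panel, so an unnormalized 2020 count can look much larger than a 2019 count even when underlying behavior is unchanged.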

Canada normalization example

To drive home this point, when summing all visits to Starbucks POIs in Canada ☕️, one can produce very different time series when dividing by total devices seen (from the normalization_stats.csv Supplemental Files) or by total POI visits (computed as total visits minus home visits, also from normalization_stats.csv):

Starbucks visits normalized by total devices seen from normalization_stats.csv. Values show relative change from Jan 2019.

Starbucks visits normalized by total POI visits computed from normalization_stats.csv. Values show relative change from Jan 2019.
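The two normalizations compared above can be sketched as follows. Treat this as an illustration under assumptions: the column names (date, total_visits, total_home_visits, total_devices_seen) are assumed for this example and should be checked against your copy of normalization_stats.csv, and all numbers are invented rather than real Starbucks or panel figures.

```python
import csv
import io

# Hypothetical stand-in for normalization_stats.csv (assumed column names).
NORMALIZATION_STATS = """\
date,total_visits,total_home_visits,total_devices_seen
2019-01-07,1000000,400000,300000
2020-04-06,1500000,900000,540000
"""

# Hypothetical summed brand visits per date.
BRAND_VISITS = {"2019-01-07": 6000, "2020-04-06": 3000}


def relative_change(series, baseline_date):
    """Express each value as a relative change from the baseline date."""
    base = series[baseline_date]
    return {date: value / base - 1.0 for date, value in series.items()}


def normalized_series(stats_csv, brand_visits, baseline_date):
    """Normalize brand visits by devices seen and by total POI visits."""
    by_devices, by_poi_visits = {}, {}
    for row in csv.DictReader(io.StringIO(stats_csv)):
        date = row["date"]
        devices = int(row["total_devices_seen"])
        # Total POI visits = total visits minus home visits.
        poi_visits = int(row["total_visits"]) - int(row["total_home_visits"])
        by_devices[date] = brand_visits[date] / devices
        by_poi_visits[date] = brand_visits[date] / poi_visits
    return (
        relative_change(by_devices, baseline_date),
        relative_change(by_poi_visits, baseline_date),
    )


by_devices, by_poi_visits = normalized_series(
    NORMALIZATION_STATS, BRAND_VISITS, baseline_date="2019-01-07"
)
print(f"by devices seen: {by_devices['2020-04-06']:+.1%}")
print(f"by POI visits:   {by_poi_visits['2020-04-06']:+.1%}")
```

In this made-up example the same raw counts read as a drop of about 72% under one denominator and 50% under the other, which is why the choice of normalization should follow from the question being asked.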

Overall, this means that you should incorporate the domain knowledge of your own specific application when using Canada Patterns data to compare 2020 data to 2019 or 2021, particularly when using raw_visit_counts.

Other changes you may notice

  • This backfill incorporates all of the schema changes from the July 2021 release, meaning safegraph_place_id and parent_safegraph_place_id are retired.
  • Changes in the way columns are computed, such as related_same_day_brand, are reflected as well.
  • We have also released a backfill of Neighborhood Patterns, primarily fixing a bug that caused values in certain columns, such as weekday_device_home_areas and weekend_device_home_areas, to be lower than expected.
  • See the July 2021 Release Notes for all Patterns schema changes.

Issues and Artifacts

If you notice any issues with backfilled data, please reach out! While we have done our best to QA the data thoroughly and squash as many bugs as possible, inconsistencies can always creep in. See also Known Issues and Artifacts on our Docs site.