Known Issues or Data Artifacts


We strive to be fully transparent about known data issues that may affect your analysis and are not otherwise covered in the monthly release notes. If you notice a problem that is not listed here, please send us your observations so we can investigate.

Year to Date Known Issues

Date Reported | Description | Discussion/Links | Resolved?
5/13/2022 | In the May 11 delivery of Weekly Patterns (for the week starting 5/2), visits decline by roughly 15-20% due to shifts in the panel and processing changes introduced at the beginning of May.

The drop in visits is larger than the drop in panel size (total_devices_seen drops only ~10%), and it primarily affects US Patterns data (Canada is far less affected).
This is a one-time change introduced due to a change in supply plus processing changes to improve privacy protection.

Daily visit trends are the same for the majority of POIs, but this will impact longitudinal studies dependent on precise computation of relative change (particularly changes <20%).
We recommend using one of the pre-normalized columns if your longitudinal use case is impacted.

In particular, the normalized_visits_by_total_visits column is most consistent through this transition.
5/4/2022 | In the May 4 delivery of Weekly Patterns (for the week starting 4/25), the home_panel_summary supplemental file contains extra rows for certain US CBGs which have mis-assigned iso_country_code="CA". | This only affects CBGs which are very close to the border, and is expected to be a minor, mostly cosmetic, issue. If this affects your analysis, please let us know. | This will be fixed in the following release.
5/3/2022 | We noticed a bug in the visitor_home_aggregation column in the August 2021 Monthly Patterns file: for that month only, census block groups (CBGs) were used instead of census tracts. | This was the result of a bug fix that was not fully merged until after the release and that, unfortunately, went unnoticed until now. We recommend summing visitors to each tract using the CBG data (the tract is the first 11 digits of the CBG) as a stopgap until resolution. | This will be resolved in the next backfill of Patterns data.
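
The stopgap described in the 5/3/2022 entry can be sketched as follows. This is a minimal illustration, not SafeGraph's own code: the function name rollup_cbg_to_tract and the sample counts are hypothetical, and the input is assumed to be the visitor_home_aggregation value for one POI, already parsed into a mapping from CBG FIPS code to visitor count.

```python
from collections import defaultdict


def rollup_cbg_to_tract(visitors_by_cbg):
    """Sum visitor counts keyed by 12-digit CBG FIPS into 11-digit tract FIPS."""
    visitors_by_tract = defaultdict(int)
    for cbg, count in visitors_by_cbg.items():
        # Restore a leading zero that is lost if the code was read as an integer.
        cbg = str(cbg).zfill(12)
        # The tract code is the first 11 digits of the CBG code.
        visitors_by_tract[cbg[:11]] += count
    return dict(visitors_by_tract)


# Hypothetical counts, for illustration only.
example = {"530330001001": 5, "530330001002": 7, "530330002001": 4}
print(rollup_cbg_to_tract(example))
# {'53033000100': 12, '53033000200': 4}
```

Keeping the FIPS codes as strings avoids losing leading zeros for states such as Alabama (01) or California (06).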

Issues Reported in 2021

Date Reported | Description | Discussion/Links | Resolved?
12/15/2021 | There was a single-day drop in overall POI visits in SafeGraph Patterns on 12/7/21 resulting from lost device data due to the AWS outage that day. The drop is on the order of 15% for the US as a whole. | Customers using weekly or monthly aggregated columns should see minimal impact, but users looking at daily un-normalized data should use caution when interpreting visits for that day. | This incident will not be resolved because the data are unrecoverable. Please reach out to SafeGraph if you have any questions about how to adjust for this behavior in the data.
11/29/2021 | There was an increase in the number of devices in SafeGraph's Patterns panel starting 11/15/2021, with the largest increases in WA state (+20%). In the raw, non-normalized data, this appears as an increase in visitors. | We recommend applying panel normalization to account for the changes in panel size if your application is sensitive to visits during this time. | Not necessary since this is normal behavior.
11/18/2021 | A few parks in Seattle have highly anomalous visits in the July 2021 Backfill of Monthly and Weekly Patterns during a few historical weeks. | This was confirmed to be isolated to just a few POIs and not to be due to Geometry errors. See discussion in Community here. | No, and unlikely to be. Most likely the cause is a temporary sink of anonymized lat/long pings in these locations that slipped through QA. Our recommendation is to remove these outliers and impute the missing weeks.
9/14/2021 | A bug caused visits to large Golf Courses and Country Clubs (naics_code=713910) and Amusement and Theme Parks (naics_code=713110) to be inflated between April and August 2021, inclusive, including data in the July 2021 backfill. Median dwell time was also lower than it should have been for the same months due to this bug.

Note that visitors were not affected, just visits.

This affected ~17k POIs in Patterns (0.3%).
The bug primarily affected "large" POIs, in this case POIs in the two categories over a certain square footage, although not all such POIs were impacted. | Visits and median dwell for these POIs were corrected for September 2021 data and onward.
9/7/2021 | In Neighborhood Patterns from Jan 2018 to June 2021, there are 1-2 days per month where the stops_by_day column does not match the sum of the relevant elements in the stops_by_each_hour array. | A list of affected dates can be found here. | Yes, as of the July 2021 Neighborhood Patterns release. Historical data will be corrected in the next backfill for Neighborhood Patterns.
8/12/2021 | There is a CBG in Manhattan around City Hall that indicates 10x as many devices in Neighborhood Patterns as neighboring CBGs. | See this Community Slack thread. | No, and resolution is unlikely because sources/sinks are sometimes inherent in GPS data. If this is affecting normalization, we recommend normalizing using state values as opposed to CBG values.
8/6/2021 | Close to 6000 POIs (~0.1%) have visits assigned in Patterns after their closed_on dates. | See Relationship with opening and closing dates. | This issue gets resolved with each backfill.
7/29/2021 | Neighborhood Patterns Home Panel Summary files have a small number of rows corresponding to Canadian neighborhoods. | | No, but when Canada Neighborhood Patterns gets released, the Home Panel Summary files will have many more rows for Canadian neighborhoods, so this behavior will become standard in the future.
7/7/2021 | Quotation marks in the iso_country_codes_open and iso_country_codes_closed columns in the Brand Info file are not encoded properly. | | Yes. This was resolved in the August 2021 release of Places.
7/6/2021 | The safegraph_place_id and parent_safegraph_place_id columns were dropped as of the July 2021 Release Notes. placekey and parent_placekey are referenced moving forward. | | N/A
7/6/2021 | The tracking_opened_since column was dropped as of the July 2021 Release Notes because it provided redundant information. If a POI has an opened_on value, it implies we've been tracking it since that date. If a POI does not have an opened_on value, it implies we were not able to track the exact date it opened. | | N/A
7/6/2021 | The June 2021 version of Monthly Patterns appears to have large sinks of devices in a few CBGs, far greater than the population of those CBGs. | | No, and resolution is unlikely because sources/sinks are sometimes inherent in GPS data. If this is affecting normalization, we recommend normalizing using state values as opposed to CBG values.
6/4/2021 | Due to incorrect geometries, 6 U.S. POIs have Canadian geocodes, leading to some odd behavior in Supplementary Files. | | Will be resolved as soon as fixes to these geometries are ingested.
3/17/2021 | A processing error in Social Distancing Metrics on 3/8/2021 resulted in an influx of devices on that day. This explains the sharp increase in devices seen and completely home devices on this date. | | No
3/17/2021 | A processing error in Weekly Patterns on 3/3 caused a decrease in visits. We backfilled the week starting 3/1 to fix this. | Community Thread | Yes
1/12/2021 | Certain columns in Neighborhood Patterns were lower than expected. | Community Thread | Yes, in the July 2021 Backfill.
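
Several entries above recommend panel normalization. One simple form, shown here as a sketch under stated assumptions rather than SafeGraph's official method, divides raw visits in each period by the number of devices in the panel for the same period. The function name, the dictionary inputs, and the sample numbers are hypothetical; the device counts could be state-level totals where CBG-level counts are distorted by sinks.

```python
def normalize_by_panel(raw_visits, panel_devices):
    """Divide each period's raw visits by that period's panel size.

    Both arguments map a period label (e.g. week start date) to a number.
    Periods with a missing or zero panel size are skipped.
    """
    normalized = {}
    for period, visits in raw_visits.items():
        devices = panel_devices.get(period)
        if not devices:
            continue  # cannot normalize without a positive panel size
        normalized[period] = visits / devices
    return normalized


# Hypothetical numbers: raw visits rise 20% only because the panel grew 20%.
raw = {"2021-11-08": 1000, "2021-11-15": 1200}
panel = {"2021-11-08": 50000, "2021-11-15": 60000}
print(normalize_by_panel(raw, panel))
# {'2021-11-08': 0.02, '2021-11-15': 0.02}
```

In this example the apparent increase in raw visits disappears once panel growth is accounted for, which is the behavior the 11/29/2021 entry describes.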

Prior to 2021 Known Issues

Date Reported | Description | Discussion/Links | Resolved?
11/18/2020 | In Social Distancing Metrics (and possibly other datasets) there are an abnormal number of records showing travel to/from parts of Kansas. This is likely due to a GPS data problem related to the "center of the country" issue, which is known to affect a very small minority of location data when non-GPS data is inadvertently mixed with GPS data. | See this summary of known unexpected data trends for 2/25. | SafeGraph is always working to ensure the highest quality location data is used to build its products and we are always working to improve artifacts like this one.
8/30/2020 | 4/21/2019 (Easter) may be an anomalous day in Patterns data. | We had a supply issue at that time that seems to have artificially decreased the number of visits collected. | Actively investigating. Workaround is to ignore data from this day.
7/7/2020 | Several inexplicably abnormal days of data in 2018. Dates affected: 3/15/2018, 9/15/2018, 9/16/2018. | Community Discussion | No fix in medium term. Short term workaround is to omit completely if possible. Otherwise, replace with median imputation or some other method so the days have no impact on analysis.
6/30/2020 | opened_on column over-indexed on 2020-01 | See opened_on documentation |
4/13/2020 | CBG FIPS are corrupted for some rows in Open Census Data file cbg_b22.csv | | Unfortunately, there is no timeline for fixing this. Apologies for the inconvenience. However, our Slack Community members can see Jonas Peeters' solution.
4/6/2020 | Duplicate CBGs with Different States | | Yes. Ignore State in home-panel-summary and aggregate within CBG. Product fix coming soon.
4/2/2020 | Problem with Iowa CBG 190570010001 | | Yes. This CBG has been removed from SDM.
3/1/2020 | 2/25/2020 Artifact (affecting SDM and Patterns) | See this summary of known unexpected data trends for 2/25 |
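
For the anomalous days listed above (for example 3/15/2018, 9/15/2018, and 9/16/2018), the suggested median imputation might look like the sketch below. The function name, the window parameter, and the sample visit counts are assumptions for illustration; dates are assumed to be ISO-formatted strings so that they sort chronologically.

```python
from statistics import median


def impute_anomalous_days(daily_visits, bad_dates, window=7):
    """Replace values on bad_dates with the median of nearby normal days.

    daily_visits maps ISO date strings to visit counts. For each bad date,
    the median is taken over up to `window` observations on either side,
    excluding any other bad dates.
    """
    dates = sorted(daily_visits)
    bad = set(bad_dates)
    cleaned = dict(daily_visits)
    for i, day in enumerate(dates):
        if day not in bad:
            continue
        lo, hi = max(0, i - window), min(len(dates), i + window + 1)
        neighbors = [daily_visits[d] for d in dates[lo:hi] if d not in bad]
        if neighbors:  # leave the value untouched if no clean neighbor exists
            cleaned[day] = median(neighbors)
    return cleaned


# Hypothetical visit counts with one abnormal day.
visits = {
    "2018-03-13": 100,
    "2018-03-14": 110,
    "2018-03-15": 900,
    "2018-03-16": 120,
    "2018-03-17": 130,
}
cleaned = impute_anomalous_days(visits, ["2018-03-15"])
print(cleaned["2018-03-15"])
# 115.0
```

Omitting the affected days entirely remains the simpler option when the analysis tolerates gaps; imputation is only needed when a continuous daily series is required.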
