Known Issues or Data Artifacts
We strive for full transparency about known data issues that may affect your analysis and that are not otherwise covered in the monthly release notes. If you notice a problem that is not listed here, please send us your observations so we can investigate.
Year to Date Known Issues
Date Reported | Description | Discussion/Links | Resolved? |
---|---|---|---|
5/13/2022 | In the May 11 delivery of Weekly Patterns (for the week starting 5/2), visits decline by ~15-20% due to shifts in the panel and processing changes introduced at the beginning of May. The drop in visits is larger than the drop in panel size (i.e., total_devices_seen drops only ~10%), and it primarily affects US Patterns data (Canada is far less affected). | This is a one-time change caused by a change in supply plus processing changes to improve privacy protection. Daily visit trends are the same for the majority of POIs, but this will impact longitudinal studies that depend on precise computation of relative change (particularly changes <20%). | We recommend using one of the pre-normalized columns if your longitudinal use case is impacted. In particular, the normalized_visits_by_total_visits column is most consistent through this transition. |
5/4/2022 | In the May 4 delivery of Weekly Patterns (for the week starting 4/25), the home_panel_summary supplemental file contains extra rows for certain US CBGs that are mis-assigned iso_country_code="CA". | This only affects CBGs that are very close to the border, and is expected to be a minor, mostly cosmetic, issue. If this affects your analysis, please let us know. | This will be fixed in the following release. |
5/3/2022 | We noticed a bug in the visitor_home_aggregation column in the August 2021 Monthly Patterns file. For that month only, Census Block Groups were used instead of census tracts. | This was the result of a bug fix that was not fully merged until after the release and that, unfortunately, went unnoticed until now. As a stopgap until resolution, we recommend summing visitors to each tract using the CBG data (noting that the tract is the first 11 digits of the CBG). | This will be resolved in the next backfill of Patterns data. |
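
The stopgap for the August 2021 visitor_home_aggregation issue (summing CBG counts up to tracts) could be sketched as follows. This is a minimal illustration, not an official SafeGraph utility: it assumes the column value is a JSON object string keyed by 12-digit US CBG FIPS codes stored as strings (so leading zeros are preserved), and the function name `cbg_to_tract_counts` is hypothetical.

```python
import json
from collections import Counter


def cbg_to_tract_counts(visitor_home_json: str) -> dict:
    """Sum visitor counts keyed by 12-digit CBG FIPS into 11-digit tract FIPS.

    Assumption: the input is a JSON object string such as
    '{"360610031001": 5}'; an empty or missing value yields {}.
    """
    cbg_counts = json.loads(visitor_home_json) if visitor_home_json else {}
    tract_counts = Counter()
    for cbg, visitors in cbg_counts.items():
        # The tract identifier is the first 11 digits of the CBG identifier.
        tract_counts[str(cbg)[:11]] += visitors
    return dict(tract_counts)


# Illustrative values only, not real Patterns data.
example = '{"360610031001": 5, "360610031002": 7, "360610033001": 4}'
print(cbg_to_tract_counts(example))
# {'36061003100': 12, '36061003300': 4}
```

If the file was loaded in a way that parsed FIPS codes as integers, leading zeros would be lost and the 11-digit prefix would be wrong, so reading these keys as strings is the safer choice.
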
Issues Reported in 2021
Date Reported | Description | Discussion/Links | Resolved? |
---|---|---|---|
12/15/2021 | There was a single-day drop in overall POI visits in SafeGraph Patterns on 12/7/21 resulting from lost device data due to the AWS outage that day. The drop is on the order of 15% for the US as a whole. | Customers using weekly or monthly aggregated columns should see minimal impact, but users looking at daily un-normalized data should use caution when interpreting visits for that day. | This incident will not be resolved because the data are unrecoverable. Please reach out to SafeGraph if you have any questions about how to adjust for this behavior in the data. |
11/29/2021 | There was an increase in the number of devices in SafeGraph's Patterns panel starting 11/15/2021, with the largest increases in WA state (+20%). In the raw, non-normalized data, this appears as an increase in visitors. | We recommend applying panel normalization to account for the changes in panel size if your application is sensitive to visits during this time. | Not necessary since this is normal behavior. |
11/18/2021 | A few parks in Seattle have highly anomalous visits in the July 2021 Backfill of Monthly and Weekly Patterns during a few historical weeks. | This was confirmed to be isolated to just a few POIs and not to be due to Geometry errors. See discussion in Community here. | No, and unlikely to be. Most likely the cause is a temporary sink of anonymized lat/long pings in these locations that slipped through QA. Our recommendation is to remove these outliers and impute the missing weeks. |
9/14/2021 | A bug caused visits to large Golf Courses and Country Clubs (naics_code=713910) and Amusement and Theme Parks (naics_code=713110) to be inflated between April and August 2021, inclusive, including data in the July 2021 backfill. Median dwell time was similarly lower than it should have been for the same months due to this bug. Note that visitors were not affected, just visits. This affected ~17k POIs in Patterns (0.3%). | The bug primarily affected "large" POIs, in this case POIs in the two categories over a certain square footage, although not all such POIs were impacted. | Visits and median dwell for these POIs were corrected for September 2021 data and onward. |
9/7/2021 | In Neighborhood Patterns from Jan 2018 to June 2021, there are 1-2 days per month where the stops_by_day column does not match the sum of the relevant elements in the stops_by_each_hour array. | A list of affected dates can be found here. | Yes, as of the July 2021 Neighborhood Patterns release. Historical data will be corrected in the next backfill for Neighborhood Patterns. |
8/12/2021 | There is a CBG in Manhattan around City Hall that shows 10x as many devices in Neighborhood Patterns as neighboring CBGs. | See this Community slack thread | No, and resolution is unlikely because sources/sinks are sometimes inherent in GPS data. If this is affecting normalization, we recommend normalizing using state values as opposed to CBG values. |
8/6/2021 | Close to 6000 POIs (~0.1%) have visits assigned in Patterns after their closed_on dates. | See Relationship with opening and closing dates | This issue gets resolved with each backfill. |
7/29/2021 | Neighborhood Patterns Home Panel Summary files have a small number of rows corresponding to Canadian neighborhoods. | | No, but when Canada Neighborhood Patterns gets released, the Home Panel Summary files will have many more rows for Canadian neighborhoods, so this behavior will become standard in the future. |
7/7/2021 | Quotation marks in iso_country_codes_open and iso_country_codes_closed columns in Brand Info file are not encoded properly. | | Yes. This was resolved in the August 2021 release of Places. |
7/6/2021 | The safegraph_place_id and parent_safegraph_place_id columns were dropped as of the July-2021 Release Notes. placekey and parent_placekey are referenced moving forward. | N/A | |
7/6/2021 | tracking_opened_since column dropped as of July-2021 Release Notes. It was providing redundant information. If a POI has an opened_on value, it implies we've been tracking it since that date. If a POI does not have an opened_on value, it implies we were not able to track the exact date it opened. | N/A | |
7/6/2021 | The June 2021 version of Monthly Patterns appears to have large sinks of devices in a few CBGs, far greater than the population of those CBGs. | | No, and resolution is unlikely because sources/sinks are sometimes inherent in GPS data. If this is affecting normalization, we recommend normalizing using state values as opposed to CBG values. |
6/4/2021 | Due to incorrect geometries, 6 U.S. POIs have Canadian geocodes, leading to some odd behavior in Supplementary Files. | | Will be as soon as fixes to these geometries get ingested. |
3/17/2021 | A processing error in Social Distancing Metrics on 3/8/2021 resulted in an influx of devices on this day. This explains the sharp increase in devices seen and completely home devices on this date. | | No |
3/17/2021 | A processing error in Weekly Patterns on 3/3 caused a decrease in visits. We backfilled the week starting 3/1 to fix this. | Community Thread | Yes |
1/12/2021 | Certain columns in Neighborhood Patterns were lower than expected. | Community Thread | Yes, in the July 2021 Backfill. |
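
Several entries above recommend panel normalization, and normalizing with state-level rather than CBG-level values where device sinks distort CBG counts. A minimal sketch of that idea follows. The field names (`region`, `date_range_start`, `raw_visit_counts`) follow the Patterns schema as commonly documented but should be treated as assumptions here. The `state_devices` lookup and the function name are hypothetical; such a lookup would typically be built by summing device counts per state and period from the home panel summary file.

```python
def normalize_by_state_panel(visits, state_devices):
    """Divide raw visits by the state's panel size for the same period.

    visits: list of dicts with 'region', 'date_range_start', 'raw_visit_counts'.
    state_devices: dict mapping (date_range_start, region) -> devices in panel.
    """
    normalized = []
    for row in visits:
        key = (row["date_range_start"], row["region"])
        devices = state_devices.get(key)
        # Leave the rate undefined rather than guess when the panel size
        # is missing or zero.
        rate = row["raw_visit_counts"] / devices if devices else None
        normalized.append({**row, "visits_per_device": rate})
    return normalized


# Made-up numbers: a 20% panel increase that inflates raw visits by 20%.
visits = [
    {"placekey": "poi-a", "region": "WA",
     "date_range_start": "2021-11-08", "raw_visit_counts": 100},
    {"placekey": "poi-a", "region": "WA",
     "date_range_start": "2021-11-15", "raw_visit_counts": 120},
]
state_devices = {("2021-11-08", "WA"): 1000, ("2021-11-15", "WA"): 1200}

for row in normalize_by_state_panel(visits, state_devices):
    print(row["date_range_start"], row["visits_per_device"])
```

In this toy case the raw counts rise from 100 to 120, but the per-device rate is unchanged, which is the behavior normalization is meant to expose. Whether state-level panel size is the right denominator depends on the analysis.
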
Prior to 2021 Known Issues
Date Reported | Description | Discussion/Links | Resolved? |
---|---|---|---|
11/18/2020 | In Social Distancing Metrics (and possibly other datasets) there is an abnormal number of records showing travel to/from parts of Kansas. This is likely due to a GPS data problem related to the "center of the country" issue, which is known to influence a very small minority of location data when non-GPS data is inadvertently mixed with GPS data. | See this summary of known unexpected data trends for 2/25 | SafeGraph is always working to ensure the highest quality location data is used to build its products and we are always working to improve artifacts like this one. |
8/30/2020 | 4/21/2019 (Easter) may be an anomalous day in Patterns data. | We had a supply issue at that time that appears to have artificially decreased the number of visits collected. | Actively investigating. Workaround is to ignore data from this day. |
7/7/2020 | Several inexplicably abnormal days of data in 2018. Dates affected: 3/15/2018, 9/15/2018, 9/16/2018 | Community Discussion | No fix in medium term. Short term workaround is to omit completely if possible. Otherwise, replace with median imputation or some other method so the days have no impact on analysis. |
6/30/2020 | opened_on column over-indexed on 2020-01 | See opened_on documentation | |
4/13/2020 | CBG FIPS are corrupted for some rows in Open Census Data file cbg_b22.csv | | Unfortunately, there is no timeline for fixing this. Apologies for the inconvenience. However, our Slack Community members can see Jonas Peeters' solution. |
4/6/2020 | Duplicate CBGs with Different States | | Yes. Ignore State in home-panel-summary and aggregate within CBG. Product fix coming soon. |
4/2/2020 | Problem with Iowa CBG 190570010001 | | Yes. This CBG has been removed from SDM. |
3/1/2020 | 2/25/2020 Artifact (affecting SDM and Patterns) | See this summary of known unexpected data trends for 2/25 | |
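
For the abnormal 2018 days listed above, the suggested fallback when the days cannot simply be omitted is median imputation. One simple variant is sketched below, assuming a per-POI daily series held as a dict from `datetime.date` to visit count. The function name and the choice to use the median of all unaffected days in the series (rather than, say, the same weekday only) are illustrative assumptions, not a SafeGraph-prescribed method.

```python
from datetime import date
from statistics import median

# Dates listed as abnormal in the 7/7/2020 entry above.
ANOMALOUS_DAYS = {date(2018, 3, 15), date(2018, 9, 15), date(2018, 9, 16)}


def impute_anomalous_days(daily_visits, anomalous_days=ANOMALOUS_DAYS):
    """Replace values on anomalous days with the median of the other days."""
    clean = [v for d, v in daily_visits.items() if d not in anomalous_days]
    if not clean:
        # Nothing to estimate from; return the series unchanged.
        return dict(daily_visits)
    fill = median(clean)
    return {d: (fill if d in anomalous_days else v)
            for d, v in daily_visits.items()}


# Illustrative values only, not real Patterns data.
series = {
    date(2018, 3, 13): 40,
    date(2018, 3, 14): 42,
    date(2018, 3, 15): 5,   # abnormal day
    date(2018, 3, 16): 44,
}
print(impute_anomalous_days(series)[date(2018, 3, 15)])
# 42
```

For POIs with strong weekly seasonality, taking the median over the same weekday in neighboring weeks may be a better fit than the whole-series median used here.
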