Known Issues or Data Artifacts


We strive to be fully transparent about known data issues that may affect your analysis and that are not already covered in the monthly release notes. If you notice a problem that is not listed here, please send us your observations so we can investigate.

Year to Date Known Issues

Date Reported

Description

Discussion/Links

Resolved?

5/13/2022

In the May 11 delivery of Weekly Patterns (for the week starting 5/2), visits declined by roughly 15-20% due to shifts in the panel and processing changes introduced at the beginning of May.

The drop in visits is larger than the drop in panel size (total_devices_seen drops only about 10%), and it primarily affects US Patterns data (Canada is far less affected).

This is a one-time change caused by a change in data supply combined with processing changes that improve privacy protection.

Daily visit trends are unchanged for the majority of POIs, but this will affect longitudinal studies that depend on precisely computing relative change (particularly changes smaller than 20%).

We recommend using one of the pre-normalized columns if your longitudinal use case is impacted.

In particular, the normalized_visits_by_total_visits column is most consistent through this transition.
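To see why a visits-share column is more robust to a panel-wide drop, here is a minimal sketch. It assumes, as the name suggests, that normalized_visits_by_total_visits is a POI's visits divided by total visits across the panel in the same period; the helper names and numbers are hypothetical and chosen so that the POI's true share of traffic does not change.

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after` (e.g. -0.18 for an 18% drop)."""
    return (after - before) / before


def normalized_share(poi_visits: float, total_visits: float) -> float:
    """Hypothetical reconstruction of a visits-share column: POI visits / all visits."""
    return poi_visits / total_visits


# Hypothetical numbers: the POI keeps the same share of traffic, but the
# panel as a whole records 18% fewer visits in the second week.
week_a = {"poi_visits": 1000.0, "total_visits": 5_000_000.0}
week_b = {"poi_visits": 820.0, "total_visits": 4_100_000.0}

raw_change = relative_change(week_a["poi_visits"], week_b["poi_visits"])
norm_change = relative_change(normalized_share(**week_a), normalized_share(**week_b))
print(f"raw: {raw_change:+.1%}, normalized: {norm_change:+.1%}")
```

The raw counts show an 18% drop that comes entirely from the panel, while the share-based series stays flat, which is the property that matters for longitudinal comparisons across this transition.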

5/4/2022

In the May 4 delivery of Weekly Patterns (for the week starting 4/25), the home_panel_summary supplemental file contains extra rows for certain US CBGs that were mis-assigned iso_country_code="CA".

This only affects CBGs which are very close to the border, and is expected to be a minor, mostly cosmetic, issue. If this affects your analysis, please let us know.

This will be fixed in the following release.

5/3/2022

We noticed a bug in the visitor_home_aggregation column of the August 2021 Monthly Patterns file: for that month only, census block groups (CBGs) were used instead of census tracts.

This was caused by a bug fix that was not fully merged until after the release and that, unfortunately, went unnoticed until now. As a stopgap until this is resolved, we recommend summing visitors to each tract from the CBG data (the tract ID is the first 11 digits of the 12-digit CBG ID).

This will be resolved in the next backfill of Patterns data.
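A minimal sketch of the tract-level stopgap for August 2021, assuming the CBG-level visitor data is a JSON object mapping CBG IDs to visitor counts (for example a column such as visitor_home_cbgs; verify the column name in your files). The function name and sample values are hypothetical.

```python
import json
from collections import defaultdict


def cbgs_to_tracts(visitor_home_cbgs: str) -> dict[str, int]:
    """Aggregate a JSON map of {CBG ID: visitors} into {tract ID: visitors}.

    A US CBG ID has 12 digits and its first 11 digits identify the census
    tract. Keys that are not 12-digit strings (e.g. non-US codes) are skipped.
    """
    tracts: dict[str, int] = defaultdict(int)
    for cbg, visitors in json.loads(visitor_home_cbgs).items():
        if len(cbg) == 12 and cbg.isdigit():
            tracts[cbg[:11]] += visitors
    return dict(tracts)


example = '{"360610001001": 4, "360610001002": 6, "360610002001": 5}'
print(cbgs_to_tracts(example))  # {'36061000100': 10, '36061000200': 5}
```

Keep CBG IDs as strings throughout; reading them as integers drops the leading zero of states such as Alabama ("01") and breaks the 11-digit prefix rule.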

Issues Reported in 2021

Date Reported

Description

Discussion/Links

Resolved?

12/15/2021

There was a single-day drop in overall POI visits in SafeGraph Patterns on 12/7/21, caused by device data lost during the AWS outage that day. The drop is on the order of 15% across the US.

Customers using weekly or monthly aggregated columns should see minimal impact, but users looking at daily un-normalized data should use caution when interpreting visits for that day.

This incident will not be resolved because the data are unrecoverable. Please reach out to SafeGraph if you have any questions about how to adjust for this behavior in the data.

11/29/2021

There was an increase in the number of devices in SafeGraph's Patterns panel starting 11/15/2021, with the largest increase in WA state (+20%). In the raw, non-normalized data, this appears as an increase in visitors.

We recommend applying panel normalization to account for the changes in panel size if your application is sensitive to visits during this time.

Not necessary since this is normal behavior.
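One common way to apply panel normalization is to rescale each period's raw visits by the ratio of a fixed reference panel size to that period's panel size. The sketch below assumes you already have one panel-size number per period (for example, a device count taken from the home panel summary file); the choice of column, the reference period, and the function name are assumptions, not SafeGraph's prescribed method.

```python
from typing import Optional


def normalize_by_panel(
    visits: list[float],
    panel_sizes: list[float],
    reference_panel: Optional[float] = None,
) -> list[float]:
    """Rescale raw visit counts as if every period had the same panel size.

    Each value is multiplied by reference_panel / panel_size for its period.
    If no reference is given, the first period's panel size is used.
    """
    if len(visits) != len(panel_sizes):
        raise ValueError("visits and panel_sizes must have the same length")
    ref = panel_sizes[0] if reference_panel is None else reference_panel
    return [v * ref / n for v, n in zip(visits, panel_sizes)]


# Hypothetical weekly series: the panel grows 20% in week 3 while underlying
# behavior is flat, so raw visits also rise 20%.
raw = [100.0, 100.0, 120.0, 120.0]
panel = [50_000.0, 50_000.0, 60_000.0, 60_000.0]
print(normalize_by_panel(raw, panel))  # [100.0, 100.0, 100.0, 100.0]
```

For a regional increase like the one in WA, the panel size should come from the same geography as the visits being normalized; a national panel count would under-correct there.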

11/18/2021

A few parks in Seattle have highly anomalous visits in the July 2021 Backfill of Monthly and Weekly Patterns during a few historical weeks.

This was confirmed to be isolated to just a few POIs and not to be due to Geometry errors. See discussion in Community here.

No, and unlikely to be. Most likely the cause is a temporary sink of anonymized lat/long pings in these locations that slipped through QA. Our recommendation is to remove these outliers and impute the missing weeks.

9/14/2021

A bug caused visits to large Golf Courses and Country Clubs (naics_code=713910) and Amusement and Theme Parks (naics_code=713110) to be inflated between April and August 2021, inclusive, including data in the July 2021 backfill. Median dwell time was similarly lower than it should have been for the same months due to this bug.

Note that visitors were not affected, just visits.

This affected ~17k POIs in Patterns (0.3%).

The bug primarily affected "large" POIs, in this case POIs in the two categories over a certain square footage, although not all such POIs were impacted.

Visits and median dwell for these POIs were corrected for September 2021 data and onward.

9/7/2021

In Neighborhood Patterns from Jan 2018 to June 2021, there are 1-2 days per month where the stops_by_day column does not match the sum of the relevant elements in the stops_by_each_hour array.

A list of affected dates can be found here.

Yes, as of the July 2021 Neighborhood Patterns release. Historical data will be corrected in the next backfill for Neighborhood Patterns.
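If you want to find affected days in your own copy of the historical data, a minimal check is to compare each daily total with the sum of that day's 24 hourly values. This sketch assumes stops_by_each_hour is a flat list holding 24 values per day, starting at hour 0 of the first day of the period (if the columns are stored as JSON strings, parse them with json.loads first); the function name is hypothetical.

```python
def mismatched_days(stops_by_day: list[int], stops_by_each_hour: list[int]) -> list[int]:
    """Return 0-based day indexes where the daily total != sum of that day's hours."""
    if len(stops_by_each_hour) != 24 * len(stops_by_day):
        raise ValueError("expected 24 hourly values per day")
    bad = []
    for day, total in enumerate(stops_by_day):
        hourly = stops_by_each_hour[24 * day : 24 * (day + 1)]
        if sum(hourly) != total:
            bad.append(day)
    return bad


# Hypothetical two-day record: day 0 is consistent (24 == 24), day 1 is not (40 != 48).
hours = [1] * 24 + [2] * 24
print(mismatched_days([24, 40], hours))  # [1]
```

The returned indexes are 0-based, so index 1 refers to the second day of the period.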

8/12/2021

There is a CBG in Manhattan around City Hall that shows 10x as many devices in Neighborhood Patterns as neighboring CBGs.

See this Community Slack thread.

No, and resolution is unlikely because sources/sinks are sometimes inherent in GPS data. If this is affecting normalization, we recommend normalizing with state-level values rather than CBG-level values.
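One plausible reading of "normalizing with state-level values" is to compute a single population-to-devices factor per state and apply it to every CBG in that state, instead of each CBG's own ratio. The sketch assumes you have device counts and census population keyed by 12-digit CBG ID (the state FIPS code is the first 2 digits); the function names and numbers are hypothetical.

```python
from collections import defaultdict


def state_scaling_factors(
    devices_by_cbg: dict[str, float], population_by_cbg: dict[str, float]
) -> dict[str, float]:
    """Return {state FIPS: total population / total devices} for each state.

    The state FIPS code is taken as the first 2 digits of the 12-digit CBG ID.
    """
    devices: dict[str, float] = defaultdict(float)
    population: dict[str, float] = defaultdict(float)
    for cbg, n in devices_by_cbg.items():
        devices[cbg[:2]] += n
    for cbg, p in population_by_cbg.items():
        population[cbg[:2]] += p
    return {s: population[s] / d for s, d in devices.items() if d > 0}


def scale_count(cbg: str, count: float, factors: dict[str, float]) -> float:
    """Scale a device-based count from one CBG using its state's factor."""
    return count * factors[cbg[:2]]


# Hypothetical inputs: CBG 360610001002 behaves like a device sink.
devices = {"360610001001": 50, "360610001002": 500, "060750101001": 100}
population = {"360610001001": 1000, "360610001002": 1000, "060750101001": 2000}
factors = state_scaling_factors(devices, population)
print(factors)
print(scale_count("060750101001", 10, factors))  # 200.0
```

With this approach the sink's inflated device count only enters its state's totals, where it is pooled with every other CBG, rather than setting its own CBG's factor outright.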

8/6/2021

Close to 6,000 POIs (~0.1%) have visits assigned in Patterns after their closed_on dates.

See Relationship with opening and closing dates

This issue gets resolved with each backfill.
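Until the next backfill, you can flag these rows yourself. This minimal sketch assumes closed_on is a "YYYY-MM" string (empty or missing for POIs that are still open) and that each Patterns row has a period start date; a row is flagged only when it reports visits in a period that begins after the closing month. The function name is hypothetical.

```python
from datetime import date
from typing import Optional


def visits_after_closure(period_start: date, closed_on: Optional[str], visits: int) -> bool:
    """True if a POI reports visits in a period that begins after its closing month."""
    if not closed_on or visits == 0:
        return False
    year, month = (int(part) for part in closed_on.split("-")[:2])
    # Compare (year, month) tuples so that visits within the closing month itself pass.
    return (period_start.year, period_start.month) > (year, month)


print(visits_after_closure(date(2021, 7, 1), "2021-05", 12))  # True
print(visits_after_closure(date(2021, 5, 1), "2021-05", 12))  # False (closing month)
print(visits_after_closure(date(2021, 7, 1), None, 12))       # False (still open)
```

Visits during the closing month are not flagged, because a POI can legitimately receive visits before it closes partway through that month.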

7/29/2021

Neighborhood Patterns Home Panel Summary files have a small number of rows corresponding to Canadian neighborhoods.

No, but when Canada Neighborhood Patterns gets released, the Home Panel Summary files will have many more rows for Canadian neighborhoods, so this behavior will become standard in the future.

7/7/2021

Quotation marks in the iso_country_codes_open and iso_country_codes_closed columns of the Brand Info file are not encoded properly.

Yes. This was resolved in the August 2021 release of Places.

7/6/2021

The safegraph_place_id and parent_safegraph_place_id columns were dropped as of the July 2021 release (see the July 2021 Release Notes). Use placekey and parent_placekey going forward.

N/A

7/6/2021

The tracking_opened_since column was dropped as of the July 2021 release (see the July 2021 Release Notes) because it provided redundant information. If a POI has an opened_on value, we have been tracking it since that date. If a POI does not have an opened_on value, we were not able to determine the exact date it opened.

N/A

7/6/2021

The June 2021 version of Monthly Patterns appears to have large sinks of devices in a few CBGs, far exceeding the population of those CBGs.

No, and resolution is unlikely because sources/sinks are sometimes inherent in GPS data. If this is affecting normalization, we recommend normalizing with state-level values rather than CBG-level values.

6/4/2021

Due to incorrect geometries, 6 U.S. POIs have Canadian geocodes, leading to some odd behavior in Supplementary Files.

This will be resolved once fixes to these geometries are ingested.

3/17/2021

A processing error in Social Distancing Metrics on 3/8/2021 resulted in an influx of devices on that day, which explains the sharp increase in devices seen and completely-home devices on that date.

No

3/17/2021

A processing error in Weekly Patterns on 3/3 caused a decrease in visits. We backfilled the week starting 3/1 to fix this.

Community Thread

Yes

1/12/2021

Certain columns in Neighborhood Patterns had values lower than expected.

Community Thread

Yes, in the July 2021 Backfill.

Issues Reported Prior to 2021

Date Reported

Description

Discussion/Links

Resolved?

11/18/2020

In Social Distancing Metrics (and possibly other datasets), there is an abnormal number of records showing travel to/from parts of Kansas. This is likely a GPS data problem related to the "center of the country" issue, which is known to affect a very small minority of location data when non-GPS data is inadvertently mixed with GPS data.

See this summary of known unexpected data trends for 2/25

SafeGraph is always working to ensure the highest quality location data is used to build its products and we are always working to improve artifacts like this one.

8/30/2020

4/21/2019 (Easter) may be an anomalous day in Patterns data.

We had a supply issue at that time that appears to have artificially decreased the number of visits collected.

Actively investigating. Workaround is to ignore data from this day.

7/7/2020

Several inexplicably abnormal days of data in 2018. Dates affected: 3/15/2018, 9/15/2018, 9/16/2018

Community Discussion

No fix is planned in the medium term. The short-term workaround is to omit these days entirely if possible; otherwise, replace them using median imputation or another method so that they have no impact on your analysis.
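A minimal sketch of the median-imputation workaround for a daily series: each flagged day is replaced by the median of unflagged days within a symmetric window around it. The window size, function name, and visit values are hypothetical; only the flagged 2018 dates come from this table.

```python
from datetime import date, timedelta
from statistics import median


def impute_bad_days(
    series: dict[date, float], bad_days: set[date], window: int = 7
) -> dict[date, float]:
    """Replace values on bad_days with the median of unflagged neighbors.

    Neighbors are the days within +/- `window` days that are present in
    `series` and not themselves in `bad_days`. Days without any usable
    neighbor are left unchanged.
    """
    result = dict(series)
    for day in sorted(bad_days):
        neighbors = []
        for offset in range(-window, window + 1):
            other = day + timedelta(days=offset)
            if other in series and other not in bad_days:
                neighbors.append(series[other])
        if neighbors:
            result[day] = median(neighbors)
    return result


# Hypothetical daily visits around the 9/15 and 9/16/2018 anomalies.
daily = {date(2018, 9, d): float(100 + d) for d in range(10, 21)}
daily[date(2018, 9, 15)] = 5.0
daily[date(2018, 9, 16)] = 900.0
flagged = {date(2018, 9, 15), date(2018, 9, 16)}
fixed = impute_bad_days(daily, flagged, window=3)
print(fixed[date(2018, 9, 15)], fixed[date(2018, 9, 16)])  # 114.0 117.0
```

Excluding other flagged days from each neighborhood keeps adjacent anomalies (like 9/15 and 9/16) from contaminating each other's imputed values. For strongly weekly series, a same-weekday median may be a better choice.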

6/30/2020

opened_on column over-indexed on 2020-01

See opened_on documentation

4/13/2020

CBG FIPS are corrupted for some rows in Open Census Data file cbg_b22.csv

Unfortunately, there is no timeline for fixing this. Apologies for the inconvenience. However, our Slack Community members can see Jonas Peeters' solution.

4/6/2020

Duplicate CBGs with Different States

Yes. Ignore State in home-panel-summary and aggregate within CBG. Product fix coming soon.
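A minimal sketch of that workaround, assuming the home panel summary rows carry census_block_group and number_devices_residing columns (check the column names against your own files) and that summing duplicate rows is the intended aggregation. The function name and sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def devices_by_cbg(rows) -> dict[str, int]:
    """Sum number_devices_residing per CBG, ignoring the state column."""
    totals: dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["census_block_group"]] += int(row["number_devices_residing"])
    return dict(totals)


# Hypothetical rows: the same CBG appears twice with different states.
sample = io.StringIO(
    "state,census_block_group,number_devices_residing\n"
    "ny,360610001001,40\n"
    "nj,360610001001,3\n"
    "ny,360610001002,25\n"
)
print(devices_by_cbg(csv.DictReader(sample)))  # {'360610001001': 43, '360610001002': 25}
```

csv.DictReader returns every field as a string, so CBG IDs keep their leading zeros (e.g. California's "06" prefix) and duplicates match exactly.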

4/2/2020

Problem with Iowa CBG 190570010001

Yes. This CBG has been removed from SDM.

3/1/2020

2/25/2020 Artifact (affecting SDM and Patterns)

See this summary of known unexpected data trends for 2/25

