December-2020 Release Notes

If only we could deliver our data with virtual wrapping paper 🎁. Happy Holidays from everyone at SafeGraph, and welcome to the December 2020 release notes (2020-11-13/1605265391 shipped 2020-12-04). 🕯 🎄⭐️ ⛄️

Highlights

  • +1 category added to Places
  • +21 brands 🎊
  • +3 Patterns columns 🚨
  • +1 Neighborhood Patterns column 🎊
  • +1 Core and Geometry column
  • Partridge in a pear tree not included 😉

Enhancements - Patterns

  • Welcome to backfill-palooza! What is a "backfill"? A SafeGraph Patterns backfill is a restatement of historical foot traffic activity using the latest version of Core and Geometry to inform the recalculation. Improving our product is a double-edged sword 🗡. While enhancing our Core and Geometry products is a worthwhile cause, it can lead to foot traffic inconsistency over time if POIs experience significant metadata changes. Restating the entire history with a single version of Core and Geometry alleviates this pain for customers licensing historical data and provides a consistent view of foot traffic over time. We do this a few times each year and strategically plan schema-breaking changes around backfills. See here for details about previous backfills.

  • The December 2020 "backfill" restates foot traffic activity from January 1, 2018 through the present (November 30, 2020) for Weekly Patterns, Monthly Patterns, and Neighborhood Patterns. 💥

  • This backfill is “smarter” than previous backfills and accounts for point-in-time store openings and store closures. 🧠

  • For example, if a POI opened in January 2019, we will not attribute visits to the POI from January 2018 - December 2018 and will only attribute visits from January 2019 onward. On the other hand, if a POI closed in January 2019, we will only attribute visits from January 2018 - December 2018 and will not attribute visits from January 2019 - present.

  • We are relying on the metadata provided by our closed_on, opened_on, tracking_closed_since, and tracking_opened_since columns to make these determinations. If we do not have open/close information for a POI, we will treat the POI as “open” for the duration of the backfill. See here for more about how we determine POI openings/closings.

  • Both Weekly and Monthly Patterns now include parent_safegraph_place_id and parent_placekey columns. Understanding spatial hierarchy is especially important for visit aggregation use cases to ensure visits are not “double counted” at both the child and parent POI. See Visits to Parent POIs for details.

  • We've added +2 bins to bucketed_dwell_times for increased granularity! 🔬

  • Home Panel Summary ("home_panel_summary.csv") in Weekly Patterns and Monthly Patterns now only includes devices whose homes are eligible to be counted in the visitor_home_cbgs column. See here for methodology on calculating homes. In the past, the counts in the Home Panel Summary included any device that had at least one visit during the given time period and for which we had identified a primary nighttime geohash with any degree of confidence. Meanwhile, visitor_home_cbgs only included devices for which we had identified a primary nighttime geohash with a high degree of confidence. With this change, we are aligning the requirements for the Home Panel Summary with those of the visitor_home_cbgs column. As a result, the counts in the Home Panel Summary will be lower than in the past, but the Home Panel Summary and visitor_home_cbgs methodologies are now consistent, which was the original intent, to help with normalization.

  • Weekly Patterns now includes a “visit_panel_summary.csv” file (like Monthly Patterns) to show the total number of visits and visitors to POIs in a given week by state.

  • The visitor_work_cbgs column has been removed from the schema. This column has been empty since August 2020, when it was superseded by visitor_daytime_cbgs.

  • carrier_name has been added to Weekly Patterns as a premium column option. 💥

  • In last month's delivery, SG Patterns had 4,186,911 points-of-interest (US only). This month, SG Patterns has 4,393,086 points-of-interest (US only) (net +206,175). 📈

  • Last month, SG Patterns had 899,790,036 visits from 34,787,736 visitors. This month, SG Patterns has 778,401,379 visits from 35,531,210 visitors (delta -121,388,657 visits, +743,474 visitors).
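
The point-in-time rule for openings and closures described earlier in this section can be illustrated with a minimal sketch. This is not SafeGraph's implementation: the function name `is_open_during` is hypothetical, the open/close values are assumed to have been parsed into `date` objects (first day of the month), and the tracking_opened_since / tracking_closed_since columns are ignored for brevity.

```python
from datetime import date
from typing import Optional


def is_open_during(period_start: date, period_end: date,
                   opened_on: Optional[date],
                   closed_on: Optional[date]) -> bool:
    """Return True if visits in [period_start, period_end] may be attributed.

    Hypothetical helper. A missing opened_on or closed_on (None) is treated
    as "open", mirroring the rule stated in the notes above.
    """
    if opened_on is not None and opened_on > period_end:
        return False  # the POI had not opened yet
    if closed_on is not None and closed_on <= period_start:
        return False  # the POI had already closed
    return True


# A POI that opened in January 2019 gets no visits for December 2018,
# but does get visits for January 2019.
jan_2019 = date(2019, 1, 1)
print(is_open_during(date(2018, 12, 1), date(2018, 12, 31), jan_2019, None))
print(is_open_during(date(2019, 1, 1), date(2019, 1, 31), jan_2019, None))
```

How partial months at the boundary are handled in the actual backfill is not stated in these notes; the sketch simply follows the month-level example given above.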

Enhancements - Neighborhood Patterns

  • Neighborhood Patterns, which assigns foot traffic counts to census block groups instead of places, includes a new column, work_behavior_device_home_areas, to support commuting use cases. This column shows the device count per home census block group for devices which dwelled in a target census block group during the workday and for more than 6 hours. 🚗 💼
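
As a rough illustration of a commuting use case, the sketch below totals device counts by home census block group across several target areas. It assumes that work_behavior_device_home_areas is serialized as a JSON object mapping a home census block group ID to a device count; check the Neighborhood Patterns schema before relying on that. The function name and the sample IDs are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable, Mapping


def commuters_by_home_area(
    rows: Iterable[Mapping[str, str]],
    column: str = "work_behavior_device_home_areas",
) -> Counter:
    """Sum device counts per home census block group over all rows.

    Assumes each cell is a JSON object such as '{"<home_cbg>": <count>}'.
    Empty or missing cells are treated as an empty object.
    """
    totals: Counter = Counter()
    for row in rows:
        raw = row.get(column) or "{}"
        for home_cbg, devices in json.loads(raw).items():
            totals[home_cbg] += devices
    return totals


# Made-up rows standing in for two target census block groups.
sample = [
    {"work_behavior_device_home_areas": '{"010010201001": 5, "010010201002": 4}'},
    {"work_behavior_device_home_areas": '{"010010201001": 7}'},
    {"work_behavior_device_home_areas": ""},
]
print(commuters_by_home_area(sample).most_common(1))
```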

Enhancements - Core Places and Brands

  • parent_placekey is a new column listing the placekey of the parent (containing) POI if the place is contained by a parent entity (e.g. mall, airport, stadium). This mirrors the existing relationship between safegraph_place_id / parent_safegraph_place_id. See the Core Places schema for details.

  • Last month SG Places had 6,654,168 points-of-interest (including closed POIs). This month SG Places has 6,874,427 points-of-interest (net + 220,259 places). These are +217,353 US Places and +2,906 CA places.

  • We've added +21 brands including +7 Gasoline Stations with Convenience Stores ⛽️ and +5 Blood and Organ Banks (621991) 🩸 🧛‍♂️
    New Brands Include...

    • Sobeys (sobeys.com, SG_BRAND_43deb5f906f8064a) with 0 US and 255 CA places.
    • Hy-Vee Gas Station (hy-vee.com/stores/gas-finder, SG_BRAND_58cd8c154f664b5c), parent brand: (Hy-Vee, SG_BRAND_8f8c9465b9550499b0540b26e9470dec) with 167 US and 0 CA places.
    • Christmas Tree Shops (christmastreeshops.com, SG_BRAND_e2ed21c3015b3571f677c10f987a7ccb), parent brand: (Bed Bath & Beyond, SG_BRAND_6e7bcf9086fc3b43babdfdf51a97759f) with 81 US and 0 CA places.
    • Interstate Blood Bank (interstatebloodbank.com, SG_BRAND_bf7dfe9485d088ab), parent brand: (GRIFOLS, SG_BRAND_3bb370e5a4a93cf0) with 36 US and 0 CA places.
    • Suzuki (suzuki.com, SG_BRAND_101186a9a44bc0354ed997696a6aefba) with 5,781 US and 0 CA places.
    • and 16 more!

Bug Fixes and Known Issues - Core Places and Brands

  • We discovered a few brand count fluctuations as a result of updated sourcing and other metadata bugs. These corrections resulted in significant changes in the total number of POIs for each affected brand, but the new count is correct. For transparency, we'd like to list some of these corrections as examples in no particular order:

    • Castle Dental (SG_BRAND_473e6c6e77f4c292). Net POI count change: US: -288 CA: 0. Bug: Previously included affiliates.
    • T-Mobile (SG_BRAND_4b82356db1a8f4a2db26dd5b7e30abba). Net POI count change: US: 2507 CA: 0. Bug: Stores are now under the T-Mobile brand but still contain "Sprint" in the location_name.

Enhancements - Categories

  • With guidance from our customers, we continue to expand the definition of a SafeGraph "place." We're happy to announce that the December 2020 release features 1,761 corporate office locations across 65 of the Fortune 100 companies 🏢. These POIs can be found by searching for naics_code = 551114 (Corporate, Subsidiary, and Regional Managing Offices).

Category Fill Rate -- We monitor category fill rate with 3 metrics: (1) category fill rate across the entire dataset, (2) category fill rate for branded POIs, (3) category fill rate in the brand_info file (brand-level categories). We want all of these numbers to be 100%.

  • (1) All POI category fill rate. Last month 99.2%. This month 99.2%.
  • (2) Branded POI category fill rate. Last month 100%. This month 100% 💯
  • (3) Brand-level category fill rate (brand_info file). Last month 100%. This month 100% 💯
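
For readers who want to reproduce a fill-rate style metric on their own extract, here is a minimal sketch. It assumes rows loaded as dictionaries (for example with `csv.DictReader`), uses naics_code as the category column and a non-empty brands value as the test for "branded"; both choices are assumptions for illustration, and the helper names and sample rows are made up.

```python
from typing import Iterable, List, Mapping


def fill_rate(rows: Iterable[Mapping[str, str]], column: str) -> float:
    """Percentage of rows whose `column` is present and not blank."""
    total = 0
    filled = 0
    for row in rows:
        total += 1
        value = row.get(column)
        if value is not None and value.strip() != "":
            filled += 1
    return 100.0 * filled / total if total else 0.0


def branded_only(rows: Iterable[Mapping[str, str]]) -> List[Mapping[str, str]]:
    """Keep rows with a non-empty brands value (assumed meaning of 'branded')."""
    return [row for row in rows if (row.get("brands") or "").strip()]


# Made-up rows: three of four have a category, and both branded rows do.
sample_rows = [
    {"brands": "Suzuki", "naics_code": "441228"},
    {"brands": "Sobeys", "naics_code": "445110"},
    {"brands": "", "naics_code": "722511"},
    {"brands": "", "naics_code": ""},
]
print(fill_rate(sample_rows, "naics_code"))                # 75.0
print(fill_rate(branded_only(sample_rows), "naics_code"))  # 100.0
```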

Drops ⬇️

  • We constantly ingest data from new sources, and many safegraph_place_ids (sgpids) are intentionally dropped, but we are unable to track each and every dropped sgpid. For the first time, the following metrics account for closed POIs as well (see more about our open/close columns here).

    • We dropped 17,564 sgpids (6,056 branded and 11,508 non-branded).
    • ~2k dropped due to POI source fluctuations
    • ~1k dropped as a result of bug fixes for branded POIs 🐛
    • ~8k dropped as a result of deduplication 👯‍♂️
  • The remaining drops are undesired failures to maintain a consistent sgpid between releases, known as bad sgpid churn (see discussion in March 2019 release). We are continuing to work on better metrics to distinguish good vs. bad churn.

  • In October, we cofounded the Placekey initiative and added placekey as a unique and persistent identifier for all POIs in the SafeGraph dataset. See here for more on how Placekey is unlocking access to geospatial data across industries.

Enhancements - Geometry

  • parent_placekey is a new column listing the placekey of the parent (containing) POI if the place is contained by a parent entity (e.g. mall, airport, stadium). This mirrors the existing relationship between safegraph_place_id / parent_safegraph_place_id. See the Geometry schema for details.

  • This month in Geometry world, we upgraded some of our "parent" polygons for categories including, but not limited to, theme parks, golf courses, ski resorts, and airports ✈️ 🎿 ⛳️ 🎢. This resulted in more uniform polygons across these important categories and provides consistency for the Patterns "backfill."

  • While OWNED polygons are preferred, it does not mean that SHARED polygons are inherently bad. It only means that the exact shape of each POI within the polygon is not discernible, but the general location can be identified by the centroid (latitude & longitude). 🎯

  • When enclosed = FALSE, it indicates that there are reasonable means to derive a unique polygon for the POI (even when parent_safegraph_place_id is not null), and we strive for 100% of branded, non-enclosed POIs to have polygon_class = "OWNED_POLYGON".

  • Last month, the percent OWNED polygons for branded, non-enclosed POIs was 79.5%.

  • This month, the percent OWNED polygons for branded, non-enclosed POIs is 78.0%. 📉
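
The OWNED-polygon percentage above can be approximated on a customer's own data along the following lines. This is only a sketch: it assumes Core and Geometry rows have already been joined so that brands, enclosed, and polygon_class sit on the same record, that enclosed is serialized as the text "TRUE"/"FALSE", and that a non-empty brands value means "branded". The function name and sample rows are hypothetical.

```python
from typing import Iterable, Mapping


def percent_owned(rows: Iterable[Mapping[str, str]]) -> float:
    """Share of branded, non-enclosed POIs with polygon_class OWNED_POLYGON."""
    eligible = [
        row for row in rows
        if (row.get("brands") or "").strip()
        and (row.get("enclosed") or "").strip().upper() == "FALSE"
    ]
    if not eligible:
        return 0.0
    owned = sum(1 for row in eligible
                if row.get("polygon_class") == "OWNED_POLYGON")
    return 100.0 * owned / len(eligible)


# Made-up joined rows: only the first four are branded and non-enclosed.
joined_rows = [
    {"brands": "Suzuki", "enclosed": "FALSE", "polygon_class": "OWNED_POLYGON"},
    {"brands": "Sobeys", "enclosed": "FALSE", "polygon_class": "OWNED_POLYGON"},
    {"brands": "T-Mobile", "enclosed": "FALSE", "polygon_class": "SHARED_POLYGON"},
    {"brands": "Hy-Vee", "enclosed": "FALSE", "polygon_class": "OWNED_POLYGON"},
    {"brands": "T-Mobile", "enclosed": "TRUE", "polygon_class": "SHARED_POLYGON"},
    {"brands": "", "enclosed": "FALSE", "polygon_class": "OWNED_POLYGON"},
]
print(percent_owned(joined_rows))  # 75.0
```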

Bug Fixes and Known Issues - Geometry


**In case you missed it,** check out [last month's release notes](https://docs.safegraph.com/changelog/november-2020-release-notes).

**Calculating Diffs**
Curious to find the specific records that were either **added, deleted, or saw an attribute change** from one release to the next? Visit "Calculating Diffs" in our [Data Science Resources](https://docs.safegraph.com/docs/data-science-resources#section-calculating-diffs) to get started. 
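
As a rough idea of what such a diff involves (this is not the method from the linked page, just a minimal sketch), two releases can be compared by keying each record on an identifier. The function name `diff_releases` is hypothetical, and keying on `safegraph_place_id` is an assumption; `placekey` could be used the same way.

```python
from typing import Dict, Iterable, List, Mapping, Tuple


def diff_releases(
    previous: Iterable[Mapping[str, str]],
    current: Iterable[Mapping[str, str]],
    key: str = "safegraph_place_id",
) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, deleted, changed) record IDs between two releases."""
    prev_by_id: Dict[str, Mapping[str, str]] = {row[key]: row for row in previous}
    curr_by_id: Dict[str, Mapping[str, str]] = {row[key]: row for row in current}

    added = sorted(curr_by_id.keys() - prev_by_id.keys())
    deleted = sorted(prev_by_id.keys() - curr_by_id.keys())
    # A record counts as changed if any attribute differs between releases.
    changed = sorted(
        record_id
        for record_id in prev_by_id.keys() & curr_by_id.keys()
        if dict(prev_by_id[record_id]) != dict(curr_by_id[record_id])
    )
    return added, deleted, changed


# Made-up records standing in for two monthly deliveries.
november = [
    {"safegraph_place_id": "sg:a", "location_name": "Sprint"},
    {"safegraph_place_id": "sg:b", "location_name": "Castle Dental"},
]
december = [
    {"safegraph_place_id": "sg:a", "location_name": "T-Mobile"},
    {"safegraph_place_id": "sg:c", "location_name": "Sobeys"},
]
print(diff_releases(november, december))
```

In practice the rows would come from the delivered CSV files, for example via `csv.DictReader`, and very large releases may call for a dataframe or database join instead of in-memory dictionaries.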

**Fill Rates**
See the [Summary Statistics](https://docs.safegraph.com/docs/places-summary-statistics) page for all Core and Geometry column fill rates as well as a breakdown of POI count by `naics_code`.

**Explore**
Browse SafeGraph Core & Geometry data at your own pace [in these webmaps.](https://storymaps.arcgis.com/stories/8e5e066486f94f0ea698e507d46987f7)

**Also check out these new ways to get SafeGraph data:**
  * Need some extra data or other SafeGraph products? Check out the [SafeGraph Data Bar.](https://shop.safegraph.com/) 
  * Heavy AWS user? Check out our [listings in the AWS Data Exchange](https://aws.amazon.com/marketplace/search/results?filters=vendor_id&vendor_id=7d5ff8ca-105f-4856-9d99-5f2f1d83223c).
  * Are you an Esri or ArcGIS user? Check out our FREE data [SafeGraph Places in the Esri Marketplace](https://marketplace.arcgis.com/listing.html?id=3425348e4bee4059af2b353e52df43c2) and enjoy [SafeGraph Places in Esri Basemaps](https://www.esri.com/arcgis-blog/products/arcgis-living-atlas/mapping/new-places-in-esri-vector-basemaps/). 
  * Snowflake user? Check out our page on the [Snowflake Data Exchange](https://www.snowflake.com/datasets/safegraph/) :snowflake: 
  * Or just drop us a line! Your data needs are our data delights!