June-2020 Release Notes

Welcome to the June-2020 release notes 📝 - we're excited to share what we've been up to this past month (2020-05-31/1590919942 shipped 2020-06-05).

Highlights

  • +39 brands with an emphasis on Gas Stations ⛽️
  • Enhanced methodology for determining visitor_home_cbgs
  • Continued geometry improvements with a focus on shopping malls 👜

Enhancements - Core Places and Brands

  • Last month SG Places had 5,929,331 points-of-interest. This month SG Places has 5,968,092 points-of-interest (net +38,761 places). This breaks down to +36,520 US places and +2,241 CA places.

  • We've added +39 new brands including +6 Canadian brands 🍁 and +5 sub brands of existing parent brands 🎊
    New Brands Include...

    • Gobble Stop (gobblestop.com, SG_BRAND_7e07590850f6a09f), parent brand: Marathon (SG_BRAND_faaaac9cb18c500a97c03eec92d6b8fc), with 12 US and 0 CA places.
    • Esso (esso.ca, SG_BRAND_72fd44e857509cdd), parent brand: Exxon Mobil (SG_BRAND_a144a8c10e1fe8006125571afd1a1e80), with 0 US and 1874 CA places.
    • Jubilee Food Stores (hillcityoil.net/jubilee, SG_BRAND_86a5c003c7226385), parent brand: Hill City Oil (SG_BRAND_4746288f030e6fcd), with 18 US and 0 CA places.
    • Kent Kwik (kentkwik.com, SG_BRAND_3bdcdce50c1ac094), parent brand: The Kent Companies (SG_BRAND_445fc19dcd78b55e), with 46 US and 0 CA places.
    • Suburban Extended Stay (suburbanhotels.com, SG_BRAND_8956cee4e5d00996ab4e920c4b33034b), parent brand: Choice Hotels (SG_BRAND_43d104f5bf19b83c), with 61 US and 0 CA places.
    • Amtrak (amtrak.com, SG_BRAND_8600ba93ae70c64076266f8c3f3aec18) with 1052 US and 15 CA places.
    • OK Tire Stores (oktire.com, SG_BRAND_2db9cc48ee5e27be) with 0 US and 317 CA places.
    • and 32 more!

Bug Fixes and Known Issues - Core Places and Brands

  • We discovered a few brand count fluctuations as a result of brand consolidations and store closures. These corrections resulted in significant changes in the total number of POI for each affected brand, but the new count is correct. For transparency, we'd like to list some of these corrections as examples in no particular order:

    • St. Vincent (SG_BRAND_5586a61b0f10eb13). Net POI count change: US: -348 CA: 0. Bug: Redirects to healthcare.ascension.org (which we already have as a brand).
    • Lube Stop (SG_BRAND_bbdcd2af854df91c7fe3b9060f7e2a1f). Net POI count change: US: -4 CA: 0. Bug: Redirects to take5oilchange.com (which we already have as a brand).
    • Specialty's Cafe and Bakery (SG_BRAND_c31b13b933c192b3c74ba472dd6112fc). Net POI count change: US: -45 CA: 0. Bug: The company announced in May that all stores were closing for good.
    • Fit4Mom (SG_BRAND_20a7a33cd9d81eca). Net POI count change: US: -1534 CA: 0. Bug: Removed this brand; these weren't actually "Fit4Mom" branded locations; they were meet-up locations (e.g., at parks).

Enhancements - Categories

  • Better training data means better accuracy for high profile categories like Full-Service Restaurants, Gas Stations, and Commercial Banking 🎊.

  • Full-Service Restaurants (722511). Net POI count change: US: +2,972 CA: +1,294 🍴

  • Gasoline Stations with Convenience Stores (447110). Net POI count change: US: +1,114 CA: +1,772. ⛽️

  • Commercial Banking (522110). Net POI count change: US: +1,724 CA: +118. 💰

  • Category Fill Rate -- We monitor category fill rate with 3 metrics: (1) category fill rate across the entire dataset, (2) category fill rate for branded POI, (3) category fill rate in the brand_info file (brand-level categories). We want all of these numbers to be 100%.

    • (1) All POI category fill rate. Last month 98.8%. This month 98.8%.
    • (2) Branded POI category fill rate. Last month 100%. This month 100% 💯
    • (3) Brand-level category fill rate (brand_info file). Last month 100%. This month 100% 💯
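
The three fill-rate metrics above can be approximated on your own copy of the data. The following is a minimal sketch, not SafeGraph's internal tooling: the column names `naics_code`, `safegraph_brand_ids`, and `safegraph_brand_id` are assumptions about the Core Places and brand_info files, a POI is treated as "branded" when its brand-ID field is non-blank, and a category counts as filled when `naics_code` is non-blank.

```python
def fill_rate(rows, column="naics_code"):
    """Percent of rows whose `column` is present and non-blank."""
    rows = list(rows)
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if str(row.get(column) or "").strip())
    return 100.0 * filled / len(rows)


def category_fill_rates(places, brand_info):
    """Return the three metrics: all POI, branded POI, brand-level."""
    # Assumption: a non-blank safegraph_brand_ids value marks a branded POI.
    branded = [p for p in places if str(p.get("safegraph_brand_ids") or "").strip()]
    return {
        "all_poi": fill_rate(places),
        "branded_poi": fill_rate(branded),
        "brand_level": fill_rate(brand_info),
    }


# Toy rows with hypothetical values, for illustration only.
places = [
    {"safegraph_place_id": "sg:a", "safegraph_brand_ids": "SG_BRAND_1", "naics_code": "447110"},
    {"safegraph_place_id": "sg:b", "safegraph_brand_ids": "", "naics_code": "722511"},
    {"safegraph_place_id": "sg:c", "safegraph_brand_ids": "", "naics_code": ""},
    {"safegraph_place_id": "sg:d", "safegraph_brand_ids": "SG_BRAND_2", "naics_code": "522110"},
]
brand_info = [
    {"safegraph_brand_id": "SG_BRAND_1", "naics_code": "447110"},
    {"safegraph_brand_id": "SG_BRAND_2", "naics_code": "522110"},
]
rates = category_fill_rates(places, brand_info)
print(rates)
```

In this toy sample one unbranded POI has no category, so only the dataset-wide rate falls below 100%.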

Drops ⬇️

  • We constantly ingest data from new sources, and many sgpids are intentionally dropped, but we are unable to track each and every dropped sgpid. In this release:

    • We dropped 28,192 sgpids (14,305 branded and 13,887 non-branded).
    • 1,931 dropped as a result of bug fixes for branded POIs 🐛
    • 7,490 dropped as a result of deduplication 👯‍♂️
  • The remaining drops are a combination of store closures and bad sgpid churn. Bad sgpid churn is an undesired failure to maintain a consistent safegraph_place_id (sgpid) between releases (see discussion in March 2019 release). We are continuing to work on better metrics to distinguish good vs. bad churn.
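
As a quick worked check of the figures above, subtracting the bug-fix and deduplication drops from the total gives the size of the "remaining" group. This sketch assumes the two named groups do not overlap, which the notes imply but do not state.

```python
total_dropped = 28_192
branded, non_branded = 14_305, 13_887
bug_fix_drops = 1_931
dedup_drops = 7_490

# The branded / non-branded split should account for every dropped sgpid.
assert branded + non_branded == total_dropped

# Assumption: bug-fix drops and deduplication drops are disjoint.
remaining = total_dropped - bug_fix_drops - dedup_drops
remaining_share = 100.0 * remaining / total_dropped

print(f"remaining drops: {remaining:,} ({remaining_share:.1f}% of all drops)")
# prints "remaining drops: 18,771 (66.6% of all drops)"
```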

  • Learn more about Core Places files that include closed POIs here.

Enhancements - Geometry

  • This past month, we focused on correcting shopping mall POIs with bad geometry. This resulted in +5,299 POIs recognized as children of a parent shopping mall POI (531120) as well as improved polygons for many of the largest shopping mall POIs 🛍.

  • Percent polygon_class = OWNED_POLYGON (as described in the Oct 2019 release notes). We examine polygon_class for every safegraph_place_id that (i) is branded and (ii) does NOT have a parent_safegraph_place_id; we call this group "branded, no-parent". We want 100% of "branded, no-parent" POI to have polygon_class = OWNED_POLYGON.

  • Last month, the percent OWNED polygons for branded, no-parent POIs was 82.8%

  • This month it is 82.6% 📉

  • In case you missed it, check out last month's release notes for details on overlapping polygon clean-up, tweaks to spatial hierarchy, and an updated definition of polygon_class. 💯
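
The "branded, no-parent" metric above can be sketched as a filter followed by a ratio. This is an illustration under assumptions, not the internal calculation: `polygon_class` and `parent_safegraph_place_id` come from the notes, while using a non-blank `safegraph_brand_ids` field to mean "branded" and the `SHARED_POLYGON` value in the toy rows are assumptions.

```python
def percent_owned(places):
    """Share of branded, no-parent POI whose polygon_class is OWNED_POLYGON."""
    def blank(value):
        return not str(value or "").strip()

    group = [
        p for p in places
        if not blank(p.get("safegraph_brand_ids"))      # (i) branded
        and blank(p.get("parent_safegraph_place_id"))   # (ii) no parent
    ]
    if not group:
        return 0.0
    owned = sum(1 for p in group if p.get("polygon_class") == "OWNED_POLYGON")
    return 100.0 * owned / len(group)


# Toy rows with hypothetical values, for illustration only.
sample = [
    {"safegraph_brand_ids": "SG_BRAND_1", "parent_safegraph_place_id": "", "polygon_class": "OWNED_POLYGON"},
    {"safegraph_brand_ids": "SG_BRAND_1", "parent_safegraph_place_id": "", "polygon_class": "SHARED_POLYGON"},
    {"safegraph_brand_ids": "SG_BRAND_2", "parent_safegraph_place_id": "sg:mall", "polygon_class": "SHARED_POLYGON"},
    {"safegraph_brand_ids": "", "parent_safegraph_place_id": "", "polygon_class": "OWNED_POLYGON"},
]
owned_pct = percent_owned(sample)
print(f"{owned_pct:.1f}% of branded, no-parent POI are OWNED_POLYGON")
```

Only the first two rows qualify as "branded, no-parent"; the third has a parent and the fourth is unbranded, so both are excluded from the denominator.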

Bug Fixes and Known Issues - Geometry

  • Centroid-Radius Polygons -- As discussed in the March 2019 release notes, we internally track centroid-radius polygons vs. precise polygons and strive for 100% precise polygons. You can measure this yourself using the is_synthetic column.
    • This release, we saw a slight increase to 94.6% precise polygons (94.5% last month) 📈
    • Here is how we are tracking on this metric across releases: Centroid-Radius Polygon Tracking.
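
One way to measure this yourself is sketched below. It assumes that a true `is_synthetic` value marks a centroid-radius polygon and that the column arrives as text such as `true`/`false` when read from CSV; the inline sample stands in for a real geometry file.

```python
import csv
import io


def percent_precise(places):
    """Percent of POI whose polygon is not flagged is_synthetic."""
    def is_true(value):
        return str(value).strip().lower() in {"true", "t", "1"}

    places = list(places)
    if not places:
        return 0.0
    synthetic = sum(1 for p in places if is_true(p.get("is_synthetic")))
    return 100.0 * (len(places) - synthetic) / len(places)


# Hypothetical five-row sample in place of a real file.
sample = io.StringIO(
    "safegraph_place_id,is_synthetic\n"
    "sg:a,false\n"
    "sg:b,false\n"
    "sg:c,true\n"
    "sg:d,false\n"
    "sg:e,false\n"
)
rows = list(csv.DictReader(sample))
precise_pct = percent_precise(rows)
print(f"{precise_pct:.1f}% precise polygons")
```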

Enhancements - Patterns

  • We improved our methodology for inferring visitor_home_cbgs for devices in our panel. Instead of determining home census block groups each month from frozen six-week time periods, we now update home census block groups based on a rolling six-week window of data 🏡.

  • We are seeing the slow reopening of the economy reflected in Patterns. In last month's delivery, SG Patterns had 4,021,111 points-of-interest (US only). This month, SG Patterns has 4,100,749 points-of-interest (US only) (net +79,638).

  • Last month, SG Patterns had 484,790,060 visits from 27,883,262 visitors. This month, SG Patterns has 662,274,677 visits from 29,934,212 visitors (delta +177,484,617 visits, +2,050,950 visitors). 📈

  • In case you missed it, check out last month's release notes for details on new Patterns columns, the introduction of differential privacy techniques, and more! 👏
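
To illustrate the difference between a frozen and a rolling six-week window for home census block groups, here is a small sketch. The notes do not describe the actual inference rule, so picking the most frequently observed CBG inside the window is an assumption, as are the function name, the input shape, and the CBG identifiers.

```python
from collections import Counter
from datetime import date, timedelta

WINDOW = timedelta(weeks=6)


def home_cbg(observations, as_of, window=WINDOW):
    """Most frequent CBG among observations in the window ending at `as_of`.

    `observations` is an iterable of (date, cbg) pairs for one device.
    Returns None when the window holds no observations.
    """
    start = as_of - window
    counts = Counter(cbg for day, cbg in observations if start < day <= as_of)
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical device that moves homes in early May.
OLD_CBG, NEW_CBG = "060750101001", "360610001001"
obs = [(date(2020, 4, 1) + timedelta(days=i), OLD_CBG) for i in range(30)]
obs += [(date(2020, 5, 10) + timedelta(days=i), NEW_CBG) for i in range(40)]

frozen = home_cbg(obs, as_of=date(2020, 4, 30))   # window frozen at end of April
rolling = home_cbg(obs, as_of=date(2020, 6, 15))  # window rolled forward
print(frozen, rolling)
```

With the window frozen at the end of April the device is still assigned its old CBG; rolling the window forward to mid-June picks up the new one.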


**Calculating Diffs**
Curious to find the specific records that were either **added, deleted, or saw an attribute change** from one release to the next? Visit "Calculating Diffs" in our [Data Science Resources](https://docs.safegraph.com/docs/data-science-resources#section-calculating-diffs) to get started. 
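
For orientation, a generic key-based diff might look like the sketch below. It is not the method from the linked page: it assumes each release is loaded as a list of dicts keyed by `safegraph_place_id`, and the function name and toy rows are hypothetical.

```python
def diff_releases(previous, current, key="safegraph_place_id"):
    """Classify records as added, deleted, or changed between two releases."""
    prev = {row[key]: row for row in previous}
    curr = {row[key]: row for row in current}

    added = sorted(curr.keys() - prev.keys())
    deleted = sorted(prev.keys() - curr.keys())

    changed = {}
    for k in sorted(prev.keys() & curr.keys()):
        # Collect the names of columns whose values differ.
        columns = sorted(
            col for col in prev[k].keys() | curr[k].keys()
            if prev[k].get(col) != curr[k].get(col)
        )
        if columns:
            changed[k] = columns
    return added, deleted, changed


# Toy releases with hypothetical values, for illustration only.
may = [
    {"safegraph_place_id": "sg:a", "polygon_class": "SHARED_POLYGON"},
    {"safegraph_place_id": "sg:b", "polygon_class": "OWNED_POLYGON"},
    {"safegraph_place_id": "sg:c", "polygon_class": "OWNED_POLYGON"},
]
june = [
    {"safegraph_place_id": "sg:a", "polygon_class": "OWNED_POLYGON"},
    {"safegraph_place_id": "sg:c", "polygon_class": "OWNED_POLYGON"},
    {"safegraph_place_id": "sg:d", "polygon_class": "OWNED_POLYGON"},
]
added, deleted, changed = diff_releases(may, june)
print(added, deleted, changed)
```

Note that a place whose safegraph_place_id was not maintained between releases would appear here as one deletion plus one addition rather than as a change.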


**Also check out these new ways to get SafeGraph data:**
  * Need some extra data or other SafeGraph products? Check out the [SafeGraph Data Bar.](https://shop.safegraph.com/) 
  * Heavy AWS user? Check out our [listings in the AWS Data Exchange](https://aws.amazon.com/marketplace/search/results?filters=vendor_id&vendor_id=7d5ff8ca-105f-4856-9d99-5f2f1d83223c).
  * Are you an Esri or ArcGIS user? Check out our FREE data [SafeGraph Places in the Esri Marketplace](https://marketplace.arcgis.com/listing.html?id=3425348e4bee4059af2b353e52df43c2) and enjoy [SafeGraph Places in Esri Basemaps](https://www.esri.com/arcgis-blog/products/arcgis-living-atlas/mapping/new-places-in-esri-vector-basemaps/). 
  * Or just drop us a line! Your data needs are our data delights!