July-2020 Release Notes

With the first half of 2020 in the rear-view mirror, we are happy to bring you the July-2020 release notes 📝. Take a peek at what we've been working on so far this summer 👀 (2020-06-27/1593252164, shipped 2020-07-06).

Highlights

  • Improved sourcing nets +29k park POIs 🌳 🌲
  • New parent candidates added to spatial hierarchy 💠
  • Better clarity on visits to medical facilities 🏥

Enhancements - Core Places and Brands

  • Last month SG Places had 5,968,092 points-of-interest. This month SG Places has 6,004,597 points-of-interest (net +36,505 places). The net change breaks down to +36,693 US places and -188 CA places.

  • We've added 31 new brands, including 13 in Offices of Physicians (except Mental Health Specialists) 👨‍⚕️ 👩‍⚕️
    New Brands Include...

    • Canadian Tire (canadiantire.ca, SG_BRAND_1f8108ad0bba4b95) with 0 US and 505 CA places.
    • Costco Gasoline (costco.com/gasoline.html, SG_BRAND_e1d5079f68018134; parent brand: Costco Wholesale Corp., SG_BRAND_60b7b54d19fca719281e76d485a141ad) with 960 US and 0 CA places.
    • Case IH (caseih.com, SG_BRAND_8e0d380ebcdc0a3f) with 385 US and 110 CA places.
    • AdventHealth (adventhealth.com, SG_BRAND_a70de5c35e77fdc0) with 319 US and 0 CA places.
    • Piedmont Healthcare (piedmont.org, SG_BRAND_956459e2bc3a0d1f) with 221 US and 0 CA places.
    • and 26 more!

Bug Fixes and Known Issues - Core Places and Brands

  • We discovered a few brand count fluctuations as a result of store closures and other metadata bugs. These corrections resulted in significant changes in the total number of POI for each affected brand, but the new count is correct. For transparency, we'd like to list some of these corrections as examples in no particular order:

    • Papyrus (SG_BRAND_b276db333537c66ca679a08db0547130). Net POI count change: US: -165 CA: -19. Bug: Actually closed all stores at end of Jan 2020; now marked as closed.
    • Golfsmith (SG_BRAND_e8cd1a01f5da6188a17e771172050e79). Net POI count change: US: -66 CA: 0. Bug: Closed a while ago (end of 2016). Stores that remained open were converted to Dick's but recently moved under Golf Galaxy (for which we already have a brand).
    • Watermill Express (SG_BRAND_1b29fd93a75dc1c2e8f02ea05220581b). Net POI count change: US: +675 CA: 0. Bug: Last month we dropped POI due to geometry issues; these have been added back.
    • Kroger (SG_BRAND_1f852a23da4b7250). Net POI count change: US: -608 CA: 0. Bug: Inadvertent drop in POIs (will be added back next month).

Enhancements - Categories

  • As requested by our customers, we made a focused effort to improve our parks data this past month. Not only did this yield thousands of net new parks, but it also significantly improved the accuracy of our existing parks data. The top 3 count increases by category are as follows:

  • Nature Parks and Other Similar Institutions (712190). Net POI count change: US: +29,373 CA: +12. 🌳 🌲

  • Nursing Care Facilities (Skilled Nursing Facilities) (623110). Net POI count change: US: +14,400 CA: +0. 👩‍⚕️ 👨‍⚕️

  • General Medical and Surgical Hospitals (622110). Net POI count change: US: +4,487 CA: +0. 🏥

  • Category Fill Rate -- We monitor category fill rate with 3 metrics: (1) category fill rate across the entire dataset, (2) category fill rate for branded POI, (3) category fill rate in the brand_info file (brand-level categories). We want all of these numbers to be 100%.

    • (1) All POI category fill rate. Last month 98.8%. This month 98.8%.
    • (2) Branded POI category fill rate. Last month 100%. This month 100% 💯
    • (3) Brand-level category fill rate (brand_info file). Last month 100%. This month 100% 💯
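If you want to track a comparable fill rate on your own copy of the data, here is a minimal sketch in plain Python. It assumes the category lives in a `naics_code` column and that a non-empty `brands` column marks a branded POI; both column names and the sample rows are illustrative assumptions, not taken from these notes.

```python
def fill_rate(rows, column="naics_code"):
    """Return the percentage of rows with a non-empty value in `column`."""
    rows = list(rows)
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if str(row.get(column) or "").strip())
    return 100.0 * filled / len(rows)


# Tiny made-up sample, not real SafeGraph data.
# Column names `brands` and `naics_code` are assumptions.
pois = [
    {"safegraph_place_id": "sg:a", "brands": "Brand A", "naics_code": "445110"},
    {"safegraph_place_id": "sg:b", "brands": "", "naics_code": "712190"},
    {"safegraph_place_id": "sg:c", "brands": "", "naics_code": ""},
    {"safegraph_place_id": "sg:d", "brands": "Brand B", "naics_code": "622110"},
]

all_rate = fill_rate(pois)                                # metric (1): all POI
branded_rate = fill_rate(r for r in pois if r["brands"])  # metric (2): branded POI only
print(f"all POI: {all_rate:.1f}%, branded POI: {branded_rate:.1f}%")
```

Metric (3) could be computed the same way by passing rows read from the brand_info file.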

Drops ⬇️

  • We constantly ingest data from new sources, and many sgpids are intentionally dropped, but we are unable to track each and every dropped sgpid. In this release:

    • We dropped 34,239 sgpids (16,586 branded and 17,653 non-branded).
    • 891 dropped as a result of bug fixes for branded POIs 🐛
    • 9,108 dropped as a result of deduplication 👯‍♂️
  • The remaining drops are undesired failures to maintain a consistent safegraph_place_id (sgpid) between releases - known as bad sgpid churn (see discussion in March 2019 release). We are continuing to work on better metrics to distinguish good vs. bad churn.

  • Learn more about Core Places files that include closed POIs here.
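The drop counts above imply how large the "remaining" bucket is. The arithmetic below is a rough sketch that assumes the bug-fix and deduplication drops do not overlap and are both counted inside the 34,239 total; the notes do not state this explicitly.

```python
# Figures copied from the Drops section of these notes.
total_dropped = 34_239
branded_dropped = 16_586
non_branded_dropped = 17_653
bug_fix_drops = 891
dedup_drops = 9_108

# The branded / non-branded split should add up to the total.
assert branded_dropped + non_branded_dropped == total_dropped

# Assumption: the two explained buckets are disjoint subsets of the total.
remaining_drops = total_dropped - bug_fix_drops - dedup_drops
print(f"remaining (unexplained) drops: {remaining_drops:,}")  # 24,240
```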

Enhancements - Geometry

  • In Geometry land, we focused on curating more precise polygons for branded POIs and expanded our spatial hierarchy to recognize +4 naics_codes as parent POI candidates: Family Planning Centers (621410), All Other Outpatient Care Centers (621498), Freestanding Ambulatory Surgical and Emergency Centers (621493), and Kidney Dialysis Centers (621492) - see the Places Manual for more on how we're thinking about hierarchy.

    • This expansion resulted in +43,471 POIs that are children and +12,766 POIs that are parents. 👪 💯
  • Percent polygon_class = OWNED_POLYGON (as described in the Oct 2019 release notes). We examine polygon_class for every safegraph_place_id that (i) is branded and (ii) does NOT have a parent_safegraph_place_id; we call this group "branded, no-parent". We want 100% of "branded, no-parent" POI to have polygon_class = OWNED_POLYGON.

  • Last month, the percent OWNED polygons for branded, no-parent POIs was 82.6%

  • This month it is 83.0% 📈
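The "branded, no-parent" metric described above can be approximated on your own extract with a short filter. This is a sketch under assumptions: `polygon_class` and `parent_safegraph_place_id` come from the text, while using a non-empty `safegraph_brand_ids` column to mean "branded" is our assumption, and the rows are made up.

```python
def percent_owned(rows):
    """Percent of branded, no-parent POIs whose polygon_class is OWNED_POLYGON."""
    group = [
        r for r in rows
        # Assumption: a non-empty safegraph_brand_ids value means "branded".
        if r.get("safegraph_brand_ids") and not r.get("parent_safegraph_place_id")
    ]
    if not group:
        return None  # metric is undefined when the group is empty
    owned = sum(1 for r in group if r.get("polygon_class") == "OWNED_POLYGON")
    return 100.0 * owned / len(group)


# Made-up rows for illustration only.
rows = [
    {"safegraph_brand_ids": "SG_BRAND_x", "parent_safegraph_place_id": "", "polygon_class": "OWNED_POLYGON"},
    {"safegraph_brand_ids": "SG_BRAND_y", "parent_safegraph_place_id": "", "polygon_class": "SHARED_POLYGON"},
    {"safegraph_brand_ids": "SG_BRAND_z", "parent_safegraph_place_id": "sg:mall", "polygon_class": "SHARED_POLYGON"},
    {"safegraph_brand_ids": "", "parent_safegraph_place_id": "", "polygon_class": "OWNED_POLYGON"},
]

owned_pct = percent_owned(rows)
print(f"{owned_pct:.1f}% of branded, no-parent POIs are OWNED_POLYGON")
```

Only the first two rows fall in the "branded, no-parent" group here; the third has a parent and the fourth is unbranded.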

Bug Fixes and Known Issues - Geometry

  • Centroid-Radius Polygons -- As discussed in the March 2019 release notes, we internally track centroid-radius polygons vs. precise polygons and strive for 100% precise polygons. You can measure this yourself using the is_synthetic column.
    • This release, we saw a slight increase to 94.7% precise polygons (94.6% last month) 📈
    • Here is how we are tracking on this metric across releases: Centroid-Radius Polygon Tracking.
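As one way to measure this yourself with the is_synthetic column, the sketch below counts non-synthetic rows in a CSV. It assumes the column holds the strings `true`/`false` and treats every row with `is_synthetic` not equal to `true` as a precise polygon; the inline CSV is a hypothetical stand-in for a real geometry file.

```python
import csv
import io


def percent_precise(rows):
    """Percent of rows whose polygon is precise (is_synthetic is not 'true')."""
    rows = list(rows)
    if not rows:
        return None
    synthetic = sum(
        1 for row in rows
        if str(row.get("is_synthetic", "")).strip().lower() == "true"
    )
    return 100.0 * (len(rows) - synthetic) / len(rows)


# Hypothetical four-row extract; swap in open("your_file.csv") for real data.
sample_csv = io.StringIO(
    "safegraph_place_id,is_synthetic\n"
    "sg:a,false\n"
    "sg:b,false\n"
    "sg:c,true\n"
    "sg:d,false\n"
)

precise_pct = percent_precise(csv.DictReader(sample_csv))
print(f"{precise_pct:.1f}% precise polygons")
```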

Enhancements - Patterns

  • Medical POI Visits: We rely on Geometry metadata, like spatial hierarchy, to guide visit attribution when building Patterns. We intentionally exclude visits to indoor POIs that are completely enclosed by large structures because the accuracy of mobile GPS deteriorates within these major structures (more in Places Manual). To provide better clarity on total visits to medical centers, we now exclude children of General Medical and Surgical Hospitals (622110) from receiving visits as well as children of the 4 outpatient care naics_codes added to spatial hierarchy (see geometry enhancements above). The total visits to these POIs are only assigned to the parent rather than distributed across the children. 🏥

  • We updated the IP ranges used to determine carrier_name, and this resulted in a higher correlation with known market shares for major carriers. See the Places Manual for a detailed breakdown.

  • We continue to see the slow reopening of the economy reflected in Patterns. In last month's delivery, SG Patterns had 4,100,749 points-of-interest (US only). This month, SG Patterns has 4,184,077 points-of-interest (US only) (net +83,328).

  • Last month, SG Patterns had 662,274,677 visits from 29,934,212 visitors. This month, SG Patterns has 798,171,658 visits from 32,021,070 visitors (delta +135,896,981 visits, +2,086,858 visitors). 📈
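To see which POIs the Medical POI Visits change affects in your own data, a sketch like the following flags every POI whose parent has one of the five naics_codes named in these notes. It is not the Patterns pipeline itself: the lookup uses `parent_safegraph_place_id` and `naics_code` as described above, and the sample rows (including the mall code) are hypothetical.

```python
# Parent categories whose children no longer receive visits, per these notes:
# General Medical and Surgical Hospitals plus the 4 outpatient care codes.
NO_CHILD_VISIT_PARENT_NAICS = {"622110", "621410", "621498", "621493", "621492"}


def children_excluded_from_visits(pois):
    """Return ids of POIs whose parent's naics_code is in the excluded set."""
    naics_by_id = {p["safegraph_place_id"]: p["naics_code"] for p in pois}
    return [
        p["safegraph_place_id"]
        for p in pois
        # A missing or unknown parent id looks up to None, which is never in the set.
        if naics_by_id.get(p.get("parent_safegraph_place_id")) in NO_CHILD_VISIT_PARENT_NAICS
    ]


# Made-up rows for illustration only.
pois = [
    {"safegraph_place_id": "sg:hospital", "naics_code": "622110", "parent_safegraph_place_id": None},
    {"safegraph_place_id": "sg:pharmacy", "naics_code": "446110", "parent_safegraph_place_id": "sg:hospital"},
    {"safegraph_place_id": "sg:mall", "naics_code": "531120", "parent_safegraph_place_id": None},
    {"safegraph_place_id": "sg:store", "naics_code": "448140", "parent_safegraph_place_id": "sg:mall"},
]

excluded = children_excluded_from_visits(pois)
print(excluded)
```

In this sample only the pharmacy inside the hospital is flagged; the store inside the mall is unaffected by this particular change.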




**In case you missed it,** check out [last month's release notes](https://docs.safegraph.com/changelog/june-2020-release-notes). 📝

**Calculating Diffs**
Curious to find the specific records that were either **added, deleted, or saw an attribute change** from one release to the next? Visit "Calculating Diffs" in our [Data Science Resources](https://docs.safegraph.com/docs/data-science-resources#section-calculating-diffs) to get started. 
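As a rough illustration of the idea (not necessarily the approach described in the Data Science Resources page), two releases can be compared by keying each record on safegraph_place_id and using set operations. The sample records and the `location_name` field are hypothetical.

```python
def diff_releases(old_rows, new_rows, key="safegraph_place_id"):
    """Return (added, deleted, changed) lists of ids between two releases."""
    old = {r[key]: r for r in old_rows}
    new = {r[key]: r for r in new_rows}
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    # Present in both releases, but at least one attribute differs.
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, deleted, changed


# Made-up records standing in for last month's and this month's files.
june = [
    {"safegraph_place_id": "sg:a", "location_name": "Park A"},
    {"safegraph_place_id": "sg:b", "location_name": "Shop B"},
    {"safegraph_place_id": "sg:c", "location_name": "Clinic C"},
]
july = [
    {"safegraph_place_id": "sg:a", "location_name": "Park A"},
    {"safegraph_place_id": "sg:c", "location_name": "Clinic C (renamed)"},
    {"safegraph_place_id": "sg:d", "location_name": "Park D"},
]

added, deleted, changed = diff_releases(june, july)
print("added:", added, "deleted:", deleted, "changed:", changed)
```

Note that a deleted id is not always a closed place: as the Drops section explains, an sgpid can also churn between releases.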


**Also check out these new ways to get SafeGraph data:**
  * Need some extra data or other SafeGraph products? Check out the [SafeGraph Data Bar.](https://shop.safegraph.com/) 
  * Heavy AWS user? Check out our [listings in the AWS Data Exchange](https://aws.amazon.com/marketplace/search/results?filters=vendor_id&vendor_id=7d5ff8ca-105f-4856-9d99-5f2f1d83223c).
  * Are you an Esri or ArcGIS user? Check out our FREE data [SafeGraph Places in the Esri Marketplace](https://marketplace.arcgis.com/listing.html?id=3425348e4bee4059af2b353e52df43c2) and enjoy [SafeGraph Places in Esri Basemaps](https://www.esri.com/arcgis-blog/products/arcgis-living-atlas/mapping/new-places-in-esri-vector-basemaps/). 
  * Snowflake user? Check out our page on the [Snowflake Data Exchange](https://www.snowflake.com/datasets/safegraph/) ❄️
  * Or just drop us a line! Your data needs are our data delights!