August-2020 Release Notes

We kept our foot on the gas all July 🚘 💨 and made some significant product quality improvements. Welcome to the August-2020 release notes 📝 (2020-07-28/1595960589 shipped 2020-08-05).

Highlights

  • Courtesy of SafeGraph machine learning, the updated category model is now live! 🤩
  • Recognized hundreds of SMBs as SafeGraph Brands 💯
  • Improved city name precision 🌆 🎯
  • Daily panel fluctuations are now broken down by state 🎊

Enhancements - Core Places and Brands

  • Last month SG Places had 6,004,597 points-of-interest. This month SG Places has 5,896,574 points-of-interest (net -108,023 places). The net change breaks down into -53,873 US places and -54,150 CA places.

  • We've added +146 new brands (that's not a typo!) including +37 Full-Service Restaurants 🍴
    New Brands Include...

    • Family Food Stores (familyfoodsstores.com, SG_BRAND_3356a85932854382) with 55 US and 0 CA places.
    • Slapfish (slapfishrestaurant.com, SG_BRAND_4d5017439d81e956) with 22 US and 0 CA places.
    • Tapout Fitness (tapoutfitnessnyc.com, SG_BRAND_3059716d0a91e0e1) with 48 US and 0 CA places.
    • Banner Bank (bannerbank.com, SG_BRAND_4d186eeb85f69e) with 192 US and 0 CA places.
    • Sonesta Hotels & Resorts (sonesta.com, SG_BRAND_3e92180fd0762dea) with 114 US and 0 CA places.
    • and 141 more!

Bug Fixes and Known Issues - Core Places and Brands

  • We discovered a few brand count fluctuations as a result of updated sourcing and other metadata bugs. These corrections resulted in significant changes in the total number of POI for each affected brand, but the new count is correct. For transparency, we'd like to list some of these corrections as examples in no particular order:

    • Western Union (SG_BRAND_9ee39f394d21a7f4848ab78a78da00c3). Net POI count change: US: +61,490 CA: 0. Bug: Dramatically improved coverage from an updated source. 💰
    • MAC Cosmetics (SG_BRAND_58e2e2ddc4008302e9cdd190f7f7f1e8). Net POI count change: US: -786 CA: -77. Bug: Previously included locations inside department stores; now only their stand-alone stores are counted.
    • Kroger (SG_BRAND_1f852a23da4b7250). Net POI count change: US: +609 CA: 0. Bug: Accidentally churned locations in the previous release; they are now added back.
    • Zales (SG_BRAND_1387c4b21a20509d). Net POI count change: US: +383 CA: 0. Bug: Improved coverage from an updated source.
    • Extra Space Storage (SG_BRAND_25f99d0cc5d6078042c8f466f6a8fa83). Net POI count change: US: +902 CA: 0. Bug: Improved coverage from an updated source.

Enhancements - Categories

  • Four score and seven years ago, we embarked on a journey to improve the accuracy of our naics_code assignments for non-branded POIs. Today, we are happy to announce the results of this endeavor. A diversified set of training data combined with some tweaks to the logic yielded 1.1M naics_code changes across existing POIs. While the new category model is not perfect, we are confident that the vast majority of these changes are improvements. Below are some other relevant stats about the category model update:

    • 490k/1.1M naics_code changes share the same first two digits
    • 378k/1.1M naics_code changes share the same first three digits
      • This indicates better clarity within naics_code families 👪. For example, the naics_code with the most POIs flipping to another distinct naics_code is Full-Service Restaurants ▶️ Limited-Service Restaurants. See here for a complete old/new mapping of naics_code changes from the July-2020 to the August-2020 release.
  • See the Summary Statistics tab for a breakdown of POI count by naics_code
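
The shared-prefix statistics above can be reproduced from any old/new mapping of naics_code values. Below is a minimal sketch, assuming the mapping is available as (old, new) string pairs; the sample pairs and the function names are hypothetical, and the sketch assumes the "first two digits" count includes the codes that also share the first three.

```python
from collections import Counter


def shared_prefix_len(old_code: str, new_code: str) -> int:
    """Return how many leading digits two NAICS codes have in common."""
    n = 0
    for a, b in zip(old_code, new_code):
        if a != b:
            break
        n += 1
    return n


def summarize_changes(pairs):
    """Count changed codes that still share their first 2 / first 3 digits."""
    summary = Counter()
    for old_code, new_code in pairs:
        if old_code == new_code:
            continue  # unchanged POIs are not part of the change count
        summary["changed"] += 1
        k = shared_prefix_len(old_code, new_code)
        if k >= 2:
            summary["same_first_two"] += 1
        if k >= 3:
            summary["same_first_three"] += 1
    return summary


# Hypothetical sample of (old naics_code, new naics_code) pairs.
sample = [
    ("722511", "722513"),  # Full-Service -> Limited-Service Restaurants
    ("722511", "445110"),  # different sector entirely
    ("446110", "446110"),  # unchanged, ignored
    ("721110", "722511"),  # same two-digit sector (72) only
]

print(dict(summarize_changes(sample)))
```

A change that keeps the first two or three digits stays inside the same NAICS sector or subsector, which is why these counts are a rough proxy for "refinement within a family" rather than a wholesale recategorization.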

  • In an effort to assign the best-fitting naics_code to each POI, we are consolidating 20 six-digit naics_codes into 9 four-digit naics_codes where the six-digit naics_code is too obscure. In these cases, the sub_category column will be null. See the Places Manual for a complete mapping of these consolidations.

  • Category Fill Rate -- We monitor category fill rate with 3 metrics: (1) category fill rate across the entire dataset, (2) category fill rate for branded POI, (3) category fill rate in the brand_info file (brand-level categories). We want all of these numbers to be 100%.

    • (1) All POI category fill rate. Last month 98.8%. This month 98.8%.
    • (2) Branded POI category fill rate. Last month 100%. This month 100% 💯
    • (3) Brand-level category fill rate (brand_info file). Last month 100%. This month 100% 💯

Drops ⬇️

  • We constantly ingest data from new sources, and many sgpids are intentionally dropped, but we are unable to track each and every dropped sgpid. In this release:

    • We dropped 284,561 sgpids (20,263 branded and 264,298 non-branded).
    • ~177k dropped because the updated naics_code no longer fits the current scope of a SafeGraph Place. In other words, the new category model enhanced our ability to find and empty the trash. 🗑
    • 1,157 dropped as a result of bug fixes for branded POIs 🐛
    • 30,085 dropped as a result of deduplication 👯‍♂️
    • 49,346 dropped due to permanent closures ❌
  • The remaining drops are undesired failures to maintain a consistent safegraph_place_id (sgpid) between releases - known as bad sgpid churn (see discussion in March 2019 release). We are continuing to work on better metrics to distinguish good vs. bad churn.

  • Learn more about Core Places files that include closed POIs here.
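
For readers who want to size the "remaining drops" mentioned in this section, the arithmetic can be sketched as follows. This is a rough estimate only: the out-of-scope figure is given as "~177k" in the notes, so 177,000 is an assumed stand-in, and the function and dictionary names are hypothetical.

```python
def unexplained_drops(total_dropped, explained):
    """Return (count, share) of drops not covered by the listed reasons."""
    remaining = total_dropped - sum(explained.values())
    return remaining, remaining / total_dropped


total_dropped = 284_561          # 20,263 branded + 264,298 non-branded
explained = {
    "out_of_scope_naics": 177_000,   # approximate: the notes say "~177k"
    "branded_bug_fixes": 1_157,
    "deduplication": 30_085,
    "permanent_closures": 49_346,
}

remaining, share = unexplained_drops(total_dropped, explained)
print(f"~{remaining:,} drops ({share:.1%}) are not covered by the listed reasons")
```

Under that assumption, roughly 27k sgpids (a bit under 10% of all drops) fall into the bad-churn bucket; the exact figure depends on the unrounded out-of-scope count.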

Enhancements - Geometry

  • For those of you who made us aware of our city name errors, thank you - we heard you loud and clear! 📢 With your help, we discovered thousands of POIs whose centroids (latitude & longitude) did not fall within the city populated in the city column. In some cases we had assigned a neighboring city, and in other cases the error was much more egregious. To correct this, we now check every centroid against a geospatial city boundary as defined by the U.S. Census Bureau (browse the boundaries here). In some edge cases, the preferred city name in the address line reflects a pre-annexation city name; correcting these edge cases is a work in progress. 🌇

  • Percent polygon_class = OWNED_POLYGON (as described in the Oct 2019 release notes). We examine polygon_class for all safegraph_place_ids that are both (i) branded and (ii) do NOT have a parent_safegraph_place_id; we call this group "branded, no-parent". We want 100% of "branded, no-parent" POIs to have polygon_class = OWNED_POLYGON.

  • Last month, the percent OWNED polygons for branded, no-parent POIs was 83.0%

  • This month it is 80.2% 📉
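
The centroid-versus-city-boundary check described in the first bullet of this section can be illustrated with a plain point-in-polygon test. This is only a sketch of the general technique (ray casting on planar coordinates), not SafeGraph's actual pipeline: the city names and square boundaries are hypothetical, and real Census boundaries are multipolygons with holes, for which a geospatial library is the better tool.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test. `ring` is a list of (lon, lat) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def city_for_centroid(lon, lat, boundaries, fallback=None):
    """Return the first city whose boundary contains the centroid."""
    for city, ring in boundaries.items():
        if point_in_polygon(lon, lat, ring):
            return city
    return fallback


# Hypothetical, simplified boundaries: two adjacent squares.
boundaries = {
    "Springfield": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "Shelbyville": [(2, 0), (4, 0), (4, 2), (2, 2)],
}

# A POI whose city column says "Springfield" but whose centroid says otherwise.
poi = {"city": "Springfield", "longitude": 3.0, "latitude": 1.0}
resolved = city_for_centroid(poi["longitude"], poi["latitude"], boundaries)
if resolved is not None and resolved != poi["city"]:
    print(f"city mismatch: column says {poi['city']}, centroid is in {resolved}")
```

Returning a fallback instead of raising leaves room for the edge cases mentioned above, where the centroid falls outside every boundary or the address line uses an older city name.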

Bug Fixes and Known Issues - Geometry

  • Centroid-Radius Polygons -- As discussed in the March 2019 release notes, we internally track centroid-radius polygons vs. precise polygons and strive for 100% precise polygons. You can measure this yourself using the is_synthetic column.
    • This release, we improved to 95.5% precise polygons (94.7% last month) 📈 ‼️
    • Here is how we are tracking on this metric across releases: Centroid-Radius Polygon Tracking.
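
Measuring the precise-polygon share from the is_synthetic column, as suggested above, might look like the sketch below. It assumes that a true is_synthetic value marks a centroid-radius polygon and that the column arrives as text in a CSV; the sample rows and helper names are hypothetical.

```python
import csv
import io


def as_bool(value):
    """Interpret common CSV spellings of a boolean flag."""
    return str(value).strip().lower() in {"true", "t", "1", "yes"}


def precise_polygon_share(rows):
    """Share of rows whose polygon is NOT a synthetic centroid-radius shape."""
    rows = list(rows)
    if not rows:
        return 0.0
    precise = sum(1 for row in rows if not as_bool(row["is_synthetic"]))
    return precise / len(rows)


# Hypothetical sample standing in for a geometry file.
sample_csv = """safegraph_place_id,is_synthetic
sg:0001,false
sg:0002,false
sg:0003,true
sg:0004,false
"""

rows = csv.DictReader(io.StringIO(sample_csv))
print(f"{precise_polygon_share(rows):.1%} precise polygons")
```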

Enhancements - Patterns

  • Normalization Stats for Patterns are now broken down by state to enable normalization by region. 💯 🔥

  • The visitor_work_cbgs column is officially deprecated and will show a value of "{}" across the board. Please reference the visitor_daytime_cbgs column to glean common daytime locations. ☀️ ⏰

  • We continue to see the slow reopening of the economy reflected in Patterns. In last month's delivery, SG Patterns had 4,184,077 points-of-interest (US only). This month, SG Patterns has 4,091,868 points-of-interest (US only) (net -92,209). The slight decrease in the total number of POIs in Patterns can be explained by the removal of junky POIs from Places (see category model updates above).

  • Last month, SG Patterns had 798,171,658 visits from 32,021,070 visitors. This month, SG Patterns has 818,376,187 visits from 32,786,255 visitors (delta +20,204,529 visits, +765,185 visitors). 📈
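
For pipelines that still read the deprecated visitor_work_cbgs column, switching to visitor_daytime_cbgs could look like the sketch below. It assumes the column holds a JSON object mapping census block group IDs to visitor counts, in the same shape as the "{}" placeholder; the sample row, its CBG IDs, and the helper names are hypothetical.

```python
import json


def daytime_cbgs(row):
    """Parse visitor_daytime_cbgs; the deprecated column is ignored entirely."""
    raw = row.get("visitor_daytime_cbgs") or "{}"
    return json.loads(raw)


def top_daytime_cbgs(row, n=3):
    """Return the n most common daytime CBGs as (cbg, count) pairs."""
    counts = daytime_cbgs(row)
    # Sort by count descending, then by CBG ID so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical Patterns row; visitor_work_cbgs is always "{}" from now on.
row = {
    "visitor_work_cbgs": "{}",
    "visitor_daytime_cbgs": (
        '{"060375764031": 12, "060371234001": 7, "060379999002": 12}'
    ),
}

print(top_daytime_cbgs(row, n=2))
```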


**What do a fire hydrant, a Gucci store, an apartment complex, and a tree in a national park have in common?** They all have a Placekey 🔑. [Learn more about relating the unrelated here](https://placekey.io/launch).

**In case you missed it,** check out [last month's release notes](https://docs.safegraph.com/changelog/july-2020-release-notes). 📝

**Calculating Diffs**
Curious to find the specific records that were either **added, deleted, or saw an attribute change** from one release to the next? Visit "Calculating Diffs" in our [Data Science Resources](https://docs.safegraph.com/docs/data-science-resources#section-calculating-diffs) to get started. 
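
As a rough illustration of what such a diff involves (this is a generic sketch, not necessarily the method described in the linked resource), two releases can be compared by keying each record on safegraph_place_id. The sample records and the `diff_releases` helper are hypothetical.

```python
def diff_releases(old_rows, new_rows, key="safegraph_place_id"):
    """Return (added ids, deleted ids, {id: changed column names})."""
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}

    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())

    changed = {}
    for place_id in old.keys() & new.keys():
        columns = old[place_id].keys() | new[place_id].keys()
        diff_cols = sorted(
            c for c in columns if old[place_id].get(c) != new[place_id].get(c)
        )
        if diff_cols:
            changed[place_id] = diff_cols
    return added, deleted, changed


# Hypothetical records from two consecutive releases.
july = [
    {"safegraph_place_id": "sg:aaa", "naics_code": "722511", "city": "Austin"},
    {"safegraph_place_id": "sg:bbb", "naics_code": "446110", "city": "Reno"},
]
august = [
    {"safegraph_place_id": "sg:aaa", "naics_code": "722513", "city": "Austin"},
    {"safegraph_place_id": "sg:ccc", "naics_code": "445110", "city": "Boise"},
]

added, deleted, changed = diff_releases(july, august)
print("added:", added)
print("deleted:", deleted)
print("changed:", changed)
```

Keying on safegraph_place_id means that bad sgpid churn shows up as a paired delete and add rather than as an attribute change, so the counts should be read with that caveat in mind.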


**Also check out these new ways to get SafeGraph data:**
  * Need some extra data or other SafeGraph products? Check out the [SafeGraph Data Bar.](https://shop.safegraph.com/) 
  * Heavy AWS User?  Check out our [listings in the AWS Data Exchange](https://aws.amazon.com/marketplace/search/results?filters=vendor_id&vendor_id=7d5ff8ca-105f-4856-9d99-5f2f1d83223c).
  * Are you an Esri or ArcGIS user? Check out our FREE data [SafeGraph Places in the Esri Marketplace](https://marketplace.arcgis.com/listing.html?id=3425348e4bee4059af2b353e52df43c2) and enjoy [SafeGraph Places in Esri Basemaps](https://www.esri.com/arcgis-blog/products/arcgis-living-atlas/mapping/new-places-in-esri-vector-basemaps/). 
  * Snowflake user? Check out our page on the [Snowflake Data Exchange](https://www.snowflake.com/datasets/safegraph/) :snowflake: 
  * Or just drop us a line! Your data needs are our data delights!