May-2020 Release Notes

If April showers bring May flowers, then last month was a downpour yielding a beautiful bouquet of product enhancements. Welcome to the May-2020 release notes: we have some BIG updates to share. :droplet: :sunflower: (2020-04-22/1587549704 shipped 2020-05-05)

Highlights

  • Differential privacy techniques enable increased visit detection across Patterns :chart-with-upwards-trend:
  • 🚨 New feature alert! +3 Patterns columns (safegraph_brand_ids, poi_cbg, visitor_daytime_cbgs), +1 Patterns premium column (carrier_name), and +1 Patterns table (normalization_stats.csv)
  • Overlapping polygons cleaned up where overlap should not exist 🧹 🧼
  • Geometry's spatial hierarchy and polygon classification get some love to improve consistency :heart: :100:


Enhancements - Patterns

  • Introducing 3 new columns that are now included in standard Patterns deliveries:

    • safegraph_brand_ids for easier joining across other SafeGraph products featuring brands
    • poi_cbg to show the census block group (CBG) a POI resides in, for comparison with the CBGs of its visitors
    • visitor_daytime_cbgs for insight into the primary daytime locations of visitors to a POI. Previously, customers used visitor_work_cbgs for these insights; we are deprecating visitor_work_cbgs in August 2020. visitor_daytime_cbgs is a more robust and agnostic summary of device behavior.
  • A premium column, carrier_name, is available upon request. Please reach out to your SafeGraph rep for details. For more information on the new Patterns columns, reference the Places Schema.

  • How's your shorthand math for converting seconds since January 1st, 1970 into actual dates? 🤔...we suspected as much. The date_range_start and date_range_end columns are now displayed in local time with an offset from GMT. See the Patterns Schema for more details on the format :date:.

  • As of the May 2020 release, we no longer exclude workers from visits to places. We have found that the bucketed_dwell_time column serves as a better proxy for identifying workers. See the Worker & Non-Worker Visits section of the Places Manual for more :clock1:.

  • Interested in recreating the data normalization we generated for Commerce Patterns? There's a table for that. Check out the brand new Normalization Stats table in the Patterns Schema, and reference this "how-to" guide to get started 😎.

  • Last, and very far from least, we have applied differential privacy to Patterns to enhance privacy. This enables us to remove the "less than 5 visitor rule," which was causing issues for customers trying to understand trade areas. From the May 2020 release onward, POIs with at least 1 visitor are included in Patterns, and only two devices are required to provide data in the following columns:

    • visitor_home_cbgs
    • visitor_daytime_cbgs
    • visitor_work_cbgs
    • visitor_country_of_origin
    • device_type
    • carrier_name

Reference the Privacy section of the Places Manual to learn about our privacy practices.
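These notes don't describe the exact differential privacy mechanism behind this change, so the sketch below is a generic, textbook Laplace mechanism, not SafeGraph's implementation. It adds noise to a per-CBG visitor histogram over a fixed, public set of keys, then drops small noisy counts. The function names, `epsilon`, `min_count`, the sample CBG ids, and the assumption that each device adds 1 to exactly one bucket are all hypothetical. `min_count=2` loosely mirrors the two-device requirement above, but how that threshold is actually applied isn't described here.

```python
import random
from typing import Dict, Iterable


def laplace_noise(rng: random.Random, scale: float) -> float:
    """Draw from Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_histogram(
    counts: Dict[str, int],
    domain: Iterable[str],
    epsilon: float,
    min_count: int,
    rng: random.Random,
) -> Dict[str, int]:
    """Hypothetical Laplace mechanism over a public key domain (e.g. all CBG ids).

    Assumes each device adds 1 to exactly one bucket, so the L1 sensitivity is 1
    and the noise scale is 1 / epsilon. Rounding and dropping buckets below
    `min_count` happen after the noise is added (post-processing), so they do
    not weaken the privacy guarantee.
    """
    scale = 1.0 / epsilon
    released: Dict[str, int] = {}
    for key in domain:
        # Every key in the domain gets noise, including keys whose true count is 0,
        # so the set of released keys doesn't reveal which buckets were non-empty.
        noisy = round(counts.get(key, 0) + laplace_noise(rng, scale))
        if noisy >= min_count:
            released[key] = noisy
    return released


rng = random.Random(7)
# Hypothetical true visitor counts per home CBG for one POI; the third CBG has no visitors.
true_counts = {"060750201001": 12, "060750201002": 1}
domain = ["060750201001", "060750201002", "060750201003"]
print(noisy_histogram(true_counts, domain, epsilon=1.0, min_count=2, rng=rng))
```

Smaller `epsilon` means more noise and stronger privacy. Because the noise is symmetric around zero, counts stay unbiased before the threshold is applied.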

  • As a result of lowering the minimum POI visitor requirement from 5 to 1, the total number of POIs with measured foot traffic increased in April 2020. In last month's delivery, SG Patterns had 3,626,236 points-of-interest (US only). This month, SG Patterns has 4,021,111 points-of-interest (US only) (net +394,875) :chart-with-upwards-trend:.
  • As expected, the total volume of visits and visitors decreased in April as stay-at-home orders persisted for most of the U.S. Last month, SG Patterns had 789,737,092 visits from 39,546,806 visitors. This month, SG Patterns has 484,790,060 visits from 27,833,262 visitors (delta -304,947,032 visits, -11,713,544 visitors) :chart-with-downwards-trend:.
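To reproduce the release-over-release deltas quoted above and see them as percentages, a few lines of Python are enough. The figures are copied from these notes. The `delta` helper and the `PATTERNS_TOTALS` name are just for illustration.

```python
# Patterns totals quoted in these notes (US only): (last month, this month).
PATTERNS_TOTALS = {
    "POIs": (3_626_236, 4_021_111),
    "visits": (789_737_092, 484_790_060),
    "visitors": (39_546_806, 27_833_262),
}


def delta(before: int, after: int) -> tuple:
    """Return the absolute change and the percent change relative to `before`."""
    diff = after - before
    return diff, 100.0 * diff / before


for name, (before, after) in PATTERNS_TOTALS.items():
    diff, pct = delta(before, after)
    print(f"{name}: {diff:+,} ({pct:+.1f}%)")
```

This prints +394,875 (+10.9%) for POIs, -304,947,032 (-38.6%) for visits, and -11,713,544 (-29.6%) for visitors.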

Enhancements - Geometry

  • A major initiative to clean up redundant, overlapping polygons concluded in April, and the results affected more than 500k polygons. In some cases, we trimmed borders; in other cases, when two POIs mapped to two distinct but nearly identical polygons, we kept a single polygon and discarded the other. The latter naturally decreased the number of POIs with polygon_class = OWNED_POLYGON, but we view this as an enhancement: the POI that changed from OWNED_POLYGON to SHARED_POLYGON had previously been mapped to a redundant, low-value polygon despite its OWNED_POLYGON classification. Here are some stats and noticeable impacts:

    • 356,110 POIs now have a smaller polygon area :clap:
    • 155,649 POIs now have a larger polygon area
    • Net -94,277 total polygons in Places :chart-with-downwards-trend:
    • Net 0.26% decrease in POIs with polygon_class = OWNED_POLYGON
    • Cleaner, less cluttered visuals 🗺 💯
  • It's important to identify when POIs overlap one another in the real world, and we highlight these relationships by setting the parent_safegraph_place_id of the smaller POI equal to the safegraph_place_id of the larger, encompassing POI. We noticed that some of these relationships had gone undocumented, so we expanded the types of POIs that qualify as "parents" when hosting a smaller POI within their bounds (see the "Spatial Hierarchy" section of the Places Manual for details). This change produced +215,665 POIs that are children and +98,218 POIs that are parents :family:. The parent categories with the largest increases in net new children are:

    • Gasoline Stations with Convenience Stores (447110) +71,757 child POIs ⛽️
    • Elementary and Secondary Schools (611110) +48,791 child POIs 🏫
    • Malls (531120) +39,452 child POIs 🛍
    • Nature Parks and Other Similar Institutions (712190) +30,238 child POIs :deciduous-tree:
    • Golf Courses and Country Clubs (713910) +16,543 child POIs ⛳️
  • We tweaked the definition of polygon_class so that parent POIs that share the same polygon as their children can keep an OWNED_POLYGON classification. A canonical example is a hotel (parent) containing a restaurant (child). In many cases, we are not confident about the restaurant's true shape, so we don't provide a unique polygon for it, and the restaurant shares the hotel's polygon instead. The child mapping to the same polygon doesn't make that polygon a worse representation of the hotel's shape, so the hotel gets an OWNED_POLYGON classification while the restaurant gets a SHARED_POLYGON classification. Otherwise, if several POIs map to the same polygon (excluding children), those POIs are classified as SHARED_POLYGON. For more details on how we're thinking about polygon classification, reference the Places Manual. We are especially interested in tracking the polygon_class for all places that are both (i) branded and (ii) do NOT have a parent_safegraph_place_id; we call this group "branded, no-parent". We want 100% of "branded, no-parent" POIs to have polygon_class = OWNED_POLYGON.

    • Last month, the percent OWNED polygons for branded, no-parent POIs was 88.6%
    • This month it is 82.8% :chart-with-downwards-trend:
    • The 5.8-percentage-point decrease is a result of our efforts to remove overlapping polygons (as described above), and this number is a better reflection of the total POIs with truly "OWNED" polygons. Here is how we are tracking on this metric across releases: OWNED vs SHARED Polygons in SafeGraph Places Release History.
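Here's a minimal sketch of the polygon_class rule and the "branded, no-parent" metric as we read them above. A POI is OWNED_POLYGON when every other POI on its polygon is one of its own direct children, and SHARED_POLYGON otherwise. The `Poi` type, its `polygon_id` field, and the sample POIs are hypothetical. In real data you'd need some way to tell which POIs share a polygon (for example, identical polygon_wkt values), and the production logic may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Poi:
    place_id: str             # stands in for safegraph_place_id
    polygon_id: str           # hypothetical key: POIs with the same value share one polygon
    parent_id: Optional[str]  # stands in for parent_safegraph_place_id
    branded: bool             # True if the POI has at least one brand


def classify(pois: List[Poi]) -> Dict[str, str]:
    """OWNED_POLYGON if every other POI on the same polygon is a direct child
    of this POI; SHARED_POLYGON otherwise."""
    by_polygon: Dict[str, List[Poi]] = {}
    for poi in pois:
        by_polygon.setdefault(poi.polygon_id, []).append(poi)
    classes: Dict[str, str] = {}
    for poi in pois:
        others = [p for p in by_polygon[poi.polygon_id] if p.place_id != poi.place_id]
        owned = all(p.parent_id == poi.place_id for p in others)
        classes[poi.place_id] = "OWNED_POLYGON" if owned else "SHARED_POLYGON"
    return classes


def pct_owned_branded_no_parent(pois: List[Poi], classes: Dict[str, str]) -> float:
    """Percent of "branded, no-parent" POIs classified as OWNED_POLYGON."""
    group = [p for p in pois if p.branded and p.parent_id is None]
    if not group:
        return 0.0
    owned = sum(1 for p in group if classes[p.place_id] == "OWNED_POLYGON")
    return 100.0 * owned / len(group)


# Hotel (parent) and restaurant (child) share polygon P1; two unrelated stores share P2.
pois = [
    Poi("hotel", "P1", None, True),
    Poi("restaurant", "P1", "hotel", False),
    Poi("store_a", "P2", None, True),
    Poi("store_b", "P2", None, True),
    Poi("store_c", "P3", None, True),
]
classes = classify(pois)
print(classes)
print(pct_owned_branded_no_parent(pois, classes))  # 50.0
```

The hotel keeps OWNED_POLYGON because its only polygon-mate is its child. The two stores on P2 are SHARED_POLYGON, which pulls the branded, no-parent share down to 50%.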

Bug Fixes and Known Issues - Geometry

  • Centroid-Radius Polygons: As discussed in the March 2019 release notes, we internally track centroid-radius polygons vs. precise polygons and strive for 100% precise polygons. You can measure this yourself using the is_synthetic column.
    • This release, we saw a slight decrease to 94.5% precise polygons (94.9% last month) :chart-with-downwards-trend:
    • Here is how we are tracking on this metric across releases: Centroid-Radius Polygon Tracking.
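Here's one way to compute the precise-polygon share yourself. This is a sketch under assumptions: it treats a row whose is_synthetic value is the string `false` as a precise polygon and anything else as centroid-radius, and the inline sample rows are hypothetical. Check the actual file format before relying on it.

```python
import csv
import io
from typing import Iterable, Mapping


def precise_polygon_pct(rows: Iterable[Mapping[str, str]]) -> float:
    """Percent of rows whose is_synthetic flag is false (i.e. not a centroid-radius polygon)."""
    total = 0
    precise = 0
    for row in rows:
        total += 1
        if row["is_synthetic"].strip().lower() == "false":
            precise += 1
    return 100.0 * precise / total if total else 0.0


# Tiny inline sample standing in for a real geometry file (hypothetical values).
sample = io.StringIO(
    "safegraph_place_id,is_synthetic\n"
    "sg:aaa,false\n"
    "sg:bbb,false\n"
    "sg:ccc,true\n"
    "sg:ddd,false\n"
)
print(precise_polygon_pct(csv.DictReader(sample)))  # 75.0
```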

Enhancements - Core Places and Brands

  • Last month SG Places had 6,047,377 points-of-interest. This month SG Places has 5,929,331 points-of-interest (net -118,046 places): -118,187 US places and +141 CA places. We removed ~119k POIs of individual physicians to better highlight the actual medical center POIs. This is the main driver behind the net decrease in places this month.
  • We've added 46 new brands, with a focus on health care in the U.S. and grocery stores in Canada, including the parent brand Loblaws 👨‍⚕️ 👩‍⚕️ 🛒
    New Brands Include...
    • Ascension Health (healthcare.ascension.org, SG_BRAND_6c6c46d79d7e982d) with 942 US and 0 CA places.
    • Encompass Health (encompasshealth.com, SG_BRAND_6afec58ddc0716233e8a8e528bbe9b42) with 408 US and 0 CA places.
    • SSM Health (ssmhealth.com, SG_BRAND_bb91c79b7e30900c4c0e1a9bfe33956f) with 288 US and 0 CA places.
    • BayCare (baycare.org, SG_BRAND_54cc3d9ff8318d4b) with 274 US and 0 CA places.
    • Nofrills (nofrills.ca, SG_BRAND_24d619241fb82b71), parent brand: (Loblaws, SG_BRAND_1b47663a692f81de) with 0 US and 256 CA places.
    • Real Canadian Superstore (realcanadiansuperstore.ca, SG_BRAND_70a9567c853bfe0e), parent brand: (Loblaws, SG_BRAND_1b47663a692f81de) with 0 US and 115 CA places.
    • Independent Grocers (yourindependentgrocer.ca, SG_BRAND_569ac68b31a09938), parent brand: (Loblaws, SG_BRAND_1b47663a692f81de) with 0 US and 113 CA places.
    • Valu-Mart (valumart.ca, SG_BRAND_6175f233a2829a64), parent brand: (Loblaws, SG_BRAND_1b47663a692f81de) with 0 US and 51 CA places.
    • and 38 more!
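Because each Loblaws banner carries a parent brand, you can roll child brands up to their parent. Here's a minimal sketch using only the eight brands listed above, so the totals leave out the other 38 new brands. The tuple layout and the `rollup_by_parent` helper are illustrative, not part of any SafeGraph file format.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (brand, parent brand or None, US places, CA places), copied from the list above.
NEW_BRANDS: List[Tuple[str, Optional[str], int, int]] = [
    ("Ascension Health", None, 942, 0),
    ("Encompass Health", None, 408, 0),
    ("SSM Health", None, 288, 0),
    ("BayCare", None, 274, 0),
    ("Nofrills", "Loblaws", 0, 256),
    ("Real Canadian Superstore", "Loblaws", 0, 115),
    ("Independent Grocers", "Loblaws", 0, 113),
    ("Valu-Mart", "Loblaws", 0, 51),
]


def rollup_by_parent(rows: List[Tuple[str, Optional[str], int, int]]) -> Dict[str, Tuple[int, int]]:
    """Sum (US, CA) place counts under the parent brand, or under the brand itself if it has no parent."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for brand, parent, us, ca in rows:
        key = parent or brand
        totals[key][0] += us
        totals[key][1] += ca
    return {name: (us, ca) for name, (us, ca) in totals.items()}


print(rollup_by_parent(NEW_BRANDS)["Loblaws"])  # (0, 535)
```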

Bug Fixes and Known Issues - Core Places and Brands

  • We discovered some store count fluctuations affecting a handful of recently added brands. Correcting them produced significant changes in the total number of POIs for each affected brand, but the new counts are correct. For transparency, we'd like to list some of these corrections as examples, in no particular order:

    • Rolex (SG_BRAND_495b086776e9efccc306e291f4948925). Net POI count change: US: -310 CA: 0. Bug: Previously included Rolex dealers; now limited to Rolex's own boutiques.
    • Bentley Motors (SG_BRAND_1e8cb2c9bf1caabd). Net POI count change: US: +20 CA: 0. Bug: Unintentionally churned 20 POIs in the April release; these are added back in the May release.
    • Don Roberto Jewelers (SG_BRAND_77f04f8fdc110101). Net POI count change: US: +71 CA: 0. Bug: Unintentionally churned 71 POIs in the April release; these are added back in the May release.
    • Roly Poly (SG_BRAND_9fb0158640de6cb692a0946314f5a605). Net POI count change: US: +13 CA: 0. Bug: Unintentionally churned 13 POIs in the April release; these are added back in the May release.
  • Bad SGPID Churn: Bad sgpid churn refers to undesired failures to maintain a consistent safegraph_place_id (sgpid) between releases (see the discussion in the March 2019 release notes). We internally track and estimate our performance in this domain and share these numbers in our release notes for maximum transparency. In this release:

    • We dropped 147,373 sgpids (8,758 branded and 138,615 non-branded).
    • We added 29,327 sgpids (9,250 branded and 20,077 non-branded).
    • Note: A large proportion of these are true openings and closings, and the dropped sgpids of true closings are reflected in Core Places files that include closed POIs. We intentionally dropped more than 119k physician POIs this month (81% of the total dropped sgpids), so these are not contributing to what we consider "bad" churn. We are continuing to work on better metrics to distinguish good vs. bad churn.
  • Category Fill Rate: We monitor category fill rate with 3 metrics: (1) category fill rate across the entire dataset, (2) category fill rate for branded POIs, and (3) category fill rate in the brand_info file (brand-level categories). We want all of these numbers to be 100%.

    • (1) All POI category fill rate. Last month 98.8%. This month 98.8%.
    • (2) Branded POI category fill rate. Last month 100%. This month 100% :100:
    • (3) Brand-level category fill rate (brand_info file). Last month 100%. This month 100%. :100:
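The three fill-rate metrics above boil down to "what share of rows has a non-blank category." Here's a sketch that computes all three, assuming Core Places rows carry top_category and safegraph_brand_ids columns and that brand_info rows also use a top_category column. Those column names for brand_info are an assumption, and the sample rows are hypothetical.

```python
from typing import Iterable, List, Mapping, Tuple

Row = Mapping[str, str]


def fill_rate(rows: Iterable[Row], field: str) -> float:
    """Percent of rows where `field` is present and not blank."""
    rows = list(rows)
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if (r.get(field) or "").strip())
    return 100.0 * filled / len(rows)


def category_fill_rates(places: List[Row], brand_info: List[Row]) -> Tuple[float, float, float]:
    """The three metrics above, under assumed column names."""
    branded = [r for r in places if (r.get("safegraph_brand_ids") or "").strip()]
    return (
        fill_rate(places, "top_category"),      # (1) all POIs
        fill_rate(branded, "top_category"),     # (2) branded POIs only
        fill_rate(brand_info, "top_category"),  # (3) brand-level categories (column name assumed)
    )


places = [
    {"top_category": "Grocery Stores", "safegraph_brand_ids": "SG_BRAND_24d619241fb82b71"},
    {"top_category": "", "safegraph_brand_ids": ""},
    {"top_category": "Offices of Physicians", "safegraph_brand_ids": ""},
    {"top_category": "Gasoline Stations", "safegraph_brand_ids": "SG_BRAND_example"},
]
brand_info = [{"top_category": "Grocery Stores"}]
print(category_fill_rates(places, brand_info))  # (75.0, 100.0, 100.0)
```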

**Calculating Diffs**
Curious to find the specific records that were either **added, deleted, or saw an attribute change** from one release to the next? Visit "Calculating Diffs" in our [Data Science Resources](https://docs.safegraph.com/docs/data-science-resources#section-calculating-diffs) to get started. 
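The linked guide is the place to start. As a rough illustration of the idea, here's a minimal in-memory sketch that assumes each release has already been loaded into a dict keyed by safegraph_place_id. The `diff_releases` helper and the sample records are hypothetical. The "added" and "deleted" lists correspond to the added and dropped sgpids counted in the churn section above.

```python
from typing import Dict, List, Mapping, Tuple

Record = Mapping[str, str]


def diff_releases(
    old: Dict[str, Record], new: Dict[str, Record]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare two releases keyed by safegraph_place_id.

    Returns sorted (added, deleted, changed) id lists, where "changed" means the
    id exists in both releases but at least one attribute differs.
    """
    old_ids, new_ids = set(old), set(new)
    added = sorted(new_ids - old_ids)
    deleted = sorted(old_ids - new_ids)
    changed = sorted(pid for pid in old_ids & new_ids if old[pid] != new[pid])
    return added, deleted, changed


# Hypothetical two-release sample.
april = {
    "sg:a": {"location_name": "Corner Cafe", "top_category": "Restaurants and Other Eating Places"},
    "sg:b": {"location_name": "Dr. Smith", "top_category": "Offices of Physicians"},
}
may = {
    "sg:a": {"location_name": "Corner Coffee", "top_category": "Restaurants and Other Eating Places"},
    "sg:c": {"location_name": "Nofrills", "top_category": "Grocery Stores"},
}
print(diff_releases(april, may))  # (['sg:c'], ['sg:b'], ['sg:a'])
```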


**Also check out these new ways to get SafeGraph data:**
  * Need some extra data on other SafeGraph products? Check out the [SafeGraph Data Bar](https://shop.safegraph.com/).
  * Heavy AWS user? Check out our [listings in the AWS Data Exchange](https://aws.amazon.com/marketplace/search/results?filters=vendor_id&vendor_id=7d5ff8ca-105f-4856-9d99-5f2f1d83223c).
  * Are you an Esri or ArcGIS user? Check out our FREE [SafeGraph Places data in the Esri Marketplace](https://marketplace.arcgis.com/listing.html?id=3425348e4bee4059af2b353e52df43c2) and enjoy [SafeGraph Places in Esri Basemaps](https://www.esri.com/arcgis-blog/products/arcgis-living-atlas/mapping/new-places-in-esri-vector-basemaps/).
  * Or just drop us a line! Your data needs are our data delights!