July-2021 Release Notes

Can't beat the heat? 🥵 We're serving up an ocean of cool new data! :ocean: :sunglasses: Welcome to the July 2021 release notes (2021-06-10/1623334361 shipped 2021-07-07).

Highlights

  • +1 Patterns column :tada:
  • +2 Brand Info columns :bangbang:
  • -3 Core Places columns :wave:
  • "POINT" POIs make their Places debut :round-pushpin:

Enhancements - Core Places and Brands

  • After 3+ years of uniquely identifying SafeGraph Places, we have officially retired safegraph_place_id and parent_safegraph_place_id into SafeGraph lore. We learned a lot from building safegraph_place_id and are excited to rely exclusively on placekey and parent_placekey as our unique and persistent IDs moving forward 🔑. Learn more about how Placekey is unlocking access to spatial data here.
  • We also retired tracking_opened_since as it did not provide additive value in our efforts to communicate open/close dates for places. If a POI has an opened_on value, it implies we've been tracking it since that date, and if a POI does not have an opened_on value, it implies we were not able to track the exact date it opened. See more about how we track new store openings and permanent store closures here.
  • As we work to expand into new countries, it's useful to show which countries are covered for each brand. The Brand Info file now includes two additional columns listing the countries in which a brand has at least one open POI (iso_country_codes_open) and the countries in which it has at least one closed POI (iso_country_codes_closed) 🌎 🌍. Please note that these new json columns have quotes escaped differently than other SafeGraph json columns, and you may notice some undesirable "" in the data depending on your tech stack. This will be corrected in the August release to align with how we have always escaped quotes for json columns.
  • Last month, SG Places had 8,413,852 points-of-interest (including closed POIs). This month, SG Places has 8,638,522 points-of-interest (net +224,670 places). The increase breaks down as +174,961 US places 🇺🇸, +14,027 CA places 🇨🇦, and +35,682 GB places 🇬🇧.
  • We've added 94 brands (+68 with 🇺🇸 coverage, +44 with 🇨🇦 coverage, +15 with 🇬🇧 coverage) including:
    • Allpoint ATM (SG_BRAND_11f4c85f01baedd5040bb96211cebbf1) with 38,517 US Places, 18,395 GB Places, and 1,436 CA Places
    • Ziggi's Coffee (SG_BRAND_d7dca165be7d62b2) with 33 US Places
    • Americold Logistics (SG_BRAND_8744f79535699202) with 194 US Places, 8 CA Places, and 2 GB Places
    • +24 Commercial Banking brands (522110), 13 of which are geometry_type = "POINT" ATM brands
    • +7 Limited-Service Restaurants brands (722511) :fork-and-knife:
    • View the full list here
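
For readers who hit the doubled quotes in iso_country_codes_open or iso_country_codes_closed, here is a minimal Python sketch of one possible workaround. It assumes, without confirmation from the release itself, that each cell holds a JSON list of country codes and that the only damage is every quote appearing twice (for example `[""US"",""CA""]`). The helper name `parse_country_codes` is hypothetical.

```python
import json


def parse_country_codes(raw):
    """Parse a JSON list cell, tolerating doubled quotes.

    Assumption (not confirmed by the release notes): the cell is a JSON
    list of country codes, and the only damage is that every quote
    character appears twice.
    """
    if raw is None or raw.strip() == "":
        return []
    try:
        # Well-formed JSON parses directly.
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to collapsing doubled quotes, then parse again.
        return json.loads(raw.replace('""', '"'))


print(parse_country_codes('[""US"",""CA""]'))
print(parse_country_codes('["US","GB"]'))
```

Trying the strict parse first means the same helper should keep working once the escaping is aligned with the other json columns.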

Brand Openings and Closings

  • We rely on POI metadata to track store openings and closings, and we are especially interested in understanding open/close dates for branded POIs. It can take more than a month to infer open/close dates, so we report brand open/close metrics on a one-month delay.

Enhancements - Categories

  • In the spirit of breaking traditions, we now provide unique types of places that are not defined by polygons in SafeGraph Geometry. We've added 182k "point-only" POIs to our Core Places offering. These include:
    • 146k ATMs: naics_code = 522110 (Commercial Banking)
    • 36k electric vehicle charging stations: naics_code = 447190 (Other Gasoline Stations) 🚗 ⚑️
  • These premium rows are available upon request and are distinguished by having a "POINT" value in the new geometry_type column (positioned at the end of the Core Places schema). All traditional SafeGraph Places have "POLYGON" in the geometry_type column.
  • Stay tuned for additional "POINT" POIs in the works (kiosks and transit stops!) and reach out to your Customer Success Manager or contact sales to learn more :round-pushpin:.
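
As a rough illustration of working with the new geometry_type column, the Python sketch below separates "POINT" rows from "POLYGON" rows. The sample rows, Placekeys, and location names are made up for the example, and real Core Places files carry many more columns.

```python
import csv
import io
from collections import Counter

# Hypothetical three-row extract; the Placekeys and names are invented.
SAMPLE = """placekey,location_name,naics_code,geometry_type
aaa-222@abc-def-ghi,Example Bank ATM,522110,POINT
bbb-222@abc-def-ghi,Example Charging Station,447190,POINT
ccc-222@abc-def-ghi,Example Coffee Shop,722515,POLYGON
"""


def split_by_geometry_type(rows):
    """Return (point_rows, polygon_rows) based on geometry_type."""
    points, polygons = [], []
    for row in rows:
        if row["geometry_type"] == "POINT":
            points.append(row)
        else:
            polygons.append(row)
    return points, polygons


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
points, polygons = split_by_geometry_type(rows)

print(len(points), "point-only rows;", len(polygons), "polygon rows")
# Count point-only rows per NAICS code (ATMs vs. EV charging stations).
print(Counter(row["naics_code"] for row in points))
```

Pipelines that assume every row has a polygon can use the same split to exclude point-only rows.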

Drops ⬇️

We ingest data from many sources, and due to source changes and processing changes, Placekeys churn over time. In this release, we dropped 150,762 Placekeys (31,535 branded and 119,227 non-branded). To keep track of the status, predecessors, and latest successor of each Placekey, hit the Lineage API for free!

Major reasons for drops:

Enhancements - Geometry

  • In June, we cleaned out more than 150k redundant, overlapping polygons to improve visualization and simplify visit attribution. 🧹 💯
  • We also sourced new polygons for ultra prominent POIs (Disney Resorts, major casinos in Las Vegas, international airports, etc.) and built logic to ensure that these POIs NEVER lose their polygons or placekeys :roller-coaster: 🎰. As an example, Walt Disney World Resort (zzw-222@8fy-fjg-b8v) now has a polygon more than 20X its original size and contains 234 child POIs (previously, 1 child POI).
  • While OWNED polygons are preferred, it does not mean that SHARED polygons are inherently bad. It only means that the exact shape of each POI within the polygon is not discernible, but the general location can be identified by the centroid (latitude & longitude). 🎯
  • When enclosed = FALSE, it indicates that there are reasonable means to derive a unique polygon for the POI (even when parent_placekey is not null), and we strive for 100% of branded, non-enclosed POIs to have polygon_class = "OWNED_POLYGON."
    • Last month, the percentage of OWNED polygons for branded, non-enclosed POIs was 74.5%.
    • This month, the percentage of OWNED polygons for branded, non-enclosed POIs is 74.4%.
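
The OWNED-polygon metric above can be approximated on your own delivery. The Python sketch below is one way to do it, under two assumptions that are not stated in these notes: a POI counts as branded when its brands value is non-empty, and enclosed is serialized as the text TRUE or FALSE. The sample rows are invented.

```python
def owned_polygon_share(rows):
    """Percent of branded, non-enclosed POIs with polygon_class OWNED_POLYGON.

    Assumptions: "branded" means a non-empty brands value, and enclosed
    arrives as the string "TRUE" or "FALSE".
    """
    eligible = [
        row for row in rows
        if row["brands"] and row["enclosed"].upper() == "FALSE"
    ]
    if not eligible:
        return None
    owned = sum(row["polygon_class"] == "OWNED_POLYGON" for row in eligible)
    return 100.0 * owned / len(eligible)


# Invented sample: four eligible rows, three of them OWNED_POLYGON.
sample_rows = [
    {"brands": "Brand A", "enclosed": "FALSE", "polygon_class": "OWNED_POLYGON"},
    {"brands": "Brand A", "enclosed": "FALSE", "polygon_class": "SHARED_POLYGON"},
    {"brands": "Brand B", "enclosed": "TRUE", "polygon_class": "SHARED_POLYGON"},
    {"brands": "", "enclosed": "FALSE", "polygon_class": "OWNED_POLYGON"},
    {"brands": "Brand C", "enclosed": "FALSE", "polygon_class": "OWNED_POLYGON"},
    {"brands": "Brand D", "enclosed": "FALSE", "polygon_class": "OWNED_POLYGON"},
]

print(owned_polygon_share(sample_rows))
```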

Bug Fixes and Known Issues - Geometry

  • We are aware of a single POI that did not behave as expected when sourcing a new polygon and establishing its permanence 😔. Disney's Animal Kingdom (zzy-222@8fy-fjg-c3q) took on a shape far too large for its grounds, and this will be corrected in the August 2021 release.

  • The building_height column will be null until further notice. This column only had a ~25% fill rate, and due to some of the challenges in fill rate we are re-evaluating the best way to source and improve this data going forward :house: :straight-ruler:.

  • Centroid-Radius Polygons: As discussed in the March 2019 release notes, we internally track centroid-radius polygons vs. precise polygons and strive for 100% precise polygons. You can measure this yourself using the is_synthetic column.
    • Last release, the percentage of precise polygons was 96.3%.
    • This release, the percentage of precise polygons is also 96.3%.

Enhancements - Patterns

  • The July 2021 backfill is here! There is a special "July 2021 Backfill Release Notes" doc just for more in-depth news and guidance around the backfill.
  • This backfill restates foot traffic activity from January 1, 2018 to the present for Weekly Patterns, Monthly Patterns, and Neighborhood Patterns. 💥
  • By popular demand, we have simplified the way we calculate related_same_day_brand 👨‍👩‍👧‍👦.
    Going forward, the value shown will be a simple percentage: the number of visitors to both the brand and the applicable POI, divided by the number of visitors to the POI. The mapping will be limited to the top 20 brands. Previously, we showed a mapping from a brand name to an index that was meant to represent how strongly related the POI and the other brands are. Generally popular brands were “penalized” in calculating this index, but this was overly complicated. The format will remain the same.
  • related_same_month_brand (Monthly Patterns) and related_same_week_brand (Weekly Patterns) will be modified with the same change as with related_same_day_brand but applied to a monthly or weekly time period as applicable. The format will remain the same.
  • We are adding a new column called visitor_home_aggregation 🎉. This column will work just like visitor_home_cbgs except it will show device origin based on a larger census geography than in the visitor_home_cbgs column. This higher-level aggregation will enable users to see home origins that might otherwise be missed because a census block group contributed only one device while the larger geography contributed many.
    • For the U.S. 🇺🇸, the larger geography will be census tracts. Census tracts have a population of 1,200-8,000 versus a population of 600-3,000 for a census block group.
    • For Canada 🇨🇦, the larger geography will be aggregate dissemination areas. Aggregate dissemination areas have a minimum population of 5,000 versus a minimum population of 400 for dissemination areas.
    • visitor_home_aggregation has an 88.7% fill rate, similar to visitor_home_cbgs (88.4%).
    • The average row has 16.8 census tracts / aggregate dissemination areas represented by visitors, compared to 20.1 census block groups / dissemination areas.
  • Home panel summary: We have added a new column, number_devices_primary_daytime, to home_panel_summary.csv. This will allow users to normalize the visitor_daytime_cbgs column.
  • Finally, we are making some changes to columns now that the Canada release 🍁 is fully incorporated into Weekly Patterns:
    • Neighborhood Patterns columns will now include Canadian source cbgs (i.e., with a “CA:” prefix), similar to what happened for Weekly Patterns in the May release. The rows in Neighborhood Patterns will still be U.S. only.
    • For Weekly Patterns supplemental files, the state column is renamed to region in home_panel_summary.csv and visit_panel_summary.csv, consistent with elsewhere in SafeGraph data.
    • Similarly, in Weekly Patterns supplemental files, iso_country_code is being moved from the furthest right in the file to the right of region. This will occur in home_panel_summary.csv, visit_panel_summary.csv, and normalization_stats.csv.
    • See Column Ordering in our docs for the latest columns in the schema.
  • In last month's delivery, SG Patterns had 4,511,670 points-of-interest (US only). This month, SG Patterns has 4,534,432 points-of-interest (US only) (net +22,762). 📈
  • Last month, SG Patterns had 1,031,959,744 visits from 35,337,908 visitors (US only). This month, SG Patterns has 1,030,120,272 visits from 39,848,724 visitors (US only) (delta -1,839,444 visits, +4,510,817 visitors).
  • In our Neighborhood Patterns product, where you can see more generalized foot traffic flows, we have:
    • 2,112,160,839 raw stops (-93,285,293 from last month)
    • 454,541,272 raw devices (449,973 from last month)
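
Two of the Patterns changes described above can be sketched in Python. The first function follows the stated definition of the simplified related_same_day_brand value (visitors to both the brand and the POI, divided by visitors to the POI, top 20 brands only). The second shows how U.S. census block groups nest inside census tracts: a tract GEOID is the first 11 digits of the 12-digit block group GEOID. Both function names and all sample data are hypothetical, and neither is SafeGraph's actual implementation. In particular, rolling up the published visitor_home_cbgs values cannot recover block groups that were left out of that column, so it will not reproduce visitor_home_aggregation.

```python
from collections import Counter


def related_brand_shares(poi_visitors, brand_visitors, top_n=20):
    """Percent of a POI's visitors who also visited each brand.

    poi_visitors: set of visitor IDs seen at the POI.
    brand_visitors: dict mapping brand name -> set of visitor IDs.
    Brands with no overlap are dropped; only the top_n shares are kept.
    """
    if not poi_visitors:
        return {}
    shares = {}
    for brand, visitors in brand_visitors.items():
        overlap = len(poi_visitors & visitors)
        if overlap:
            shares[brand] = 100.0 * overlap / len(poi_visitors)
    # Sort by share (descending), then by name for a stable order.
    ranked = sorted(shares.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:top_n])


def roll_up_to_tracts(visitor_home_cbgs):
    """Sum U.S. block group counts into tracts (first 11 GEOID digits).

    Keys with a "CA:" prefix are skipped: Canadian dissemination areas
    are not assumed to map to aggregate dissemination areas by prefix.
    """
    tracts = Counter()
    for cbg, count in visitor_home_cbgs.items():
        if cbg.startswith("CA:"):
            continue
        tracts[cbg[:11]] += count
    return dict(tracts)


poi = {"d1", "d2", "d3", "d4"}
brands = {"Brand A": {"d1", "d2", "x"}, "Brand B": {"d3"}, "Brand C": {"y"}}
print(related_brand_shares(poi, brands))

cbgs = {"360610001001": 5, "360610001002": 4, "360610002001": 6, "CA:35200001": 4}
print(roll_up_to_tracts(cbgs))
```
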

Interested in global POI coverage? Reach out to your customer success manager to learn more about how we're thinking about growing coverage internationally. 🌎

**In case you missed it,** check out [last month's release notes](https://docs.safegraph.com/changelog/june-2021-release-notes).

**Calculating Diffs**
Curious to find the specific records that were either **added, deleted, or saw an attribute change** from one release to the next? Visit "Calculating Diffs" in our [Data Science Resources](https://docs.safegraph.com/docs/data-science-resources#section-calculating-diffs) to get started. 

**Fill Rates**
See the [Summary Statistics](https://docs.safegraph.com/docs/places-summary-statistics) page for all Core and Geometry column fill rates as well as a breakdown of POI count by `naics_code`.

**Explore**
Browse SafeGraph Core & Geometry data at your own pace [in these webmaps.](https://storymaps.arcgis.com/stories/8e5e066486f94f0ea698e507d46987f7)

**Also check out these new ways to get SafeGraph data:**
* Need data on the fly? [Try our Places API](https://shop.safegraph.com/api)!  
* Need some extra data or other SafeGraph products? Check out the [SafeGraph Data Bar.](https://shop.safegraph.com/) 
* Heavy AWS user? Check out our [listings in the AWS Data Exchange](https://aws.amazon.com/marketplace/search/results?filters=vendor_id&vendor_id=7d5ff8ca-105f-4856-9d99-5f2f1d83223c).
* Snowflake user? Check out our page on the [Snowflake Data Exchange](https://www.snowflake.com/datasets/safegraph/) :snowflake: 
* Or just drop us a line! Your data needs are our data delights!