July-2021 Release Notes
Can't beat the heat? 🥵 We're serving up an ocean of cool new data! Welcome to the July 2021 release notes (2021-06-10/1623334361 shipped 2021-07-07).
Highlights
- +1 Patterns column
- +2 Brand Info columns
- -3 Core Places columns
- "POINT" POIs make their Places debut
Enhancements - Core Places and Brands
- After 3+ years of uniquely identifying SafeGraph Places, we have officially retired `safegraph_place_id` and `parent_safegraph_place_id` into SafeGraph lore. We learned a lot from building `safegraph_place_id` and are excited to rely exclusively on `placekey` and `parent_placekey` as our unique and persistent IDs moving forward. Learn more about how Placekey is unlocking access to spatial data here.
- We also retired `tracking_opened_since` as it did not provide additive value in our efforts to communicate open/close dates for places. If a POI has an `opened_on` value, it implies we've been tracking it since that date, and if a POI does not have an `opened_on` value, it implies we were not able to track the exact date it opened. See more about how we track new store openings and permanent store closures here.
- As we work to expand into new countries, it's useful to show which countries are covered for each brand. The Brand Info file now includes two additional columns listing the countries in which a particular brand has at least one open POI (`iso_country_codes_open`) and at least one closed POI (`iso_country_codes_closed`). Please note that these new json columns have quotes escaped differently than other SafeGraph json columns, and you may notice some undesirable `""` in the data depending on your tech stack. This will be corrected in the August release to align with how we have always escaped quotes for json columns.
- Last month, SG Places had 8,413,852 points-of-interest (including closed POIs). This month, SG Places has 8,638,522 points-of-interest (net +224,670 places). These are +174,961 `US` places 🇺🇸, +14,027 `CA` places 🇨🇦, and +35,682 `GB` places 🇬🇧.
- We've added 94 brands (+68 with 🇺🇸 coverage, +44 with 🇨🇦 coverage, +15 with 🇬🇧 coverage) including:
  - Allpoint ATM (`SG_BRAND_11f4c85f01baedd5040bb96211cebbf1`) with 38,517 `US` places, 18,395 `GB` places, and 1,436 `CA` places
  - Ziggi's Coffee (`SG_BRAND_d7dca165be7d62b2`) with 33 `US` places
  - Americold Logistics (`SG_BRAND_8744f79535699202`) with 194 `US` places, 8 `CA` places, and 2 `GB` places
  - +24 Commercial Banking (522110) brands, 13 of which are `geometry_type` = "POINT" ATM brands
  - +7 Limited-Service Restaurants (722511) brands 🍴
  - View the full list here
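Until the August fix lands, the new Brand Info columns can be read with a tolerant parser. The following is a minimal sketch, not SafeGraph tooling: it assumes each cell of `iso_country_codes_open` / `iso_country_codes_closed` holds a JSON array of ISO country codes, and that the stray `""` comes from doubled double-quotes. Both are assumptions, and `parse_country_codes` is a hypothetical helper name.

```python
import json


def parse_country_codes(raw):
    """Parse one cell into a list of country codes.

    Assumption: the cell is a JSON array such as ["US","CA"], possibly
    with every double-quote doubled, e.g. [""US"",""CA""].
    """
    if raw is None or raw.strip() == "":
        return []
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Assumed failure mode: collapse doubled quotes, then retry.
        return json.loads(raw.replace('""', '"'))


clean = parse_country_codes('["US","CA"]')
doubled = parse_country_codes('[""US"",""CA""]')
```

Trying the strict parse first means cells that are already valid JSON pass through untouched, so the same helper should keep working once the escaping is aligned with the other json columns.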
Brand Openings and Closings
- We rely on POI metadata to track store openings and closings, and we are especially interested in understanding open/close dates for branded POIs. It can take more than a month to infer open/close dates, so we report brand open/close metrics on a one-month delay.
- In this release, we flagged 175 brands with at least one store closure in May 2021, and 197 brands with at least one store opening in May 2021.
Enhancements - Categories
- In the spirit of breaking traditions, we now provide unique types of places that are not defined by polygons in SafeGraph Geometry. We've added 182k "point-only" POIs to our Core Places offering. These include:
  - 146k ATMs: `naics_code` = 522110 (Commercial Banking)
  - 36k electric vehicle charging stations: `naics_code` = 447190 (Other Gasoline Stations) ⚡️
- These premium rows are available upon request and are distinguished by having a "POINT" value in the new `geometry_type` column (positioned at the end of the Core Places schema). All traditional SafeGraph Places have "POLYGON" in the `geometry_type` column.
- Stay tuned for additional "POINT" POIs in the works (kiosks and transit stops!) and reach out to your Customer Success Manager or contact sales to learn more.
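Because "POINT" rows carry no polygon, workflows that depend on geometry may want to separate them first. Here is a minimal sketch using only the standard library. The three sample rows and their Placekeys are made up for illustration; only the `naics_code` and `geometry_type` column names and values come from these notes.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; real Core Places files have many more columns.
SAMPLE_CSV = """placekey,naics_code,geometry_type
aaa-222@111-111-111,522110,POINT
bbb-222@111-111-112,447190,POINT
ccc-222@111-111-113,722511,POLYGON
"""


def split_by_geometry_type(rows):
    """Return (point_rows, polygon_rows) based on the geometry_type column."""
    points = [r for r in rows if r["geometry_type"] == "POINT"]
    polygons = [r for r in rows if r["geometry_type"] == "POLYGON"]
    return points, polygons


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
points, polygons = split_by_geometry_type(rows)

# Count the point-only POIs per NAICS code (ATMs vs. EV charging stations).
point_counts = Counter(r["naics_code"] for r in points)
```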
Drops ⬇️
We ingest data from many sources, and due to source changes and processing changes, Placekeys churn over time. In this release, we dropped 150,762 Placekeys (31,535 branded and 119,227 non-branded). To keep track of the status, predecessors, and latest successor of each Placekey, hit the Lineage API for free!
Major reasons for drops:
- ~3k dropped as a result of improved deduplication
- ~121k dropped due to changes to the Where part as a result of polygon cleaning and source changes
- Read more about the structure of Placekeys here
Enhancements - Geometry
- In June, we cleaned out more than 150k redundant, overlapping polygons to improve visualization and simplify visit attribution. 🧹
- We also sourced new polygons for ultra-prominent POIs (Disney Resorts, major casinos in Las Vegas, international airports, etc.) and built logic to ensure that these POIs NEVER lose their polygons or `placekeys` 🎢 🎰. As an example, Walt Disney World Resort (`zzw-222@8fy-fjg-b8v`) now has a polygon more than 20X its original size and contains 234 child POIs (previously, 1 child POI).
- While OWNED polygons are preferred, it does not mean that SHARED polygons are inherently bad. It only means that the exact shape of each POI within the polygon is not discernible, but the general location can be identified by the centroid (`latitude` & `longitude`).
- When `enclosed` = FALSE, it indicates that there are reasonable means to derive a unique polygon for the POI (even when `parent_placekey` is not null), and we strive for 100% of branded, non-enclosed POIs to have `polygon_class` = "OWNED_POLYGON".
  - Last month, the percent of OWNED polygons for branded, non-enclosed POIs was 74.5%
  - This month, the percent of OWNED polygons for branded, non-enclosed POIs is 74.4%
  - Here is how we're tracking on this metric across releases: OWNED vs SHARED Polygons in SafeGraph Places Release History.
  - See the September-2020 release notes for details about the `enclosed` column and tweaks to this metric.
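The OWNED-polygon metric for branded, non-enclosed POIs can be approximated on your own delivery. This is a rough sketch rather than the exact internal calculation. It assumes a `brands` column that is empty for non-branded POIs, that `enclosed` arrives as the strings "TRUE"/"FALSE", and that the non-owned value is "SHARED_POLYGON"; none of these details are spelled out in these notes, and the sample rows are made up.

```python
def owned_polygon_share(rows):
    """Percent of branded, non-enclosed rows whose polygon_class is OWNED_POLYGON."""
    eligible = [
        r for r in rows
        if r["brands"].strip()                         # branded (assumed column)
        and r["enclosed"].strip().upper() == "FALSE"   # non-enclosed
    ]
    if not eligible:
        return None  # metric is undefined when nothing qualifies
    owned = sum(r["polygon_class"] == "OWNED_POLYGON" for r in eligible)
    return 100.0 * owned / len(eligible)


# Hypothetical rows: only the first two are branded and non-enclosed.
sample = [
    {"brands": "Brand A", "enclosed": "FALSE", "polygon_class": "OWNED_POLYGON"},
    {"brands": "Brand B", "enclosed": "FALSE", "polygon_class": "SHARED_POLYGON"},
    {"brands": "Brand C", "enclosed": "TRUE", "polygon_class": "SHARED_POLYGON"},
    {"brands": "", "enclosed": "FALSE", "polygon_class": "OWNED_POLYGON"},
]
owned_share = owned_polygon_share(sample)
```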
Bug Fixes and Known Issues - Geometry
- We are aware of a single POI that did not behave as expected when sourcing a new polygon and establishing its permanence. Disney's Animal Kingdom (`zzy-222@8fy-fjg-c3q`) took on a shape far too large for its grounds, and this will be corrected in the August 2021 release.
- The `building_height` column will be null until further notice. This column only had a ~25% fill rate, and due to some of the challenges in fill rate we are re-evaluating the best way to source and improve this data going forward.
- Centroid-Radius Polygons: As discussed in the March 2019 release notes, we internally track centroid-radius polygons vs precise polygons and strive for 100% precise polygons. You can measure this yourself using the `is_synthetic` column.
  - Last release, the percent of precise polygons was 96.3%
  - This release, the percent of precise polygons is also 96.3%
  - Here is how we are tracking this metric across releases: Centroid-Radius Polygon Tracking.
  - See here for a short list of POI categories for which we do not require precise polygons
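To measure the precise-polygon share yourself, something along these lines may be enough. It is a sketch under two assumptions: that a true `is_synthetic` value marks a centroid-radius polygon, and that the column arrives as "true"/"false" strings. The input values shown are made up.

```python
def precise_polygon_share(is_synthetic_values):
    """Percent of rows that are NOT flagged as synthetic (centroid-radius)."""
    flags = [str(v).strip().lower() == "true" for v in is_synthetic_values]
    if not flags:
        return None  # no rows, no metric
    precise = flags.count(False)
    return 100.0 * precise / len(flags)


# Hypothetical column values: one synthetic polygon out of four.
precise_share = precise_polygon_share(["false", "FALSE", "true", "false"])
```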
Enhancements - Patterns
- The July 2021 backfill is here! There is a special "July 2021 Backfill Release Notes" doc just for more in-depth news and guidance around the backfill.
- This backfill restates foot traffic activity from January 1st 2018 - present for Weekly Patterns, Monthly Patterns, and Neighborhood Patterns.
- By popular demand, we have simplified the way we calculate `related_same_day_brand` 👨‍👩‍👧‍👦.
  - Going forward, the value shown will be a simple percentage representing: overlapping visitors to both the brand and the applicable POI divided by visitors to the POI. The mapping will be limited to the top 20 brands.
  - Previously, we were showing a mapping from a brand name to an index. The index was supposed to represent how strongly related the POI and the other brands are. Brands that are generally popular were "penalized" in calculating this index, but this was overly complicated. The format will remain the same.
  - `related_same_month_brand` (Monthly Patterns) and `related_same_week_brand` (Weekly Patterns) will be modified with the same change as with `related_same_day_brand` but applied to a monthly or weekly time period as applicable. The format will remain the same.
- We are adding a new column called `visitor_home_aggregation`. This column will work just like `visitor_home_cbgs` except it will show device origin based on a larger census geography than in the `visitor_home_cbgs` column. Having this higher-level aggregation will enable users to see home origins that might otherwise be missed since there was only one device from a given census block group but many from the higher-level aggregation.
  - For the U.S. 🇺🇸, the larger geography will be census tracts. Census tracts have a population of 1,200-8,000 versus a population of 600-3,000 for a census block group.
  - For Canada 🇨🇦, the larger geography will be aggregate dissemination areas. Aggregate dissemination areas have a minimum population of 5,000 versus a minimum population of 400 for dissemination areas.
  - `visitor_home_aggregation` has an 88.7% fill rate, similar to `visitor_home_cbgs` (88.4%).
  - The average row has 16.8 census tracts / aggregate dissemination areas represented by visitors, compared to 20.1 census block groups / dissemination areas.
- Home panel summary: We have added a new column, `number_devices_primary_daytime`, to `home_panel_summary.csv`. This will allow users to normalize the `visitor_daytime_cbgs` column.
- Finally, we are making some changes to columns now that the Canada release is fully incorporated into Weekly Patterns:
  - Neighborhood Patterns columns will now include Canadian source cbgs (i.e., with a "CA:" prefix), similar to what happened for Weekly Patterns in the May Release. The rows in Neighborhood Patterns will still be U.S. only.
  - For Weekly Patterns supplemental files, the `state` column is renamed to `region` in `home_panel_summary.csv` and `visit_panel_summary.csv`, consistent with elsewhere in SafeGraph data.
  - Similarly, in Weekly Patterns supplemental files, `iso_country_code` is being moved from the furthest right in the file to the right of `region`. This will occur in `home_panel_summary.csv`, `visit_panel_summary.csv`, and `normalization_stats.csv`.
  - See Column Ordering in our docs for the latest columns in the schema.
- In last month's delivery, SG Patterns had 4,511,670 points-of-interest (US only). This month, SG Patterns has 4,534,432 points-of-interest (US only) (net +22,762).
- Last month, SG Patterns had 1,031,959,744 visits from 35,337,908 visitors (US only). This month, SG Patterns has 1,030,120,272 visits from 39,848,724 visitors (US only) (delta -1,839,444 visits, +4,510,817 visitors).
- In our Neighborhood Patterns product, where you can see more generalized foot traffic flows, we have:
  - 2,112,160,839 raw stops (-93,285,293 from last month)
  - 454,541,272 raw devices (449,973 from last month)
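The simplified `related_same_day_brand` calculation described above (visitors seen at both the brand and the POI, divided by visitors to the POI, limited to the top 20 brands) can be illustrated as follows. This is only a sketch of the stated formula, not the production pipeline: the visitor-ID sets, the function name, the dropping of zero-overlap brands, and the absence of rounding are all assumptions.

```python
def related_brand_shares(poi_visitors, brand_visitors, top_n=20):
    """Map brand name -> percent of the POI's visitors who also visited the brand.

    poi_visitors: set of visitor IDs seen at the POI in the period.
    brand_visitors: dict of brand name -> set of visitor IDs seen at that
        brand in the same period (same day, week, or month).
    """
    if not poi_visitors:
        return {}
    shares = {
        brand: 100.0 * len(poi_visitors & visitors) / len(poi_visitors)
        for brand, visitors in brand_visitors.items()
    }
    # Keep the strongest overlaps; ties are broken alphabetically.
    ranked = sorted(shares.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return {brand: share for brand, share in ranked if share > 0}


# Hypothetical data: four POI visitors, two of whom also visited "Brand A".
example = related_brand_shares(
    {1, 2, 3, 4},
    {"Brand A": {1, 2, 9}, "Brand B": {4}, "Brand C": {7}},
)
```

Unlike the previous index, nothing here penalizes brands that are popular everywhere, so large national brands will tend to show up with higher values.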
Interested in global POI coverage? Reach out to your customer success manager to learn more about how we're thinking about growing coverage internationally.
**In case you missed it,** check out [last month's release notes](https://docs.safegraph.com/changelog/june-2021-release-notes).
**Calculating Diffs**
Curious to find the specific records that were either **added, deleted, or saw an attribute change** from one release to the next? Visit "Calculating Diffs" in our [Data Science Resources](https://docs.safegraph.com/docs/data-science-resources#section-calculating-diffs) to get started.
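As a rough idea of what such a diff involves (this is not the method from the Data Science Resources page), two releases can be compared by keying each row on `placekey`. The function name and the tiny sample releases below are hypothetical.

```python
def diff_releases(old, new):
    """Compare two releases given as dicts of placekey -> attribute dict.

    Returns three sorted lists of placekeys: added, deleted, and changed.
    """
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, deleted, changed


# Hypothetical releases with made-up keys and attributes.
june = {
    "key-a": {"location_name": "Cafe", "closed_on": ""},
    "key-b": {"location_name": "Bank", "closed_on": ""},
}
july = {
    "key-a": {"location_name": "Cafe", "closed_on": "2021-05"},
    "key-c": {"location_name": "ATM", "closed_on": ""},
}
added, deleted, changed = diff_releases(june, july)
```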
**Fill Rates**
See the [Summary Statistics](https://docs.safegraph.com/docs/places-summary-statistics) page for all Core and Geometry column fill rates as well as a breakdown of POI count by `naics_code`.
**Explore**
Browse SafeGraph Core & Geometry data at your own pace [in these webmaps.](https://storymaps.arcgis.com/stories/8e5e066486f94f0ea698e507d46987f7)
**Also check out these new ways to get SafeGraph data:**
* Need data on the fly? [Try our Places API](https://shop.safegraph.com/api)!
* Need some extra data or other SafeGraph products? Check out the [SafeGraph Data Bar.](https://shop.safegraph.com/)
* Heavy AWS User? Check out our [listings in the AWS Data Exchange](https://aws.amazon.com/marketplace/search/results?filters=vendor_id&vendor_id=7d5ff8ca-105f-4856-9d99-5f2f1d83223c).
* Snowflake user? Check out our page on the [Snowflake Data Exchange](https://www.snowflake.com/datasets/safegraph/) :snowflake:
* Or just drop us a line! Your data needs are our data delights!