December-2020 Release Notes
If only we could deliver our data with virtual wrapping paper. Happy Holidays from everyone at SafeGraph, and welcome to the December 2020 release notes (2020-11-13/1605265391 shipped 2020-12-04).
Highlights
- +1 category added to Places
- +21 brands
- +3 Patterns columns
- +1 Neighborhood Patterns column
- +1 Core and Geometry column
- Partridge in pear tree not included
Enhancements - Patterns
- Welcome to backfill-palooza! What is a "backfill"? A SafeGraph Patterns backfill is a restatement of historical foot traffic activity using the latest version of Core and Geometry to inform the recalculation. Improving our product is a double-edged sword. While enhancing our Core and Geometry products is a worthwhile cause, it can lead to foot traffic inconsistency over time if POIs experience significant metadata changes. Restating the entire history with a single version of Core and Geometry alleviates this pain for customers licensing historical data and provides a consistent view of foot traffic over time. We do this a few times each year and strategically plan schema-breaking changes around backfills. See here for details about previous backfills.
- The December 2020 "backfill" restates foot traffic activity from January 1st 2018 - present (Nov. 30th) for Weekly Patterns, Monthly Patterns, and Neighborhood Patterns.
- This backfill is "smarter" than previous backfills and accounts for point-in-time store openings and store closures.
- For example, if a POI opened in January 2019, we will not attribute visits to the POI from January 2018 - December 2018 and will only attribute visits from January 2019 onward. On the other hand, if a POI closed in January 2019, we will only attribute visits from January 2018 - December 2018 and will not attribute visits from January 2019 - present.
- We are relying on the metadata provided by our `closed_on`, `opened_on`, `tracking_closed_since`, and `tracking_opened_since` columns to make these determinations. If we do not have open/close information for a POI, we will treat the POI as "open" for the duration of the backfill. See here for more about how we determine POI openings/closings.
- Both Weekly and Monthly Patterns now include `parent_safegraph_place_id` and `parent_placekey` columns. Understanding spatial hierarchy is especially important for visit aggregation use cases to ensure visits are not "double counted" at both the child and parent POI. See Visits to Parent POIs for details.
- We've added +2 bins to `bucketed_dwell_times` for increased granularity!
- Home Panel Summary ("home_panel_summary.csv") in Weekly Patterns and Monthly Patterns now only includes those devices whose homes are eligible to be counted in the `visitor_home_cbgs` column. See here for methodology on calculating homes. In the past, the counts in the Home Panel Summary included any device which had at least one visit during the given time period and for which we had identified a primary nighttime geohash with any degree of confidence. Meanwhile, `visitor_home_cbgs` only included devices for which we had identified a primary nighttime geohash with a high degree of confidence. With this change, we are aligning our requirements for the `visitor_home_cbgs` column with the Home Panel Summary. This means the counts in the Home Panel Summary will be lower than they had been in the past, but the Home Panel Summary and `visitor_home_cbgs` methodologies are now consistent, which had been the original intent, to help with normalization.
- Weekly Patterns now includes a "visit_panel_summary.csv" file (like Monthly Patterns) to show the total number of visits and visitors to POIs in a given week by state.
- The `visitor_work_cbgs` column is now removed from the schema. This column has been empty since August 2020 in favor of `visitor_daytime_cbgs`.
- `carrier_name` has been added to Weekly Patterns as a premium column option.
- In last month's delivery, SG Patterns had 4,186,911 points-of-interest (US only). This month, SG Patterns has 4,393,086 points-of-interest (US only) (net +206,175).
- Last month, SG Patterns had 899,790,036 visits from 34,787,736 visitors. This month, SG Patterns has 778,401,379 visits from 35,531,210 visitors (delta -121,388,657 visits, +743,474 visitors).
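The point-in-time rule described in this section (attribute visits only while a POI is open, and treat a POI with no open/close information as open for the whole backfill) can be sketched roughly as follows. This is an illustration under stated assumptions, not SafeGraph's actual implementation: the function name `is_open_in_month` is hypothetical, `opened_on` and `closed_on` are assumed to be "YYYY-MM" strings or `None`, and the `tracking_opened_since` / `tracking_closed_since` columns are left out for brevity.

```python
from typing import Optional


def is_open_in_month(month: str, opened_on: Optional[str], closed_on: Optional[str]) -> bool:
    """Return True if visits in `month` should be attributed to the POI.

    All values are assumed to be "YYYY-MM" strings, which order correctly
    when compared as plain strings. None means no information is available.
    """
    # A POI that opened after this month gets no visits for it.
    if opened_on is not None and month < opened_on:
        return False
    # A POI that closed in or before this month gets no visits for it.
    if closed_on is not None and month >= closed_on:
        return False
    # Otherwise (including when nothing is known) treat the POI as open.
    return True


months = ["2018-12", "2019-01", "2019-02"]
# POI opened in January 2019: visits only from January 2019 onward.
print([is_open_in_month(m, "2019-01", None) for m in months])  # [False, True, True]
# POI closed in January 2019: visits only through December 2018.
print([is_open_in_month(m, None, "2019-01") for m in months])  # [True, False, False]
```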
Enhancements - Neighborhood Patterns
- Neighborhood Patterns, which assigns foot traffic counts to census block groups instead of places, includes a new column, `work_behavior_device_home_areas`, to support commuting use cases. This column shows the device count per home census block group for devices which dwelled in a target census block group during the workday and for more than 6 hours.
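As a rough sketch of how the new column might be used for a commuting question, the snippet below ranks home census block groups by device count. The storage format is not stated in these notes, so it is assumed here that `work_behavior_device_home_areas` holds a JSON-encoded mapping from home census block group to device count. The helper name `top_home_areas` and the sample values are hypothetical.

```python
import json
from typing import List, Tuple


def top_home_areas(raw_value: str, n: int = 3) -> List[Tuple[str, int]]:
    """Return the n home census block groups with the most devices."""
    # Assumption: the column value is a JSON object {home_cbg: device_count}.
    counts = json.loads(raw_value) if raw_value else {}
    # Sort by count (descending), then by CBG id so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


example = '{"010010201001": 12, "010010201002": 4, "010010202001": 7}'
print(top_home_areas(example, n=2))  # [('010010201001', 12), ('010010202001', 7)]
```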
Enhancements - Core Places and Brands
- `parent_placekey` is a new column listing the `placekey` of the parent (containing) POI if the place is contained by a parent entity (e.g. mall, airport, stadium). This mirrors the existing relationship between `safegraph_place_id` / `parent_safegraph_place_id`. See the Core Places schema for details.
- Last month SG Places had 6,654,168 points-of-interest (including closed POIs). This month SG Places has 6,874,427 points-of-interest (net +220,259 places). These are +217,353 `US` places and +2,906 `CA` places.
- We've added +21 brands including +7 Gasoline Stations with Convenience Stores and +5 Blood and Organ Banks (621991). New Brands Include...
  - Sobeys (sobeys.com, SG_BRAND_43deb5f906f8064a) with 0 US and 255 CA places.
  - Hy-Vee Gas Station (hy-vee.com/stores/gas-finder, SG_BRAND_58cd8c154f664b5c), parent brand: (Hy-Vee, SG_BRAND_8f8c9465b9550499b0540b26e9470dec) with 167 US and 0 CA places.
  - Christmas Tree Shops (christmastreeshops.com, SG_BRAND_e2ed21c3015b3571f677c10f987a7ccb), parent brand: (Bed Bath & Beyond, SG_BRAND_6e7bcf9086fc3b43babdfdf51a97759f) with 81 US and 0 CA places.
  - Interstate Blood Bank (interstatebloodbank.com, SG_BRAND_bf7dfe9485d088ab), parent brand: (GRIFOLS, SG_BRAND_3bb370e5a4a93cf0) with 36 US and 0 CA places.
  - Suzuki (suzuki.com, SG_BRAND_101186a9a44bc0354ed997696a6aefba) with 5781 US and 0 CA places.
  - and 16 more!
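One possible use of the `parent_placekey` column introduced in this section is to avoid counting the same visit at both a child POI and its parent when summing visits. The sketch below is an assumption-laden illustration rather than a documented SafeGraph method: it assumes a visit count has already been joined to each POI from Patterns, it chooses to keep the parent and skip its children, and the placekey values and function name are made up.

```python
from typing import Iterable, Optional, Tuple

# (placekey, parent_placekey or None, visit count joined from Patterns)
Row = Tuple[str, Optional[str], int]


def total_visits_without_double_counting(rows: Iterable[Row]) -> int:
    """Sum visits, skipping any child whose parent is also in the selection."""
    rows = list(rows)
    present = {placekey for placekey, _, _ in rows}
    total = 0
    for _, parent, visits in rows:
        # The parent is being summed too, so skip the child to avoid overlap.
        if parent is not None and parent in present:
            continue
        total += visits
    return total


rows = [
    ("mall-1", None, 1000),
    ("store-a", "mall-1", 300),   # skipped: parent "mall-1" is in the selection
    ("store-b", "mall-1", 200),   # skipped: parent "mall-1" is in the selection
    ("store-c", "mall-9", 50),    # kept: its parent is not in the selection
    ("cafe-d", None, 80),
]
print(total_visits_without_double_counting(rows))  # 1130
```

Whether to keep the parent or the children depends on the question being asked; the point is only to pick one level of the hierarchy per parent/child pair.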
Bug Fixes and Known Issues - Core Places and Brands
- We discovered a few brand count fluctuations as a result of updated sourcing and other metadata bugs. These corrections resulted in significant changes in the total number of POIs for each affected brand, but the new count is correct. For transparency, we'd like to list some of these corrections as examples in no particular order:
  - Castle Dental (`SG_BRAND_473e6c6e77f4c292`). Net POI count change: US: -288 CA: 0. Bug: Previously included affiliates.
  - T-Mobile (`SG_BRAND_4b82356db1a8f4a2db26dd5b7e30abba`). Net POI count change: US: 2507 CA: 0. Bug: Stores are now under the T-Mobile brand but still contain "Sprint" in the `location_name`.
Enhancements - Categories
- With guidance from our customers, we continue to expand the definition of a SafeGraph "place." We're happy to announce that the December 2020 release features 1,761 corporate office locations across 65 of the Fortune 100 companies. These POIs can be found by searching for `naics_code` = 551114 (Corporate, Subsidiary, and Regional Managing Offices).
Category Fill Rate -- We monitor category fill rate with 3 metrics: (1) category fill rate across the entire dataset, (2) category fill rate for branded POI, (3) category fill rate in the brand_info file (brand-level categories). We want all of these numbers to be 100%.
- (1) All POI category fill rate. Last month 99.2%. This month 99.2%.
- (2) Branded POI category fill rate. Last month 100%. This month 100%.
- (3) Brand-level category fill rate (brand_info file). Last month 100%. This month 100%.
Drops
- We constantly ingest data from new sources, and many `safegraph_place_ids` (sgpids) are intentionally dropped, but we are unable to track each and every dropped sgpid. For the first time, the following metrics account for closed POIs as well (see more about our open/close columns here).
  - We dropped 17,564 sgpids (6,056 branded and 11,508 non-branded).
  - ~2k dropped due to POI source fluctuations
  - ~1k dropped as a result of bug fixes for branded POIs
  - ~8k dropped as a result of deduplication
- The remaining drops are undesired failures to maintain a consistent sgpid between releases, known as bad sgpid churn (see discussion in March 2019 release). We are continuing to work on better metrics to distinguish good vs. bad churn.
- In October, we cofounded the Placekey initiative and added `placekey` as a unique and persistent identifier for all POIs in the SafeGraph dataset. See here for more on how Placekey is unlocking access to geospatial data across industries.
Enhancements - Geometry
- `parent_placekey` is a new column listing the `placekey` of the parent (containing) POI if the place is contained by a parent entity (e.g. mall, airport, stadium). This mirrors the existing relationship between `safegraph_place_id` / `parent_safegraph_place_id`. See the Geometry schema for details.
- This month in Geometry world, we upgraded some of our "parent" polygons for `naics_codes` including but not limited to theme parks, golf courses, ski resorts, and airports. This resulted in more uniform polygons across these important categories and provides consistency for the Patterns "backfill."
- While OWNED polygons are preferred, it does not mean that SHARED polygons are inherently bad. It only means that the exact shape of each POI within the polygon is not discernible, but the general location can be identified by the centroid (`latitude` & `longitude`).
- When `enclosed` = FALSE, it indicates that there are reasonable means to derive a unique polygon for the POI (even when `parent_safegraph_place_id` is not null), and we strive for 100% of branded, non-enclosed POIs to have `polygon_class` = "OWNED_POLYGON".
- Last month, the percent OWNED polygons for branded, non-enclosed POIs was 79.5%.
- This month, the percent OWNED polygons for branded, non-enclosed POIs is 78.0%.
  - Here is how we're tracking on this metric across releases: OWNED vs SHARED Polygons in SafeGraph Places Release History.
  - See the September-2020 release notes for details about the `enclosed` column and tweaks to this metric.
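For readers who want to reproduce a metric like the one above on their own extract, here is a minimal sketch of the percentage of OWNED polygons among branded, non-enclosed POIs. It is not SafeGraph's internal calculation. The `enclosed` and `polygon_class` columns and the "OWNED_POLYGON" value come from these notes; the `brands` column used to identify branded POIs, the "SHARED_POLYGON" value, and the sample rows are assumptions for illustration.

```python
from typing import Dict, List


def percent_owned(pois: List[Dict[str, object]]) -> float:
    """Percent of branded, non-enclosed POIs whose polygon_class is OWNED_POLYGON."""
    # Assumption: a POI is "branded" when its `brands` value is non-empty.
    eligible = [p for p in pois if p.get("brands") and p.get("enclosed") is False]
    if not eligible:
        return 0.0
    owned = sum(1 for p in eligible if p.get("polygon_class") == "OWNED_POLYGON")
    return round(100.0 * owned / len(eligible), 1)


pois = [
    {"brands": "Brand A", "enclosed": False, "polygon_class": "OWNED_POLYGON"},
    {"brands": "Brand A", "enclosed": False, "polygon_class": "SHARED_POLYGON"},
    {"brands": "Brand B", "enclosed": True, "polygon_class": "SHARED_POLYGON"},  # enclosed: excluded
    {"brands": None, "enclosed": False, "polygon_class": "OWNED_POLYGON"},       # unbranded: excluded
    {"brands": "Brand C", "enclosed": False, "polygon_class": "OWNED_POLYGON"},
]
print(percent_owned(pois))  # 66.7
```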
Bug Fixes and Known Issues - Geometry
- Centroid-Radius Polygons -- As discussed in the March 2019 release notes, we internally track centroid-radius polygons vs precise polygons and strive for 100% precise polygons. You can measure this yourself using the `is_synthetic` column.
  - This release, precise polygons remained stable at 95.8%.
  - Here is how we are tracking on this metric across releases: Centroid-Radius Polygon Tracking.
  - See here for a short list of POI categories for which we do not require precise polygons.
**In case you missed it,** check out [last month's release notes](https://docs.safegraph.com/changelog/november-2020-release-notes).
**Calculating Diffs**
Curious to find the specific records that were either **added, deleted, or saw an attribute change** from one release to the next? Visit "Calculating Diffs" in our [Data Science Resources](https://docs.safegraph.com/docs/data-science-resources#section-calculating-diffs) to get started.
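As a rough idea of what such a diff involves, the sketch below compares two releases keyed by `placekey` and reports which records were added, deleted, or changed. It is a simplified illustration and not the method from the linked page: each release is assumed to be already loaded into a dictionary of attribute dictionaries, and the function name and sample records are hypothetical.

```python
from typing import Dict, Set, Tuple

Release = Dict[str, Dict[str, object]]  # placekey -> attribute values


def diff_releases(old: Release, new: Release) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, deleted, changed) placekeys between two releases."""
    added = set(new) - set(old)
    deleted = set(old) - set(new)
    # A record "changed" if it exists in both releases with different attributes.
    changed = {key for key in set(old) & set(new) if old[key] != new[key]}
    return added, deleted, changed


old_release = {
    "pk-1": {"location_name": "Cafe A", "naics_code": "722515"},
    "pk-2": {"location_name": "Sprint Store", "naics_code": "443142"},
    "pk-3": {"location_name": "Old Shop", "naics_code": "448140"},
}
new_release = {
    "pk-1": {"location_name": "Cafe A", "naics_code": "722515"},
    "pk-2": {"location_name": "T-Mobile", "naics_code": "443142"},
    "pk-4": {"location_name": "New Office", "naics_code": "551114"},
}
added, deleted, changed = diff_releases(old_release, new_release)
print(sorted(added), sorted(deleted), sorted(changed))  # ['pk-4'] ['pk-3'] ['pk-2']
```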
**Fill Rates**
See the [Summary Statistics](https://docs.safegraph.com/docs/places-summary-statistics) page for all Core and Geometry column fill rates as well as a breakdown of POI count by `naics_code`.
**Explore**
Browse SafeGraph Core & Geometry data at your own pace [in these webmaps.](https://storymaps.arcgis.com/stories/8e5e066486f94f0ea698e507d46987f7)
**Also check out these new ways to get SafeGraph data:**
* Need some extra data or other SafeGraph products? Check out the [SafeGraph Data Bar.](https://shop.safegraph.com/)
* Heavy AWS User? Check out our [listings in the AWS Data Exchange](https://aws.amazon.com/marketplace/search/results?filters=vendor_id&vendor_id=7d5ff8ca-105f-4856-9d99-5f2f1d83223c).
* Are you an Esri or ArcGIS user? Check out our FREE data [SafeGraph Places in the Esri Marketplace](https://marketplace.arcgis.com/listing.html?id=3425348e4bee4059af2b353e52df43c2) and enjoy [SafeGraph Places in Esri Basemaps](https://www.esri.com/arcgis-blog/products/arcgis-living-atlas/mapping/new-places-in-esri-vector-basemaps/).
* Snowflake user? Check out our page on the [Snowflake Data Exchange](https://www.snowflake.com/datasets/safegraph/) :snowflake:
* Or just drop us a line! Your data needs are our data delights!