
September-2020 Release Notes

about a year ago by [email protected]

No days were dogged last month 😎 🐕. Welcome to the September-2020 release notes 📝 (2020-08-28/1598612559 shipped 2020-09-04).

Highlights

  • New Geometry columns 🔥
  • Another month of triple-digit brand additions 💯
  • Industrial POIs make their Places debut 🏭

Core Places and Brands

Enhancements - Core Places and Brands

  • Last month SG Places had 5,896,574 points-of-interest. This month SG Places has 5,901,528 points-of-interest (net +4,954 places). The net change comprises +2,024 US places and +2,930 CA places.

  • We've added a whopping +104 new brands, including +17 Full-Service Restaurants 🍴 and +18 Canada-only brands 🍁
    New brands include:

    • Health Street (health-street.net, SG_BRAND_c5c154df3c9599e7) with 7,359 US and 0 CA places.
    • Dollarama (dollarama.com, SG_BRAND_36576148eaf3b474) with 0 US and 1,271 CA places.
    • Pharmasave (pharmasave.com, SG_BRAND_d1752c030c1194e) with 0 US and 715 CA places.
    • The Beer Store (thebeerstore.ca, SG_BRAND_70fa24e4f6fa7b8e) with 0 US and 430 CA places.
    • GoodLife Fitness (goodlifefitness.com, SG_BRAND_50f2b3d4c0a548de) with 0 US and 227 CA places.
    • and 99 more!

Bug Fixes and Known Issues - Core Places and Brands

  • We discovered a few brand count fluctuations as a result of updated sourcing and other metadata bugs. These corrections resulted in significant changes in the total number of POI for each affected brand, but the new count is correct. For transparency, we'd like to list some of these corrections as examples in no particular order:

    • Chase (SG_BRAND_cd8e7918010a87cc619849e00265c9a6). Net POI count change: US: -530 CA: 0. Bug: Previously included ATMs.
    • Cardsmart (SG_BRAND_137e699c352a2a85). Net POI count change: US: -95 CA: 0. Bug: Previously included places with a Cardsmart section inside another store (not a standalone store).
    • Park National Bank (SG_BRAND_2b18e4fd0b6de0e2). Net POI count change: US: -77 CA: 0. Bug: Previously included ATMs.
    • GUESS (SG_BRAND_44fc6c9812b78f5e76f9b25892fe6ad9). Net POI count change: US: -153 CA: -48. Bug: Created additional child brands (e.g., Guess Factory, Marciano by Guess) which include POIs previously branded as GUESS. Also had duplicates with pre-existing child brand G by Guess.
    • Calvin Klein (SG_BRAND_ba56feb34cfc3912b55a4c2429a44319). Net POI count change: US: -82 CA: -4. Bug: Removed duplicates.

Enhancements - Categories

  • Our current scope of a "place" is anywhere consumers can spend time or money. This has not changed as a whole, but we've made an exception to start sourcing industrial POIs such as warehouses, distribution centers, and B2B equipment/machinery wholesalers. 🚧 🏗

  • As of this release, SafeGraph Places has ~8,500 industrial POIs across 20 brands including Caterpillar (SG_BRAND_b36725e851150ae2a6e85ce2e48d8193), Amazon Distribution (SG_BRAND_fc2573e1b20d6dd1), John Deere (SG_BRAND_e16bb3b801ef62207574cb693cf57797), and International Trucks (SG_BRAND_4bc53bddaed9ceb9).

  • 6,865 of these industrial POIs also have foot traffic data in Patterns 💯

Category Fill Rate -- We monitor category fill rate with 3 metrics: (1) category fill rate across the entire dataset, (2) category fill rate for branded POI, (3) category fill rate in the brand_info file (brand-level categories). We want all of these numbers to be 100%.

  • (1) All POI category fill rate. Last month 98.8%. This month 99.2%. 📈
  • (2) Branded POI category fill rate. Last month 100%. This month 100%. 💯
  • (3) Brand-level category fill rate (brand_info file). Last month 100%. This month 100%. 💯
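
As a rough illustration of the three metrics above, the sketch below treats a fill rate as the share of rows with a non-empty category value. The column names `top_category` and `brands`, the sample rows, and the helper `fill_rate` are assumptions made for this example; they are not SafeGraph's monitoring code.

```python
def fill_rate(rows, column):
    """Return the percentage of rows whose `column` holds a non-empty value."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return 100.0 * filled / len(rows)


# Hypothetical POI rows; a real delivery has far more rows and columns.
pois = [
    {"name": "Cafe A", "brands": "", "top_category": "Restaurants"},
    {"name": "Bank B", "brands": "Chase", "top_category": "Banks"},
    {"name": "Shop C", "brands": "", "top_category": ""},
    {"name": "Gym D", "brands": "GoodLife Fitness", "top_category": "Fitness"},
]

# Hypothetical brand_info rows (one row per brand).
brand_info = [
    {"brand_name": "Chase", "top_category": "Banks"},
    {"brand_name": "GoodLife Fitness", "top_category": "Fitness"},
]

# (1) fill rate across the entire dataset
all_poi_rate = fill_rate(pois, "top_category")

# (2) fill rate restricted to branded POIs
branded_pois = [row for row in pois if row["brands"]]
branded_rate = fill_rate(branded_pois, "top_category")

# (3) brand-level fill rate in the brand_info file
brand_level_rate = fill_rate(brand_info, "top_category")

print(all_poi_rate, branded_rate, brand_level_rate)
```

Keeping one helper for all three metrics makes the only difference between them the set of rows passed in.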

Drops ⬇️

  • We constantly ingest data from new sources, and many safegraph_place_ids (sgpids) are intentionally dropped, but we are unable to track each and every dropped sgpid. In this release:

    • We dropped 77,033 sgpids (20,908 branded and 56,125 non-branded).
      • 22k due to stale sourcing and redundancy
      • 2,784 dropped as a result of bug fixes for branded POIs 🐛
      • 10,037 dropped as a result of deduplication 👯‍♂️
      • 29,449 dropped due to permanent closures ❌ (dropped but not lost -- you will still see these POIs if you get the closed_on column).
  • The remaining drops are undesired failures to maintain a consistent sgpid between releases - known as bad sgpid churn (see discussion in March 2019 release). We are continuing to work on better metrics to distinguish good vs. bad churn.
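
For a sense of scale, the remainder can be estimated by subtracting the itemized drops from the total. This is only a back-of-the-envelope sketch: the notes give the first bucket as "22k", so the result is approximate, and the variable names are invented for this example.

```python
total_dropped = 77_033

# Itemized drops from the list above.
explained = {
    "stale sourcing and redundancy": 22_000,  # stated as "22k", so approximate
    "bug fixes for branded POIs": 2_784,
    "deduplication": 10_037,
    "permanent closures": 29_449,
}

# Whatever is left over is the (approximate) bad sgpid churn.
remaining = total_dropped - sum(explained.values())
print(f"approximately {remaining:,} drops not explained by the itemized causes")
```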

Geometry

Enhancements - Geometry

  • We are thrilled to introduce two new Geometry columns! 🚨 🎉

1.) building_height is the height above ground in meters when a POI's polygon is a building. This column has a 25.1% fill rate and is not available in Canada. See the Places Manual to learn more.

2.) enclosed is a boolean (true/false) column. When enclosed = TRUE, it means the POI is completely enclosed indoors by its parent and is only accessible by entering the parent structure. This provides deeper context around spatial relationships, and it also sheds light on how we treat enclosed POIs when building Patterns. When a POI is determined to be enclosed, we intentionally exclude it from receiving visits in Patterns and instead roll the visits up to its parent_safegraph_place_id because the accuracy of mobile GPS data deteriorates within large, indoor structures. For the nitty gritty on how we derive the enclosed column, see the Places Manual.

  • These columns are located at the end of the Geometry Schema. See the Column Ordering section for details on where all columns are located in pre-joined, product combo deliveries.
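
To make the roll-up of enclosed POIs concrete, here is a minimal sketch of crediting an enclosed POI's visits to its parent. The `raw_visits` field, the sample IDs, and the function `roll_up_visits` are hypothetical; the notes do not describe how Patterns implements this internally.

```python
from collections import defaultdict


def roll_up_visits(pois):
    """Credit visits of enclosed POIs to their parent; other POIs keep their own."""
    totals = defaultdict(int)
    for poi in pois:
        parent = poi["parent_safegraph_place_id"]
        if poi["enclosed"] and parent:
            # Enclosed POI: it receives no visits itself, the parent gets them.
            totals[parent] += poi["raw_visits"]
        else:
            totals[poi["safegraph_place_id"]] += poi["raw_visits"]
    return dict(totals)


# Hypothetical rows: a mall, a kiosk fully inside it, and an anchor store
# that has a parent but its own outside entrance (enclosed = False).
pois = [
    {"safegraph_place_id": "sg:mall", "parent_safegraph_place_id": None,
     "enclosed": False, "raw_visits": 100},
    {"safegraph_place_id": "sg:kiosk", "parent_safegraph_place_id": "sg:mall",
     "enclosed": True, "raw_visits": 40},
    {"safegraph_place_id": "sg:anchor", "parent_safegraph_place_id": "sg:mall",
     "enclosed": False, "raw_visits": 60},
]

visits = roll_up_visits(pois)
print(visits)
```

In this toy data the kiosk does not appear in the result at all, while the anchor store keeps its own visits despite having a parent.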

OWNED Polygons

📣 While OWNED polygons are preferred, it does not mean that SHARED polygons are inherently bad. It only means that the exact shape of each POI within the polygon is not discernible, but the general location can be identified by looking at the centroid (latitude & longitude). 📣

Historically, we have measured polygon_class = "OWNED_POLYGON" for all POIs that are both (i) branded and (ii) do NOT have a parent_safegraph_place_id; we call this group "branded, no-parent".

With the introduction of the enclosed column, we can refine this metric to measure "OWNED_POLYGON" for all POIs that are both (i) branded and (ii) have enclosed = FALSE. When enclosed = FALSE, it indicates that there are reasonable means to derive a unique polygon for the POI (even when parent_safegraph_place_id is not null), and we should strive for 100% of these POIs to have polygon_class = "OWNED_POLYGON." See the Places Manual for more about enclosed.
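
The refined metric can be sketched as follows, assuming each POI row carries `brands`, `enclosed`, and `polygon_class` fields. The sample rows, the `"SHARED_POLYGON"` value, and the function name `owned_polygon_rate` are assumptions for illustration only.

```python
def owned_polygon_rate(pois):
    """Percent of branded POIs with enclosed = False whose polygon is OWNED_POLYGON."""
    eligible = [p for p in pois if p["brands"] and p["enclosed"] is False]
    if not eligible:
        return None  # metric is undefined when no POI qualifies
    owned = sum(1 for p in eligible if p["polygon_class"] == "OWNED_POLYGON")
    return 100.0 * owned / len(eligible)


# Hypothetical rows.
pois = [
    {"brands": "Chase", "enclosed": False, "polygon_class": "OWNED_POLYGON"},
    {"brands": "GUESS", "enclosed": False, "polygon_class": "SHARED_POLYGON"},
    {"brands": "Cardsmart", "enclosed": True, "polygon_class": "SHARED_POLYGON"},  # excluded: enclosed
    {"brands": "", "enclosed": False, "polygon_class": "OWNED_POLYGON"},  # excluded: unbranded
]

rate = owned_polygon_rate(pois)
print(rate)
```

Note that the filter no longer looks at parent_safegraph_place_id, which is the difference from the older "branded, no-parent" grouping.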

Bug Fixes and Known Issues - Geometry

  • Centroid-Radius Polygons -- As discussed in the March 2019 release notes, we internally track centroid-radius polygons vs. precise polygons and strive for 100% precise polygons. You can measure this yourself using the is_synthetic column.

Patterns

Enhancements - Patterns

  • In last month's delivery, SG Patterns had 4,091,868 points-of-interest (US only). This month, SG Patterns has 4,078,861 points-of-interest (US only) (net -13,007). 📉

  • Last month, SG Patterns had 818,376,187 visits from 32,786,255 visitors. This month, SG Patterns has 868,811,661 visits from 35,455,162 visitors (net +50,435,474 visits, +2,668,907 visitors). 📈

~~~~

What do a fire hydrant, a Gucci store, an apartment complex, and a tree in a national park have in common? They all have a Placekey 🔑. Learn more about relating the unrelated here.

In case you missed it, check out last month's release notes. 📝

Calculating Diffs
Curious to find the specific records that were either added, deleted, or saw an attribute change from one release to the next? Visit "Calculating Diffs" in our Data Science Resources to get started.
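
As a rough idea of what such a diff involves (this is not the method from the "Calculating Diffs" resource), the sketch below compares two releases keyed by safegraph_place_id. The sample records and the function `diff_releases` are hypothetical.

```python
def diff_releases(previous, current):
    """Compare two releases keyed by ID; return (added, deleted, changed) ID sets."""
    prev_ids, curr_ids = set(previous), set(current)
    added = curr_ids - prev_ids
    deleted = prev_ids - curr_ids
    # IDs present in both releases whose attributes differ.
    changed = {i for i in prev_ids & curr_ids if previous[i] != current[i]}
    return added, deleted, changed


# Hypothetical releases: {safegraph_place_id: attributes}.
august = {
    "sg:1": {"location_name": "Cafe A", "brands": ""},
    "sg:2": {"location_name": "Bank B", "brands": "Chase"},
    "sg:3": {"location_name": "Shop C", "brands": ""},
}
september = {
    "sg:1": {"location_name": "Cafe A", "brands": ""},
    "sg:2": {"location_name": "Bank B ATM", "brands": "Chase"},
    "sg:4": {"location_name": "Gym D", "brands": "GoodLife Fitness"},
}

added, deleted, changed = diff_releases(august, september)
print(added, deleted, changed)
```

Because sgpids can churn between releases, a record that shows up as one deletion plus one addition may in fact be the same place under a new ID.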

Fill Rates
See the Summary Statistics page for all Core and Geometry column fill rates as well as a breakdown of POI count by naics_code.
