November-2019 Release Notes (2019-10-31)

What do you call a squash that weighs 3.14 pounds? ..... 🎃🥧

Welcome to the November-2019 Places release notes (v2019-10-31/1572516138) shipped 2019-11-05.

Enhancements - Core Places and Brands

  • Last month SG Places had 6,059,626 points-of-interest. This month SG Places has 5,954,839 points-of-interest (net - 104,787 places). These are 5,261,939 US Places and 692,900 CA places.
    • Why did we lose 105k POI? We implemented some new methods for cleaning and removing noisy and duplicate POI from low-quality sources, and in particular removed a lot of noisy and duplicate elementary schools and secondary schools. Although POI count decreased, this is an improvement in overall quality.
  • Improved NAICS Inference -- We have received a lot of feedback from customers about category information, and we are making several improvements in coming months to take categories to the next level. In this release the fill rates of the columns naics_code, top_category, and sub_category have increased from 89.1% to 96.1%. :chart-with-upwards-trend: Wow! How? We overhauled our entire approach for inferring NAICS codes to use deep neural networks, which can learn from much larger training sets and extract much more information from many noisy NLP signals. The fill rate has increased while maintaining high precision. Look for more improvements in our category information and systems in coming months. :thumbsup:
  • We've added net 126 new brands :confetti-ball: Including many Canada-only brands.
    New Brands Include...
    • Shoppers Drug Mart (shoppersdrugmart.ca, SG_BRAND_4a7ed5a58c062288) with 1098 places. Note: This brand has only Canada locations.
    • Home Hardware (homehardware.ca, SG_BRAND_33e20ce7b6aaceb8) with 995 places. Note: This brand has only Canada locations.
    • The Source (thesource.ca, SG_BRAND_5813c4f1b2c44860) with 465 places. Note: This brand has only Canada locations.
    • ATB Financial (atb.com, SG_BRAND_43d74cb900844e84) with 320 places. Note: This brand has only Canada locations.
    • UniPrix (uniprix.com, SG_BRAND_1a300c393a271586) with 318 places. Note: This brand has only Canada locations.
    • EB Games (ebgames.ca, SG_BRAND_1a52cd5cf6fe4102) with 306 places. Note: This brand has only Canada locations.
    • Timbermart (timbermart.ca, SG_BRAND_2add66c563fc063a) with 285 places. Note: This brand has only Canada locations.
    • Phenix Salon Suites (phenixsalonsuites.com, SG_BRAND_3d0743fea9b5a53) with 250 places. Note: This brand has only Canada locations.
    • Foodland Canada (foodland.ca, SG_BRAND_44af1b941d98b515) with 203 places. Note: This brand has only Canada locations.
    • Dulux Paints (dulux.ca, SG_BRAND_637bde207c3be956) with 194 places. Note: This brand has only Canada locations.
    • Tuxedo by Sarno (tuxedobysarno.com, SG_BRAND_374fe5914caae134) with 185 places.
    • Liquor Depot (liquordepot.ca, SG_BRAND_7d13e43b1863232c) with 136 places. Note: This brand has only Canada locations.
    • COBS Bread (cobsbread.com, SG_BRAND_59fedcb76a670e70) with 120 places.
    • Jugo Juice (jugojuice.com, SG_BRAND_e345121bc1bd019c) with 115 places. Note: This brand has only Canada locations.
    • Somerset Trust Company (somersettrust.com, SG_BRAND_3085582422fb690b) with 105 places.
    • Suzy (suzyshier.com, SG_BRAND_296fa7d68007d8dd) with 96 places. Note: This brand has only Canada locations.
    • IWC Schaffhausen (iwc.com, SG_BRAND_e786ddbda941052) with 93 places.
    • Lawtons Drugs (lawtons.ca, SG_BRAND_472a104c34f4312b) with 82 places. Note: This brand has only Canada locations.
    • Ralph's Italian ices & Ice Cream (ralphsices.com, SG_BRAND_5c2551f3e24165ee) with 76 places.
    • And 107 more!! :chart-with-upwards-trend:

Bug Fixes and Known Issues - Core Places and Brands

  • We found some errors involving over-labeling of POI for some brands. In other words, we were creating branded POI incorrectly at some locations. These fixes resulted in significant decreases in the total number of POI for those affected brands. The new count is correct, and for transparency we'd like to list some of these fixes as examples in no particular order.

  • Santander Bank, (SG_BRAND_7d798e198436df181b8821779fc990c7). Net POI count change: -1330. Bug: ATMs jumped into our data, so we have now removed them.

  • Gymboree, (SG_BRAND_a220bc15ea6d56b949bdc215da636a5a). Net POI count change: -480. Gymboree has closed all its locations. You will be missed.

  • Tempur-Pedic, (SG_BRAND_4c0bb3e7d813b2889a0544ef789a648a). Net POI count change: -439. Bug: Accidentally included distributors of Tempur-Pedic, now removed. Also, a bug in the parsing of street_address appended the longitude to the end of the field. This is corrected so, e.g., "929 Bellevue Way NE -122.2019939000" is now correctly "929 Bellevue Way NE".

  • Key Food, (SG_BRAND_b718bb44f07c2ad4). Net POI count change: -127. Bug: Problem with our data sourcing accidentally created duplicate records with names missing a space. Duplicates are now removed and count is correct.

  • MB Financial, (SG_BRAND_6a0e200ec0d79bb7). Net POI count change: -119. Bug: MB Financial is actually a part of Fifth Third Bank (SG_BRAND_dfa5d1a8cb415413d7488ee070dca730) and we were incorrectly duplicating POI. MB Financial will no longer appear as a brand in SafeGraph Places.

  • Regis Salons, (SG_BRAND_7bdaf386ef829f4e). Net POI count change: -118. Bug: We improved our detection of closed stores for this brand, allowing us to confidently remove many closed locations.

  • Thomas Sabo, (SG_BRAND_d8c3dc16c8970aaba935b5c5477e1fc5). Net POI count change: -86. Bug: We discovered a bug in our data for Thomas Sabo which was incorrectly mapping cities and states, creating impossible addresses. We have removed this brand from our product until we can correct the issue (hopefully next month).

  • Goodyear Tire & Rubber Co, (SG_BRAND_f43cf89a8c39504ff9a8649cab7f9a69). Net POI count change: -40. Bug: We spoke to Goodyear and confirmed that Goodyear only has storefronts for Auto Service (SG_BRAND_4b76770989b17a0268a5428efec0ae56) and Just Tires (SG_BRAND_6d841cdb3a9f8949a098dd696afed2d3). There are no brick & mortar locations branded "Goodyear Tire & Rubber", so brand_id SG_BRAND_f43cf89a8c39504ff9a8649cab7f9a69 will now be a parent brand in the brand_info.csv with no POI of its own. The POI previously branded with SG_BRAND_f43cf89a8c39504ff9a8649cab7f9a69 were largely duplicates of its subsidiary locations.
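
For anyone still working with an older delivery that contains the Tempur-Pedic street_address problem described above, the cleanup can be approximated on the client side. The following is a minimal sketch, not the fix SafeGraph applied: the regular expression and the function name `strip_trailing_coordinate` are assumptions made for illustration, and the pattern only targets a coordinate-like number (at least four decimal places) at the very end of the field.

```python
import re

# Hypothetical pattern: whitespace, an optional minus sign, 1-3 integer digits,
# a decimal point, and four or more fractional digits at the end of the field.
_TRAILING_COORD = re.compile(r"\s+-?\d{1,3}\.\d{4,}$")


def strip_trailing_coordinate(street_address: str) -> str:
    """Remove a coordinate-like number stuck to the end of an address."""
    return _TRAILING_COORD.sub("", street_address)


# The example from the release note, plus an address that should stay untouched.
cleaned = strip_trailing_coordinate("929 Bellevue Way NE -122.2019939000")
unchanged = strip_trailing_coordinate("100 Route 9")
print(cleaned)
print(unchanged)
```

Requiring several fractional digits keeps ordinary trailing numbers such as route or suite numbers intact, at the cost of missing a coordinate that was truncated to fewer decimals.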

  • We found some errors where we were missing some branded POI, and these fixes resulted in significant increases in the total number of POI for those affected brands. The new count is correct, and for transparency we'd like to list some of these fixes as examples in no particular order.

  • Cadillac, (SG_BRAND_f2e3f2b2db112d8291803d38d52123e1). Net POI count change: 466. Bug: We improved our data sourcing to get more complete coverage.

  • MinuteClinic, (SG_BRAND_7addbb50380d485c29258638e50c38ac). Net POI count change: 360. Bug: We improved our data sourcing to get more complete coverage.

  • Freeway Insurance Services, (SG_BRAND_4f35805991e67d975d44d2200d8136ad). Net POI count change: 437. Bug: We improved our data sourcing to get more complete coverage.

  • Get It Now!, (SG_BRAND_3dc0f5fa37de36083a9f98fd6ac330df). Net POI count change: 16. Bug: We improved our data sourcing to get more complete coverage.

  • Pizza King, (SG_BRAND_84c1c636e9945ee8fa14664918695ffd). Net POI count change: 60. Bug: We improved our data sourcing to get more complete coverage.

  • Buckeye CheckSmart, (SG_BRAND_1f5077c53c828dad4305c372482f40dd). Net POI count change: 112. Bug: We improved our data sourcing to get more complete coverage.

  • Gordmans, (SG_BRAND_e63be0fbaa85c8e95236434423e8794f). Net POI count change: 155. Bug: We improved our data sourcing to get more complete coverage.

  • Park Hyatt, (SG_BRAND_416288d2d347088eee8b3e454e6578ff). Net POI count change: -17, +6. Bug: Incorrectly reported Hyatt Centric POI, not Park Hyatt.

  • Andaz, (SG_BRAND_4f322046b2e2422a176b5ad9e554a048). Net POI count change: -17, +9. Bug: Incorrectly reported Hyatt Centric POI, not Andaz.

  • Tire Kingdom, (SG_BRAND_95af7b95a45755c490a3b06b29c75d90). Net POI count change: 130. Bug: We improved our data sourcing to get more complete coverage.

  • We discovered a technical issue in which some branded POI were being incorrectly labeled as non-branded, due to the way our system handles special sources for categories of non-branded POI. The primarily affected types of places were Branded Dispensaries, Branded Schools & Branded Daycare Centers. Around 1,500 POI (spread across 37 brands) are now correctly being branded. Some of the biggest changes are listed:

  • KinderCare (SG_BRAND_bb687a854c6bcb1). Net POI count change: 344.

  • Primrose Schools (SG_BRAND_6ff66da1518bb61132995a7578bdb189). Net POI count change: 309.

  • Childtime Learning Centers (SG_BRAND_7c17256da6defb067363a08e6f6dc43c). Net POI count change: 164.

  • Tutor Time (SG_BRAND_32eb92a7712b2c101b8001ac5c2e3ea6). Net POI count change: 106.

  • Rainbow Child Care Center (SG_BRAND_26c943d196a25be0). Net POI count change: 60.

  • Discovery Point (SG_BRAND_1024f34d21a9f7e4). Net POI count change: 42.

  • Brightside Academy (SG_BRAND_4fbceba7e8661373844d9401dbce327e). Net POI count change: 42.

  • La Petite Academy (SG_BRAND_9fa908d38c44268e388fb1976738aed7). Net POI count change: 36.

  • Childcare Network (SG_BRAND_90483ddacd2cc268f742373d1355b115). Net POI count change: 33.

  • TangerOutlets (SG_BRAND_74b13c1bfa9312a3bfd0caa1da8b35ba). Net POI count change: 25.

  • MedMen (SG_BRAND_83acfebea16b1911). Net POI count change: 15.

  • Green Dragon (SG_BRAND_255a657997121455). Net POI count change: 9.

  • Bad SGPID Churn -- Bad sgpid churn is the undesired failure to maintain a consistent safegraph_place_id (sgpid) between releases (see discussion in March 2019 release). We internally track and estimate our performance in this domain and share these numbers in our release notes for maximum transparency. In this release:

    • We dropped 241,705 sgpids (17,252 branded and 224,453 non-branded).
    • We added 136,918 sgpids (20,550 branded and 116,368 non-branded).
    • Note: We intentionally dropped many POI as part of a general cleanup from some of our most noisy and duplicative sources. Most of these drops are duplicates being removed. Some percent of these are true openings and closings (or new brands); the remainder are bad sgpid churn. We are continuing to work on better metrics to distinguish these cases.
  • Category Fill Rate -- We monitor category fill rate with 3 metrics: (1) category fill rate across the entire dataset, (2) category fill rate for branded POI, (3) category fill rate in the brand_info file (brand-level categories). We want all of these numbers to be 100%.

    • (1) All POI category fill rate. Last month 89.1%. This month 96.1%. :chart-with-upwards-trend: We overhauled our category inference, see Improved NAICS Inference above.
    • (2) Branded POI category fill rate. Last month 100%. This month 100% :100:
    • (3) Brand-level category fill rate (brand_info file). Last month 100%. This month 100%. :100:
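
The all-POI fill rate in metric (1) can be reproduced on your own copy of the data by counting non-empty values in the naics_code, top_category, and sub_category columns. The sketch below is one way to do that with the Python standard library. It assumes the Core Places file is a CSV with a header row and that a missing category is an empty string. The sample rows are made up for illustration and are not real SafeGraph records.

```python
import csv
import io

# Column names taken from the release notes.
CATEGORY_COLUMNS = ("naics_code", "top_category", "sub_category")


def fill_rates(rows, columns=CATEGORY_COLUMNS):
    """Return {column: percent of rows with a non-empty value}."""
    rows = list(rows)
    if not rows:
        return {col: 0.0 for col in columns}
    rates = {}
    for col in columns:
        filled = sum(1 for row in rows if (row.get(col) or "").strip())
        rates[col] = 100.0 * filled / len(rows)
    return rates


# Tiny made-up sample standing in for a Core Places CSV file.
SAMPLE = """safegraph_place_id,naics_code,top_category,sub_category
sg:a,722511,Restaurants and Other Eating Places,Full-Service Restaurants
sg:b,,,
sg:c,452319,General Merchandise Stores,
sg:d,446110,Health and Personal Care Stores,Pharmacies and Drug Stores
"""

rates = fill_rates(csv.DictReader(io.StringIO(SAMPLE)))
for column, rate in rates.items():
    print(f"{column}: {rate:.1f}%")
```

To run this against a real delivery, replace `io.StringIO(SAMPLE)` with an open file handle. Restricting the rows to branded POI before calling `fill_rates` gives an analogue of metric (2).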

Enhancements - Geometry

Dollar General
(sg: 0989036d63624c5badd7b56b6c62f25a )
3827 N College Ave, Indianapolis, IN 46205, US

One of 9,000 new & improved polygons in the November-2019 release.

Bug Fixes and Known Issues - Geometry

  • Geocoding Bug --
    • We discovered that we had a regression in our geocoding methods that impacted about 10% of POI in the August, September and October releases of Places. This means our centroids for certain POI lost accuracy. Because our polygon methodology relies on geocoding, we reported incorrect polygons for some POI. We have implemented corrections for the November release. These corrections fix about 90% of the issues generated by the original problem that first appeared in the August release. There are still 1-2% of POI in our Geometry product with geocoding issues due to the regression that we continue to work on. Meanwhile, independent parallel efforts have achieved general improvements in our geocoding and centroid accuracy during this time. So, all together, latitude and longitude centroids for POI are more accurate in the November release than they were in July before the regression, despite some known remaining issues that we continue to work on.
    • We are offering enterprise Patterns customers a backfill of data using the November release with the corrections. Please contact us if you have questions.
  • Overly-precise centroids. We discovered a small bug causing some of the latitude and longitude points to be reported with unrealistic precision. For example, latitude = 40.72387511125298. Our estimates of POI centroids are not this precise, nor does anyone expect them to be. This is now fixed, and we are sorry for the error. At least we have an excuse to show this excellent xkcd comic.

"IN EITHER CASE, PLEASE STOP."
xkcd.com/2170/
We've stopped, and we're sorry.

  • Centroid-Radius Polygons -- As discussed in the March 2019 release notes, we internally track centroid-radius polygons vs precise polygons and strive for 100% precise polygons. You can measure this yourself using the is_synthetic column. This release, we've decreased to 93.8% precise polygons (from 94.6% last month). Here is how we are tracking on that metric over recent releases: Centroid-Radius Polygon Tracking.
  • Percent polygon_class = OWNED (as described in the Oct 2019 release notes). We examine polygon_class for all safegraph_place_id that are both (i) branded and (ii) do NOT have a parent_safegraph_place_id; we call this group "branded, no-parent". We want 100% of "branded, no-parent" POI to have polygon_class = OWNED_POLYGON. Last month, the percent OWNED polygons for branded, no-parent was 71.1%. This month it is 73.1%. :+1: Progress! We continue to work on this. Here is how we are tracking on this metric in recent releases: OWNED vs SHARED Polygons in SafeGraph Places Release History.
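
Both polygon metrics above can be recomputed from the data itself. The following is a rough sketch under several assumptions: the Core Places and Geometry columns have already been joined into one row per safegraph_place_id, is_synthetic is stored as the text "true" or "false", a POI counts as branded when a `safegraph_brand_ids` column is non-empty (that column name is an assumption, it is not named in these notes), and the non-owned class is spelled `SHARED_POLYGON`. The sample rows are invented.

```python
import csv
import io


def _is_true(value):
    """Interpret common textual encodings of a boolean flag."""
    return (value or "").strip().lower() in {"true", "t", "1"}


def polygon_metrics(rows):
    """Percent precise polygons, and percent OWNED among branded, no-parent POI."""
    rows = list(rows)
    precise = sum(1 for r in rows if not _is_true(r.get("is_synthetic")))
    branded_no_parent = [
        r for r in rows
        if (r.get("safegraph_brand_ids") or "").strip()
        and not (r.get("parent_safegraph_place_id") or "").strip()
    ]
    owned = sum(
        1 for r in branded_no_parent if r.get("polygon_class") == "OWNED_POLYGON"
    )
    return {
        "pct_precise": 100.0 * precise / len(rows) if rows else 0.0,
        "pct_owned_branded_no_parent": (
            100.0 * owned / len(branded_no_parent) if branded_no_parent else 0.0
        ),
    }


# Invented rows: three branded, no-parent POI, one child POI, one non-branded POI.
SAMPLE = """safegraph_place_id,safegraph_brand_ids,parent_safegraph_place_id,polygon_class,is_synthetic
sg:a,SG_BRAND_x,,OWNED_POLYGON,false
sg:b,SG_BRAND_x,,SHARED_POLYGON,false
sg:c,,,OWNED_POLYGON,true
sg:d,SG_BRAND_y,sg:z,SHARED_POLYGON,false
sg:e,SG_BRAND_y,,OWNED_POLYGON,false
"""

metrics = polygon_metrics(csv.DictReader(io.StringIO(SAMPLE)))
print(metrics)
```

The denominator for the OWNED percentage is only the "branded, no-parent" group, so non-branded POI and POI with a parent do not affect it.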

Enhancements - Patterns

  • In last month's delivery SG Patterns had 3,583,579 points-of-interest (US only). This month SG Patterns has 3,646,476 points-of-interest (US only) (net + 62,897 places). :chart-with-upwards-trend:
  • Last month SG Patterns had 1,098,364,782 visits from 48,874,591 visitors. This month SG Patterns has 1,022,961,940 visits from 38,321,823 visitors (delta - 75,402,842 visits, - 10,552,768 visitors). :chart-with-downwards-trend:
    • Note: The total visits count decreased by about 7% due to changes in our movement data supply. These will be reflected in all patterns numbers, so this is a good opportunity to use the panel overview summary files as a reference for the total size of the panel from which Patterns is measured.
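
The deltas and the "about 7%" figure above follow directly from the published totals. This short sketch recomputes them; the helper name `pct_change` is chosen here for illustration.

```python
def pct_change(previous: int, current: int) -> float:
    """Relative change from previous to current, in percent."""
    return 100.0 * (current - previous) / previous


# Totals as published in these release notes.
visits_prev, visits_now = 1_098_364_782, 1_022_961_940
visitors_prev, visitors_now = 48_874_591, 38_321_823

visits_delta = visits_now - visits_prev
visitors_delta = visitors_now - visitors_prev
visits_pct = pct_change(visits_prev, visits_now)
visitors_pct = pct_change(visitors_prev, visitors_now)

print(f"visits:   {visits_delta:+,} ({visits_pct:+.1f}%)")
print(f"visitors: {visitors_delta:+,} ({visitors_pct:+.1f}%)")
```

Because the visitor count moved by a different proportion than the visit count, comparing raw Patterns numbers across months is likely to mislead. Dividing by the panel size from the panel overview summary files is one way to put the months on a common footing.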

Bug Fixes and Known Issues - Patterns

  • Due to the Geocoding Bug, we are offering enterprise customers a backfill of Patterns data using the November release with the corrections. Please contact us if you have questions.

Also check out these new ways to get SafeGraph data: 
  * Need some extra data on other SafeGraph products? Check out the [SafeGraph Data Bar.](https://shop.safegraph.com/) 
  * Are you an Esri or ArcGIS user? Check out our FREE data [SafeGraph Places in the Esri Marketplace](https://marketplace.arcgis.com/listing.html?id=3425348e4bee4059af2b353e52df43c2).
  * Or just drop us a line! Your data needs are our data delights!

p.s. **[SafeGraph Places is now available in Canada](https://docs.safegraph.com/changelog/october-2019-release-notes#section-canada-places-version-1-0-available-for-core-places-and-geometry-in-october-release)**