October-2019 Release Notes (v2019-10-08)

1 Minute Summary:

  • Many exciting (breaking) changes. Thank you for your feedback and cooperation! :heart-eyes:
  • SafeGraph Core and Geometry are now available in Canada 🇨🇦 with 700,000+ POI.
  • Added over 528,000 POI in the USA across all categories. Also added over 150 new brands. :exclamation: :chart-with-upwards-trend: :exclamation:
  • brand_info now includes two new columns: stock_symbol (e.g., TSLA) and stock_exchange (e.g., NASDAQ).

Breaking Changes in the October Release

  • There are breaking changes for Core Places, Geometry, Patterns and the brand_info.csv.
  • Delimiter. All SafeGraph deliverables (Core Places, Geometry, Patterns, and brand_info.csv) will be comma-delimited (,). Previously they were pipe-delimited (|). This aligns all of our delivery systems and provides data in the most common, expected format.
    • Note: when a data value contains a comma, the value is fully quoted with " in the file so that the comma can be distinguished from the column delimiters. For example, a row for the columns safegraph_place_id, location_name, brands, city would appear in the CSV as: sg:13166a92a17d43aba153fec5b5f77835,Lexus of Chandler,"Lexus,Audi,Jaguar,Mercedes Benz",chandler
    • Your data technology platform (e.g., Excel, Python pandas, Spark) will usually handle this by default, but may require explicit delimiter configuration.
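As a rough illustration of the quoting rule above, the sketch below parses the example row with Python's standard csv module. The header row in the sample is an assumption made for the example; check whether your delivered files include one.

```python
import csv
import io

# The example row from the note above, preceded by an assumed header row.
# The brands value contains commas, so it is wrapped in double quotes.
sample = (
    "safegraph_place_id,location_name,brands,city\n"
    "sg:13166a92a17d43aba153fec5b5f77835,Lexus of Chandler,"
    '"Lexus,Audi,Jaguar,Mercedes Benz",chandler\n'
)

# csv.DictReader defaults to delimiter="," and quotechar='"',
# so the quoted brands value stays in a single field.
rows = list(csv.DictReader(io.StringIO(sample)))
brands = rows[0]["brands"].split(",")
print(len(rows[0]), brands)
```

Splitting the row on every comma by hand would produce seven fields instead of four, which is why a CSV-aware parser is the safer choice.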
  • Delivery Cadence and Directory Structure. Previously, Core Places and Geometry for a given monthly release were delivered around the 30th of the prior month, and Patterns was delivered around the 7th of the month. Going forward, all SafeGraph products will be delivered together around the 7th of the month.
    • Delivery Location and Directory Structure. Up to 4 types of files will be delivered with the following structure: s3://customer-bucket/customer-prefix/{{sg-file-name}}/yyyy/mm/dd/hh/*.csv.gz. {{sg-file-name}} is one of the following:
      • core_poi, geometry, or patterns, or a combination such as core_poi-geometry, core_poi-patterns, or core_poi-geometry-patterns, depending on your subscription. These files include the columns for every product to which you are subscribed: Core, Geo, Patterns (see full schema below).
      • brand_info (if subscribed to Core)
      • home_panel_summary (if subscribed to Patterns)
      • visits_panel_summary (if subscribed to Patterns)
    • Explanation: Previously, Core + Geo and Patterns were delivered as separate files, which required you to join the data yourself on safegraph_place_id, and each file contained some redundant columns (e.g., location_name). Going forward, you will receive all SafeGraph products to which you are subscribed in a single schema, joined together on safegraph_place_id, in files under the directory for your product combination (e.g., /core_poi-geometry/). Redundant columns are removed in the join before delivery (see explicit schema below). This is a Left Join on the Core Places dataset (i.e., Core Places LEFT JOIN Patterns ON safegraph_place_id). All safegraph_place_ids without Patterns data will have NULL (empty) values for Patterns columns. No Patterns rows are lost in the join because the set of safegraph_place_ids in Patterns data is always a strict subset of the safegraph_place_ids in Core data.
    • Note: Products that are not keyed on safegraph_place_id (i.e., brand_info.csv and panel overview data) are delivered at the same time, but are still standalone files (see Delivery Location and Directory Structure above).
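To make the join semantics above concrete, here is a minimal sketch of Core Places LEFT JOIN Patterns in plain Python. The place IDs, names, and visit count are made up for illustration, and the helper left_join is hypothetical; it is not SafeGraph's delivery code.

```python
def left_join(core_rows, patterns_rows, key="safegraph_place_id"):
    """Keep every Core row; fill Patterns columns with None when there is no match."""
    patterns_by_id = {row[key]: row for row in patterns_rows}
    # Collect Patterns column names once, preserving their order.
    patterns_cols = list(dict.fromkeys(
        col for row in patterns_rows for col in row if col != key
    ))
    joined = []
    for core in core_rows:
        match = patterns_by_id.get(core[key], {})
        merged = dict(core)
        for col in patterns_cols:
            # Redundant columns (e.g. location_name) are kept once, from Core.
            if col not in merged:
                merged[col] = match.get(col)
        joined.append(merged)
    return joined


# Illustrative rows only.
core = [
    {"safegraph_place_id": "sg:a", "location_name": "Place A"},
    {"safegraph_place_id": "sg:b", "location_name": "Place B"},
]
patterns = [
    {"safegraph_place_id": "sg:a", "location_name": "Place A", "raw_visit_counts": 120},
]

joined = left_join(core, patterns)
for row in joined:
    print(row)
```

Place B has no Patterns row, so its raw_visit_counts is None, which corresponds to the empty value you would see in the delivered CSV.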
  • Schema Changes. To accommodate the release of Canada Places Version 1.0 we implement the following schema changes:
    • Core Places and Patterns: Adding a new column iso_country_code.
    • Core Places and Patterns: Renaming the column zip_code :arrow-forward: postal_code.
    • Core Places and Patterns: Renaming the column state :arrow-forward: region.

    | Column Name | Description | Type | Example |
    | --- | --- | --- | --- |
    | iso_country_code | The 2-letter ISO 3166-1 alpha-2 country code. | String | CA |
    | postal_code | When iso_country_code == US, this is the USA 5-digit zip code. When iso_country_code == CA, this is the Canadian postal code in the form of a 3-character Forward Sortation Area (FSA), a space, and a 3-character Local Delivery Unit (LDU). | String | V6G 1B6 |
    | region | When iso_country_code == US, this is the USA state or territory. When iso_country_code == CA, this is the Canadian province or territory. | String | BC |

  • Note: In the October release, Canada version 1.0 data is available only for Core Places and Geometry (not Patterns). But for consistency, these schema changes will apply to both Core and Patterns for US and CA.
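If you keep historical files in the old schema, one way to align them with the new column names is sketched below. The helper migrate_row and the sample row are hypothetical; defaulting iso_country_code to US assumes that all pre-October data was US only, as stated elsewhere in these notes.

```python
# Old column name -> new column name, per the schema changes above.
RENAMED_COLUMNS = {"zip_code": "postal_code", "state": "region"}


def migrate_row(row, default_country="US"):
    """Return a copy of a pre-October row using the October column names."""
    migrated = {RENAMED_COLUMNS.get(name, name): value for name, value in row.items()}
    # Assumption: rows from earlier releases are all US places.
    migrated.setdefault("iso_country_code", default_country)
    return migrated


# Illustrative row only; the ID and values are made up.
old_row = {
    "safegraph_place_id": "sg:example",
    "city": "salt lake city",
    "state": "ut",
    "zip_code": "84101",
}
new_row = migrate_row(old_row)
print(new_row)
```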

  • To accommodate the new delivery format, we have redefined the column order for all possible delivery configurations. If column order matters to you, take heed. The full schemas for the core/geometry/patterns files are as follows:

    • The order of columns for Core + Geometry + Patterns is safegraph_place_id,parent_safegraph_place_id,safegraph_brand_ids,location_name,brands,top_category,sub_category,naics_code,latitude,longitude,street_address,city,region,postal_code,open_hours,polygon_wkt,polygon_class,phone_number,is_synthetic,includes_parking_lot,iso_country_code,date_range_start,date_range_end,raw_visit_counts,raw_visitor_counts,visits_by_day,visitor_home_cbgs,visitor_work_cbgs,visitor_country_of_origin,distance_from_home,median_dwell,bucketed_dwell_times,related_same_day_brand,related_same_month_brand,popularity_by_hour,popularity_by_day,device_type
    • The order of columns for combined Core + Geometry is safegraph_place_id,parent_safegraph_place_id,safegraph_brand_ids,location_name,brands,top_category,sub_category,naics_code,latitude,longitude,street_address,city,region,postal_code,open_hours,polygon_wkt,polygon_class,phone_number,is_synthetic,includes_parking_lot,iso_country_code
    • The order of columns for Patterns (only) is safegraph_place_id,location_name,street_address,city,region,postal_code,brands,date_range_start,date_range_end,raw_visit_counts,raw_visitor_counts,visits_by_day,visitor_home_cbgs,visitor_work_cbgs,visitor_country_of_origin,distance_from_home,median_dwell,bucketed_dwell_times,related_same_day_brand,related_same_month_brand,popularity_by_hour,popularity_by_day,device_type,iso_country_code
    • The order of columns for Core (only) is safegraph_place_id,parent_safegraph_place_id,safegraph_brand_ids,location_name,brands,top_category,sub_category,naics_code,latitude,longitude,street_address,city,region,postal_code.
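If your pipeline depends on column position, a small guard like the sketch below can flag a header that differs from the order you expect. It uses the Core + Geometry order listed above and assumes the delivered file starts with a header row; header_problems is a hypothetical helper name.

```python
import csv
import io

# Column order for combined Core + Geometry, copied from the list above.
CORE_GEOMETRY_COLUMNS = (
    "safegraph_place_id,parent_safegraph_place_id,safegraph_brand_ids,"
    "location_name,brands,top_category,sub_category,naics_code,latitude,"
    "longitude,street_address,city,region,postal_code,open_hours,polygon_wkt,"
    "polygon_class,phone_number,is_synthetic,includes_parking_lot,"
    "iso_country_code"
).split(",")


def header_problems(header_line, expected=CORE_GEOMETRY_COLUMNS):
    """Return a list of human-readable differences; an empty list means a match."""
    actual = next(csv.reader(io.StringIO(header_line)))
    problems = [
        f"column {position}: expected {want}, found {got}"
        for position, (want, got) in enumerate(zip(expected, actual))
        if want != got
    ]
    if len(actual) != len(expected):
        problems.append(f"expected {len(expected)} columns, found {len(actual)}")
    return problems


good_header = ",".join(CORE_GEOMETRY_COLUMNS)
old_style_header = good_header.replace("region", "state")
print(header_problems(good_header))
print(header_problems(old_style_header))
```

Reading by column name (for example with csv.DictReader) avoids the dependency on position altogether, which may be the simpler option.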
  • The brand_info.csv file that accompanies Core Places has two new columns, stock_symbol and stock_exchange, both strings.

    • For example, for the brand Tesla (SG_BRAND_bc250e0d83c37b0953ada14e7bbc1dfd):
      • stock_symbol = TSLA
      • stock_exchange = NASDAQ
    • The full schema for the brand_info.csv is safegraph_brand_id,brand_name,parent_safegraph_brand_id,naics_code,top_category,sub_category,stock_symbol,stock_exchange
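A minimal sketch of using the new columns: build a lookup from safegraph_brand_id to a ticker string. Only the Tesla ID, symbol, and exchange come from the notes above; the other fields are left empty, the second row is invented, and treating an empty stock_symbol as "not listed" is an assumption.

```python
import csv
import io

BRAND_INFO_HEADER = (
    "safegraph_brand_id,brand_name,parent_safegraph_brand_id,naics_code,"
    "top_category,sub_category,stock_symbol,stock_exchange"
)

# Sample content: unknown fields are left empty on purpose.
sample = (
    BRAND_INFO_HEADER + "\n"
    "SG_BRAND_bc250e0d83c37b0953ada14e7bbc1dfd,Tesla,,,,,TSLA,NASDAQ\n"
    "SG_BRAND_example_private,Example Private Brand,,,,,,\n"
)

# Map brand ID -> "EXCHANGE:SYMBOL" for brands that have a stock symbol.
tickers = {
    row["safegraph_brand_id"]: f'{row["stock_exchange"]}:{row["stock_symbol"]}'
    for row in csv.DictReader(io.StringIO(sample))
    if row["stock_symbol"]
}
print(tickers)
```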
  • Patterns Visit Attribution Methodology

    • We have improved an aspect of our visits attribution methodology that affects all Patterns columns. Previously, we excluded any POI from the Patterns product if it was near a POI with a much higher social media ranking. Based on your input and recent internal examinations, we've decided that this exclusion creates too much month-to-month instability in the Patterns product and has not achieved the desired intent. So, we are removing this exclusion. This means that when comparing October Release Patterns to prior versions of Patterns, there will be changes in visit counts (and new POI appearing in the Patterns product) that do not reflect changes in the real world but rather are due to the change in methodology. In light of this, we will be providing a one-time backfill to our current enterprise customers to give you a more stable view of Patterns data across time.
      • Backfills of Patterns:
        • s3://customer-bucket/customer-prefix/{{products_ordered}}_backfill/del_yyyy/del_mm/del_dd/del_hh/data_yyyy/data_mm/*
        • s3://customer-bucket/customer-prefix/home_panel_summary_backfill/del_yyyy/del_mm/del_dd/del_hh/data_yyyy/data_mm/*
        • s3://customer-bucket/customer-prefix/visits_panel_summary_backfill/del_yyyy/del_mm/del_dd/del_hh/data_yyyy/data_mm/*
        • Where del_yyyy/del_mm/del_dd/del_hh is the datetime the backfill was delivered (hour is start hour) and data_yyyy/data_mm represents year and month being summarized for this set of patterns data. Note that the latter will also align with the columns date_range_start and date_range_end contained in each file of the Patterns data itself.
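The backfill layout above can be unpacked programmatically. The sketch below assumes each path component is a zero-padded number and uses a made-up delivery hour, data month, and file name; parse_backfill_key and BackfillLocation are hypothetical names.

```python
from datetime import datetime
from typing import NamedTuple


class BackfillLocation(NamedTuple):
    directory: str          # e.g. "core_poi-patterns_backfill"
    delivered_at: datetime  # del_yyyy/del_mm/del_dd/del_hh
    data_year: int          # data_yyyy
    data_month: int         # data_mm


def parse_backfill_key(key: str, prefix: str = "customer-prefix/") -> BackfillLocation:
    """Split an object key into the parts described in the layout above."""
    if not key.startswith(prefix):
        raise ValueError(f"key does not start with {prefix!r}: {key}")
    parts = key[len(prefix):].split("/")
    if len(parts) < 7 or not parts[0].endswith("_backfill"):
        raise ValueError(f"not a backfill key: {key}")
    del_yyyy, del_mm, del_dd, del_hh, data_yyyy, data_mm = map(int, parts[1:7])
    return BackfillLocation(
        directory=parts[0],
        delivered_at=datetime(del_yyyy, del_mm, del_dd, del_hh),
        data_year=data_yyyy,
        data_month=data_mm,
    )


# Hypothetical key: the delivery hour, data month, and file name are made up.
example = parse_backfill_key(
    "customer-prefix/core_poi-patterns_backfill/2019/10/08/14/2019/06/part-0000.csv.gz"
)
print(example.directory, example.delivered_at.isoformat(), example.data_year, example.data_month)
```

Grouping files by (data_year, data_month) rather than by delivery time keeps each month of backfilled Patterns together, in line with the date_range_start and date_range_end columns.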
  • Changes to Address, City and State Fields. For both Core and Patterns, the casing has changed in the following columns: street_address, city, and region (formerly state).

    • street_address and city now use proper Title Casing. For example, 235 main campus drive and salt lake city will now appear as 235 Main Campus Drive and Salt Lake City.
    • region (formerly state) will now appear in all caps. For example, tx will now appear as TX.
    • This is a breaking change because any reliance on exact case-sensitive string matching is broken by this change.
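One way to make string matching robust to the casing change is to compare normalized keys rather than raw values. This is a sketch of a general technique, not a description of how SafeGraph produces its casing; match_key is a hypothetical helper.

```python
def match_key(value: str) -> str:
    """Collapse runs of whitespace and ignore case for comparison purposes."""
    return " ".join(value.split()).casefold()


# Old-style and new-style values from the examples above compare equal.
pairs = [
    ("235 main campus drive", "235 Main Campus Drive"),
    ("salt lake city", "Salt Lake City"),
    ("tx", "TX"),
]
for old, new in pairs:
    print(old == new, match_key(old) == match_key(new))
```

Each line prints False for the exact comparison and True for the normalized one.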

Canada Places Version 1.0 available for Core Places and Geometry in October Release

  • SafeGraph Places Canada v1.0 includes Core and Geometry data for over 700,000 POI (branded and non-branded) covering over 450 brands! Here are detailed notes on what points-of-interest are included in SafeGraph Places Canada.
  • By default, all existing SafeGraph Places customers will continue to receive only iso_country_code == 'US'. If you want to add Canada data, please get in touch. :globe-with-meridians: :chart-with-upwards-trend:
  • If you choose to add CA data to your existing US data, then CA and US records will be intermixed in your delivery files. CA data is not partitioned or delivered in separate files from US data. If you want to isolate only the US data, then simply filter iso_country_code == 'US'.
  • Likewise, Country is a new Location filter available in SafeGraph Data Bar, and you can now download Canadian Core Places and Geometry from the SafeGraph Data Bar.
  • The brand_info.csv that accompanies Core Places is now international. This means that regardless of whether you receive US only, CA only, both, or some particular subset, you will see brands in brand_info.csv with only US POI, only CA POI, and brands with POI in both countries.
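Since CA and US records share the same files, isolating one country is a filter on iso_country_code. The sketch below streams rows with the standard csv module; the sample rows are invented and the header row is an assumption.

```python
import csv
import io

# Illustrative rows only; IDs, names, and the US postal code are made up.
sample = (
    "safegraph_place_id,location_name,region,postal_code,iso_country_code\n"
    "sg:example-us,Example Cafe,TX,78701,US\n"
    "sg:example-ca,Example Bakery,BC,V6G 1B6,CA\n"
)


def rows_for_country(lines, country="US"):
    """Yield only the rows whose iso_country_code matches the requested country."""
    for row in csv.DictReader(lines):
        if row["iso_country_code"] == country:
            yield row


us_rows = list(rows_for_country(io.StringIO(sample)))
ca_rows = list(rows_for_country(io.StringIO(sample), country="CA"))
print(len(us_rows), len(ca_rows))
```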

Enhancements - Core Places and Brands

  • Last month SG Places had 4,824,157 points-of-interest (US only). This month SG Places has 6,059,381 points-of-interest (5,352,381 US, 707,000 CA) (net + 1,235,224 places). :exclamation: :chart-with-upwards-trend: :chart-with-upwards-trend:

  • The ~528k increase in US POI comes from a multi-pronged strategy to discover and source new datasets of Places over the last few months.
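For readers who want to check the arithmetic, the counts above reconcile as follows (all figures are taken from the two bullets above):

```python
last_month_total = 4_824_157   # US only
this_month_us = 5_352_381
this_month_ca = 707_000

this_month_total = this_month_us + this_month_ca
net_new_places = this_month_total - last_month_total
us_increase = this_month_us - last_month_total

print(f"{this_month_total:,} total, {net_new_places:,} net new, {us_increase:,} new in the US")
```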

  • Top categories (naics_code) with more places:

    • Full-Service Restaurants (722511), net poi: +104810.
    • Commercial Banking (522110), net poi: +81068.
    • Beauty Salons (812112), net poi: +45617.
    • Hardware Stores (444130), net poi: +27747.
    • Fitness and Recreational Sports Centers (713940), net poi: +25076.
    • Snack and Nonalcoholic Beverage Bars (722515), net poi: +21835.
    • Child Day Care Services (624410), net poi: +18351.
    • Religious Organizations (813110), net poi: +17926.
    • Elementary and Secondary Schools (611110), net poi: +17755.
    • Electronics Stores (443142), net poi: +16477.
    • Supermarkets and Other Grocery (except Convenience) Stores (445110), net poi: +15484.
    • Used Car Dealers (441120), net poi: +15012.
    • Women's Clothing Stores (448120), net poi: +14645.
    • General Automotive Repair (811111), net poi: +14618.
    • Investment Advice (523930), net poi: +14438.
    • Furniture Stores (442110), net poi: +12617.
    • Tax Preparation Services (541213), net poi: +12414.
    • Hotels (except Casino Hotels) and Motels (721110), net poi: +12280.
    • Carpet and Upholstery Cleaning Services (561740), net poi: +11988.
    • Jewelry Stores (448310), net poi: +11308.
  • We have significantly improved address validation to enhance readability and accuracy of street_address. If you care about street addresses as much as we do, we have more specific address columns to split out address components. These are optional and available upon request for future deliveries!

    • primary_number
    • street_predirection
    • street_name
    • street_postdirection
    • street_suffix
  • We've added over 150 new brands :confetti-ball:
    New Brands Include...

    • IGA (iga.com, SG_BRAND_5294233e5c7e4164) with 932 places.
    • Timewise Food Store (landmarkindustries.com, SG_BRAND_b6990bee8584deea7ad1927df389c568) with 218 places.
    • Enmarket (enmarket.com, SG_BRAND_75f8072f84074197) with 123 places.
    • Bahama Buck's (bahamabucks.com, SG_BRAND_6f77afb4a1d3c85a) with 111 places.
    • Blo Blow Dry Bar (blomedry.com, SG_BRAND_6e2a5480cffedf5) with 94 places.
    • Banc of California (bancofcal.com, SG_BRAND_17aa80ed1abae2b6) with 33 places.
    • Uni K Wax (unikwax.com, SG_BRAND_eb209fcc90a7b9d) with 32 places.
    • Just-a-Cut (justacutsalons.com, SG_BRAND_1c0095372fd2dcc6) with 28 places.
    • Zero's Subs (zerossub.com, SG_BRAND_2f67ea94f34c82aa) with 27 places.
    • The Comfy Cow (thecomfycow.com, SG_BRAND_1fe673b225ac9f06) with 5 places.
    • Wichcraft (wichcraft.com, SG_BRAND_654e11e75c9fb0f9) with 5 places.
    • 1 Hotels (1hotels.com, SG_BRAND_6462594f4a598a94) with 4 places.
    • B Hotels & Resorts (bhotelsandresorts.com, SG_BRAND_2dd7e7efd007f82c) with 4 places.
    • Cleats (cleatswings.com, SG_BRAND_5778688562842d4a) with 4 places.
    • And over 130 more!! Holy :cow:

Bug Fixes and Known Issues - Core Places and Brands

  • Changed brand name Dunkin' Donuts :arrow-forward: Dunkin' (SG_BRAND_9b4045db0fbb461cf9ed78916d9b16b4) :coffee:

  • We corrected some errors involving either over- or under-labeling POI for some brands. In some cases we were creating branded POI incorrectly at some locations. In other cases we were missing locations. These fixes resulted in significant changes in the total number of POI for those affected brands. The new count is correct, and for transparency we'd like to list some of these fixes as examples in no particular order.

  • Stoney River Steakhouse and Grill, (SG_BRAND_500ca3fab748506108763573b32c241a). Net POI count change: -29. Bug: Sept incorrectly included POI for J. Alexanders subsidiaries.

  • Hyatt Place, (SG_BRAND_23471c0e8e8cd0b37e02cc1ec7b54910). Net POI count change: +215. Bug: Due to a data sourcing error, Aug/Sept data was missing locations.

  • Hampton, (SG_BRAND_b6766b490c59a423e6011e11abb0dfba). Net POI count change: -1071. Bug: Incorrectly included other Hilton subsidiary brands.

  • Wyndham Garden, (SG_BRAND_521a725e4798f7d8). Net POI count change: +14. Bug: Previous coverage was incomplete. We now have complete coverage (possibly missing 1 location). The tradeoff is that the 1 location we previously had in CA is not in the newer source file, so there are now 0 in CA (awaiting a separate CA file).

  • Homewood Suites by Hilton, (SG_BRAND_7ccd77936c08cd1bd6e26a5ee386cf07). Net POI count change: US: +47, CA: -128. Bug: Data source error caused under-sourcing for US and over-sourcing for CA.

  • Abbey Carpet and Floor, (SG_BRAND_faa9096ceee3f8943ec22bde486bd15f). Net POI count change: +82. Bug: Due to a data sourcing error, Aug/Sept releases were missing locations.

  • KinderCare, (SG_BRAND_bb687a854c6bcb1). Net POI count change: +488. Bug: Due to a data sourcing error Aug/Sept data was missing locations.

  • MiniLuxe, (SG_BRAND_e1e142a56327163). Net POI count change: +12. Bug: Due to a data sourcing error, previously only included Boston locations.

  • National Bank, (SG_BRAND_40ff3f31adb8051). Net POI count change: -2474. Bug: Incorrectly included ATMs.

  • Scotia Bank, (SG_BRAND_40dca7b8ae2781f0). Net POI count change: -571. Bug: Incorrectly included ATMs.

  • Tommy Hilfiger, (SG_BRAND_1428e3360a85e36f654fcd0166e3e607). Net POI count change: +155. Bug: Due to a data sourcing error, we were missing many Tommy locations.

  • lululemon athletica, (SG_BRAND_44427b89ae7ee3ac12514dd4cc220a1c). Net POI count change: +24. Bug: Due to a data sourcing error, previous release was missing locations.

  • Bad SGPID Churn -- Bad sgpid churn is any undesired failure to maintain consistent safegraph_place_ids (sgpids) between releases (see discussion in March 2019 release). We internally track and estimate our performance in this domain and share these numbers in our release notes for maximum transparency. In this release:

    • We dropped 165,131 sgpids (44,202 branded and 120,929 non-branded).
    • We added 1,400,355 sgpids (86,054 branded and 1,314,301 non-branded).
    • After the net ~1,235,000 new places, some percent of the remaining 165k churned sgpids are true openings and closings; the remainder are bad sgpid churn. We continue to work on better metrics to distinguish these cases and better solutions for minimizing sgpid churn release-to-release.
  • Category Fill Rate -- We monitor category fill rate with 3 metrics: (1) category fill rate across the entire dataset, (2) category fill rate for branded POI, (3) category fill rate in the brand_info file (brand-level categories). We want all of these numbers to be 100%.

    • (1) All POI category fill rate. Last month 92%. This month 89%. :thumbsdown:
    • (2) Branded POI category fill rate. Last month 100%. This month 100% :100:
    • (3) Brand-level category fill rate (brand_info file). Last month 100%. This month 100%. :100:
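If you want to track a similar number on your own copy of the data, a fill rate can be computed as the share of rows with a non-empty category. The sketch below assumes top_category is the column being measured, which the notes above do not specify, and uses invented rows.

```python
def fill_rate(rows, column="top_category"):
    """Percent of rows with a non-empty value in the given column."""
    if not rows:
        raise ValueError("no rows to measure")
    filled = sum(1 for row in rows if (row.get(column) or "").strip())
    return 100.0 * filled / len(rows)


# Illustrative rows only: three of four have a category.
sample_rows = [
    {"safegraph_place_id": "sg:1", "top_category": "Restaurants and Other Eating Places"},
    {"safegraph_place_id": "sg:2", "top_category": ""},
    {"safegraph_place_id": "sg:3", "top_category": "Child Day Care Services"},
    {"safegraph_place_id": "sg:4", "top_category": "Religious Organizations"},
]
print(f"{fill_rate(sample_rows):.0f}%")
```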

Enhancements - Geometry

  • Improved and additional cartography and polygons. New or improved polygon geometries for over 18,000 POI in the US and CA, including DMVs, elementary and secondary schools, colleges, universities, and more. :diamond-shape:

[Figure: New Orleans DMV (sg:3ef96a9ad2c54948977f83804c150a0c), 100 Veterans Blvd, New Orleans LA 70124. One of over 18,000 new and improved polygons in the October 2019 release of SafeGraph Places.]

Note: SafeGraph polygons accurately describe the base of a POI's structure-footprint. Don't let angled satellite views and the roofs of tall buildings fool you! We know that you know your trigonometry better than that.

Bug Fixes and Known Issues - Geometry

  • Centroid-Radius Polygons -- As discussed in the March 2019 release notes, we internally track centroid-radius polygons vs precise polygons and strive for 100% precise polygons. You can measure this yourself using the is_synthetic column. This release, precise polygons decreased to 94.6% (down from 95.0% last month) :chart-with-downwards-trend:. This is due to the large overall increase in new Places (net + 1.2 MM new places); our precise-polygon coverage has not fully kept pace. Here is how we are tracking on this metric since January 2019: Centroid-Radius vs Precise Polygon Tracking.
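A rough sketch of measuring the precise-polygon share from the is_synthetic column follows. It assumes that is_synthetic is true for centroid-radius polygons and that the CSV encodes the flag as the text "true" or "false"; verify both against your own files.

```python
def percent_precise(rows):
    """Percent of rows whose polygon is not flagged as synthetic."""
    if not rows:
        raise ValueError("no rows to measure")
    precise = sum(
        1 for row in rows
        if str(row["is_synthetic"]).strip().lower() != "true"
    )
    return 100.0 * precise / len(rows)


# Illustrative rows only: one of four polygons is synthetic.
sample_rows = [
    {"is_synthetic": "false"},
    {"is_synthetic": "false"},
    {"is_synthetic": "true"},
    {"is_synthetic": "false"},
]
print(percent_precise(sample_rows))
```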
  • Percent polygon_class = OWNED. Our Geometry customers care deeply about reducing the number of SHARED polygons and increasing the number of OWNED polygons. polygon_class is a nuanced Geometry attribute, so please consult our Geometry docs page to refresh your memory on the precise definition. One of the key ways that SafeGraph internally tracks progress on improving the Geometry product is by examining polygon_class for all safegraph_place_ids that are both (i) branded and (ii) do NOT have a parent_safegraph_place_id; we call this group "branded, no-parent". We want 100% of "branded, no-parent" POI to have polygon_class = OWNED_POLYGON. We are working hard, but we have a long way to go. In the interest of full transparency, we will report this metric every month. You can measure this yourself in the query below :arrow-down: .
    • Last month, the percent OWNED polygons for branded, no-parent was 72.2%. This month it is 71.0%. :thumbsdown: This is the wrong direction. Adding 1.2 MM POI in this release ultimately caused this percent to go down. Our efforts to improve this metric buffered this down-tick but did not wholly overcome it. We will try even harder next month, and continue to strive for 100%. Here is how we are tracking on this metric in recent releases: OWNED vs SHARED Polygons in SafeGraph Places Release History.
-- How to calculate the number of OWNED polygons for the group "branded, no-parent"
SELECT polygon_class, COUNT(*) AS num_poi
FROM safegraph_places
WHERE brands IS NOT NULL
  AND parent_safegraph_place_id IS NULL
GROUP BY 1
ORDER BY 1;

-- Result for this release (polygon_class, num_poi):
-- OWNED_POLYGON	649143
-- SHARED_POLYGON	264602
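The reported percentage follows directly from the two counts returned by the query above:

```python
# Counts from the query result above.
counts = {"OWNED_POLYGON": 649_143, "SHARED_POLYGON": 264_602}

total = sum(counts.values())
percent_owned = 100 * counts["OWNED_POLYGON"] / total
print(f"{percent_owned:.1f}% OWNED of {total:,} branded, no-parent POI")
```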

Enhancements - Patterns

  • Last month SG Patterns had 3,177,342 points-of-interest (US only). This month SG Patterns has 3,583,579 points-of-interest (US only) (net + 406,237 places). :exclamation: :chart-with-upwards-trend:
  • Last month SG Patterns had 1,041,517,353 visits from 43,304,473 visitors. This month SG Patterns has 1,098,364,782 visits from 48,874,591 visitors (delta + 56,847,429 visits, + 5,570,118 visitors). :exclamation: :chart-with-upwards-trend:
  • As discussed above in Breaking Changes, we have improved an aspect of our visits attribution methodology that affects all Patterns columns. Previously, we excluded any POI from the Patterns product if it was near a POI with a much higher social media ranking. Based on your input and recent internal examinations, we've decided that this exclusion creates too much month-to-month instability in the Patterns product and has not achieved the desired intent. So, we are removing this exclusion. This means that when comparing October Release Patterns to prior versions of Patterns, there will be changes in visit counts (and new POI appearing in the Patterns product) that do not reflect changes in the real world but rather are due to the change in methodology. In light of this, we are delivering our current enterprise customers a one-time backfill of historical Patterns data on the new algorithm to give you a more stable view of Patterns data across time. See Breaking Changes above for details on the backfill-directory structure.

Also check out these new ways to get SafeGraph data: 
  * Need some extra data on other SafeGraph products? Check out the [SafeGraph Data Bar](https://shop.safegraph.com/).
  * Are you an Esri or ArcGIS user? Check out our FREE data [SafeGraph Places in the Esri Marketplace](https://marketplace.arcgis.com/listing.html?id=3425348e4bee4059af2b353e52df43c2).
  * Or just drop us a line! Your data needs are our data delights!