May-2019 Release Notes (v2019-04-30)

April showers bring May flowers 🌻 🌻 🌻

These May 2019 Release Notes are SafeGraph's longest Release Notes yet!

Enhancements - Core Places and Brands

  • Last month SG Places had 4,779,045 places. This month it has 4,683,512 places, a net decrease of 95,533 places (see next bullet).
  • Better Open/Close tracking. We removed ~100k long-tail, non-branded small businesses from SafeGraph Places that closed at some point in the last 6 months but had previously eluded our filters. This caused an overall decrease in total POI count 📉 but an increase in overall POI accuracy. 📈
  • We've added a net 133 new brands 🎊, including:

Bugs and Known issues - Core Places and Brands

  • NAICS Codes
    (1) Fixed a bug that was causing some places of worship (e.g., churches) to be miscategorized as restaurants and under other miscellaneous NAICS codes. The correct NAICS code is 813110 (Religious Organizations).
    (2) Fixed a separate bug that was causing some non-branded small-business restaurants to be miscategorized as doctor's offices and under other miscellaneous NAICS codes. The most common correct NAICS codes for restaurants are 722511 (Full-Service Restaurants) and 722513 (Limited-Service Restaurants).
    (3) Fixed an incorrect NAICS code for Dig Inn (diginn.com, SGBRAND_c22278870ba834501cbd8412ba9e234c). Thank you to our excellent partner who noticed this error and submitted a correction request.

  • Bad SGPID churn: Bad sgpid churn means undesired failures to keep safegraph_place_ids (sgpids) consistent between releases (see the discussion in the March 2019 release notes). We track and estimate our performance on this internally and share the numbers in our release notes for maximum transparency. In the May-2019 release:

    • We dropped 231,327 sgpids (40,475 branded and 190,852 non-branded).
    • We added 135,794 sgpids (49,460 branded and 86,334 non-branded).
    • Some of these are true openings and closings; the remainder are bad sgpid churn. We are working on better metrics for distinguishing the two cases.
    • NB: These numbers are significantly improved from the previous (April) release.
  • Category fill rate: We monitor category fill rate with three metrics: (1) category fill rate across the entire dataset, (2) category fill rate for branded POI, and (3) category fill rate in the brand_info file (brand-level categories). We want all of these numbers to be 100%.

    • (1) All POI category fill rate. Last month 91%. This month 89%. 👎 We will do better next time.
    • (2) Branded POI category fill rate. Last month 100%. This month 100%. 💯
    • (3) Brand-level category fill rate (brand_info file). Last month 99%. This month 100%. 💯
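The add/drop counts and fill rates above boil down to set arithmetic and simple ratios. The sketch below is a minimal illustration under stated assumptions, not SafeGraph's internal tooling: it assumes each release can be reduced to a collection of sgpids, and it uses hypothetical toy records of (category, brand) pairs, where None marks a missing category or a non-branded POI.

```python
from typing import Iterable, Optional


def churn(prev_ids: Iterable[str], curr_ids: Iterable[str]) -> dict:
    """Count sgpids dropped and added between two releases."""
    prev, curr = set(prev_ids), set(curr_ids)
    dropped = prev - curr  # in last month's release, missing from this one
    added = curr - prev    # new in this month's release
    return {
        "dropped": len(dropped),
        "added": len(added),
        # For sets, the net change in count always equals added minus dropped.
        "net": len(curr) - len(prev),
    }


def fill_rate(categories: Iterable[Optional[str]]) -> float:
    """Share of records whose category is present (not None or empty)."""
    values = list(categories)
    if not values:
        return 0.0
    return sum(1 for c in values if c) / len(values)


if __name__ == "__main__":
    # Hypothetical toy releases, not real SafeGraph data.
    april = ["a", "b", "c", "d"]
    may = ["b", "c", "e"]
    print(churn(april, may))

    # Hypothetical (category, brand) pairs for the May toy release.
    may_rows = [("Grocery Stores", "Safeway"), (None, None), ("Restaurants", None)]
    all_poi = fill_rate(cat for cat, _ in may_rows)                               # metric (1)
    branded_poi = fill_rate(cat for cat, brand in may_rows if brand is not None)  # metric (2)
    print(round(all_poi, 2), branded_poi)
```

The same set identity explains why this release's figures agree: 135,794 added minus 231,327 dropped is -95,533, exactly the net change from 4,779,045 to 4,683,512 places. Metric (3) would apply fill_rate to the category values in the brand_info file.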

Enhancements - Geometry

  • Improved and additional cartography and polygons. We added new or improved polygon geometries for over 5,000 used-car dealerships (continuing last month's work) and better maps for over 7,500 strip malls, affecting over 50,000 places.
  • For example, the strip mall at 3132 East Camelback Rd, Phoenix, AZ 85016 contains several important places, including a Safeway grocery store (sg:b3662d0a221448f1adedaf6aa1524de5, SG_BRAND_7cde1ae4542b0a76a2e5efeccc69e55f; see figure below). In the April release, all of the POI in this strip mall had the same SHARED polygon_wkt. In the May release, each of these businesses has a precise OWNED polygon_wkt.
  • Thanks to the above efforts, the overall count of places with polygon_class = SHARED went down from 1.839 M in April to 1.732 M in May. ❗ 📉 👍
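One way to picture the SHARED vs. OWNED distinction is to group POI by their polygon_wkt: a geometry used by more than one POI is shared, and a geometry used by a single POI is owned. The sketch below assumes exactly that simplified rule and a hypothetical list of (sgpid, polygon_wkt) pairs. It is not how SafeGraph assigns polygon_class, and a real comparison would likely need geometric matching rather than exact string equality.

```python
from collections import Counter


def classify_polygons(rows):
    """Label each POI SHARED or OWNED under a simplified, assumed rule.

    rows: iterable of (sgpid, polygon_wkt) pairs (hypothetical input format).
    A polygon_wkt string that appears for more than one POI is treated as SHARED.
    """
    rows = list(rows)
    uses = Counter(wkt for _, wkt in rows)
    return {sgpid: ("SHARED" if uses[wkt] > 1 else "OWNED") for sgpid, wkt in rows}


# Hypothetical strip mall: before, every tenant carries the whole mall footprint.
mall = "POLYGON ((0 0, 4 0, 4 1, 0 1, 0 0))"
april = [("grocery", mall), ("salon", mall), ("pharmacy", mall)]

# After: each tenant gets its own storefront polygon.
may = [
    ("grocery", "POLYGON ((0 0, 2 0, 2 1, 0 1, 0 0))"),
    ("salon", "POLYGON ((2 0, 3 0, 3 1, 2 1, 2 0))"),
    ("pharmacy", "POLYGON ((3 0, 4 0, 4 1, 3 1, 3 0))"),
]

if __name__ == "__main__":
    print(classify_polygons(april))  # every tenant SHARED
    print(classify_polygons(may))    # every tenant OWNED
```

Tallying the labels per release, for example with Counter(classify_polygons(rows).values()), yields counts of the same kind as the SHARED totals quoted above.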

Example of improved polygon accuracy at the strip mall at 3132 East Camelback Rd, Phoenix, AZ 85016. The green polygon shows the polygon_wkt for the Safeway grocery store (sg:b3662d0a221448f1adedaf6aa1524de5). This is one of over 7,500 strip malls improved in the May 2019 release.

New columns - Geometry:

includes_parking_lot

Column Name | Description | Type | Example
includes_parking_lot | Whether the polygon includes the parking lot or only the building. | Boolean | false

Based on customer feedback, we now configure some of our polygon_wkt geometries to include the parking lot (for an example, see Enhancements - Geometries in the April Release Notes). The new includes_parking_lot column makes it explicit to customers whether a polygon_wkt includes the parking lot. It has three possible values: true, false, and null (null when we are not sure whether the geometry includes a parking lot). In the May release, the breakdown of this new column is as follows:

%sql
SELECT includes_parking_lot, COUNT(*) as num_polygons
FROM safegraph_places_may2019
GROUP BY 1
ORDER BY 2 DESC

Results:

includes_parking_lot | num_polygons
false | 4,061,320
null | 527,647
true | 94,545
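The query result converts to shares of the total with a few lines of Python. The three counts sum to 4,683,512, the May place count, so each POI falls into exactly one includes_parking_lot group. The sketch below hard-codes the reported counts, uses Python None to stand in for SQL null, and adds a hypothetical pure-Python analogue of the GROUP BY for illustration.

```python
from collections import Counter
from typing import Iterable, Optional

# Counts from the May 2019 query result above; None plays the role of SQL null.
MAY_2019_COUNTS = {False: 4_061_320, None: 527_647, True: 94_545}


def breakdown(counts: dict) -> dict:
    """Return each value's share of the total, in percent, rounded to 0.1."""
    total = sum(counts.values())
    return {value: round(100 * n / total, 1) for value, n in counts.items()}


def count_values(flags: Iterable[Optional[bool]]) -> Counter:
    """Pure-Python analogue of GROUP BY includes_parking_lot with COUNT(*).

    As with SQL GROUP BY, all missing (None) values land in a single group.
    """
    return Counter(flags)


if __name__ == "__main__":
    print(sum(MAY_2019_COUNTS.values()))  # total places in the May release
    print(breakdown(MAY_2019_COUNTS))
```

This works out to roughly 86.7% false, 11.3% null and 2.0% true, so only a small share of polygons is currently known to include a parking lot.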

Bugs and Known issues - Geometry

Enhancements - Patterns

We are excited to announce that we are adding two new columns to Patterns, as well as additional information to the Visits Panel Summary. We received a lot of feedback indicating that these new features will help customers get even more value from the data.

The first new column, visits_by_day, gives visibility into the number of visits on each day of the month. This helps users who want to know whether visits rose right after a promotion or because of a major event. It also helps users analyze companies whose fiscal year ends do not align with the end of a calendar month.

The second new column, bucketed_dwell_times, provides further detail for those studying the length of visits. The existing median_dwell_time column (still available) gives the median length of visits, while bucketed_dwell_times gives deeper insight into whether visitors to a location are just stopping by, hanging around, or in it for the long haul.

Lastly, our Visits Panel Summary now includes a new row showing the total number of visitors seen in the month across all states. This helps users quickly understand the size of the panel in any given month.

New columns - Patterns:

visits_by_day

Column Name | Description | Type | Example
visits_by_day | The number of visits to the POI each day (local time) over the covered time period. | JSON [Integer] | [33, 22, 33, 22, 33, 22, 22, 21, 23, 33, 22, 11, 44, 22, 22, 44, 11, 33, 44, 44, 44, 33, 34, 44, 22, 33, 44, 44, 34, 43, 43]
  • This is an array of the number of visits on each day of the month.
  • Days are split according to local time.
  • Because our one-month snapshot is defined in UTC while days are represented in local time, the last day of the month is cut off. For instance, California is 8 hours behind UTC during Pacific Standard Time (PST) and 7 hours behind UTC during Pacific Daylight Time (PDT). This means that during Daylight Saving Time, the last day in the array for a California POI is missing the last 7 hours of the local day (between 5 pm and midnight).
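Because visits_by_day arrives as a JSON array, any standard JSON parser can read it. The sketch below uses Python's json module to find the busiest day in the example value above. It also estimates how many local hours of the final day fall inside the snapshot. It assumes the snapshot ends at midnight UTC on the first day of the next month, and it uses fixed offsets (UTC-7 for PDT, UTC-8 for PST) instead of a time-zone database. hours_covered_on_last_day is a hypothetical helper, not part of any SafeGraph tooling.

```python
import json
from datetime import datetime, timedelta, timezone

# Example value from the visits_by_day column description above.
visits_by_day = json.loads(
    "[33, 22, 33, 22, 33, 22, 22, 21, 23, 33, 22, 11, 44, 22, 22, 44,"
    " 11, 33, 44, 44, 44, 33, 34, 44, 22, 33, 44, 44, 34, 43, 43]"
)


def busiest_day(counts):
    """Return (day_of_month, visits) for the first day with the most visits."""
    best = max(range(len(counts)), key=lambda i: counts[i])
    return best + 1, counts[best]


def hours_covered_on_last_day(snapshot_end_utc, utc_offset_hours):
    """Hours of the final local day that fall inside the UTC snapshot.

    Assumes a fixed offset at or west of UTC (e.g. -7 for PDT), as for US POI.
    """
    if utc_offset_hours > 0:
        raise ValueError("this sketch only handles POI at or west of UTC")
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local_end = snapshot_end_utc.astimezone(local_tz)
    local_midnight = local_end.replace(hour=0, minute=0, second=0, microsecond=0)
    covered = (local_end - local_midnight) / timedelta(hours=1)
    # Ending exactly at local midnight means the final local day is complete.
    return covered if covered > 0 else 24.0


if __name__ == "__main__":
    print(len(visits_by_day), sum(visits_by_day))
    print(busiest_day(visits_by_day))
    end = datetime(2019, 5, 1, tzinfo=timezone.utc)  # assumed snapshot end
    print(hours_covered_on_last_day(end, -7))  # PDT: 17 of 24 local hours
```

With a UTC-7 offset the last day covers 17 hours, which matches the 7 missing hours (5 pm to midnight) described above.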

bucketed_dwell_times

Column Name | Description | Type | Example
bucketed_dwell_times | Keys are ranges of minutes; values are the number of visits whose duration fell within that range. | JSON {String: Integer} | { "<5": 40, "5-20": 22, "21-60": 45, "61-240": 3, ">240": 5 }
  • This is a dictionary of different time spans and the number of visits that were of each duration.
  • The time spans are in minutes.
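The bucket keys are plain strings, so a small helper can make the boundaries explicit. The sketch below assumes dwell times are whole minutes and that each range includes both endpoints, so 20 minutes falls in "5-20" and 21 in "21-60". It maps a single visit to its key and turns the example value from the table above into shares of all visits. bucket_for and shares are hypothetical helpers.

```python
import json
from collections import Counter

# Example value from the bucketed_dwell_times column description above.
example = json.loads('{ "<5": 40, "5-20": 22, "21-60": 45, "61-240": 3, ">240": 5 }')


def bucket_for(minutes: int) -> str:
    """Map a whole-minute dwell time to its bucket key (assumed inclusive ranges)."""
    if minutes < 5:
        return "<5"
    if minutes <= 20:
        return "5-20"
    if minutes <= 60:
        return "21-60"
    if minutes <= 240:
        return "61-240"
    return ">240"


def shares(buckets: dict) -> dict:
    """Fraction of all visits that fall in each bucket."""
    total = sum(buckets.values())
    return {key: count / total for key, count in buckets.items()}


if __name__ == "__main__":
    print(sum(example.values()))  # total visits in the example
    print({k: round(v, 3) for k, v in shares(example).items()})

    # Hypothetical dwell times (minutes) rebuilt into the column's dict shape.
    visits = [3, 12, 45, 300, 4]
    print(dict(Counter(bucket_for(m) for m in visits)))
```

In the example, 40 of 115 visits (about 35%) lasted under 5 minutes. The median alone would not reveal that.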

Addition to Visits Panel Summary:

A new ALL STATES row shows the total number of visitors seen in the month. This total can be smaller than the sum of visitors by state, because a visitor with visits in multiple states is counted in each of those states but only once in the ALL STATES row.
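The gap between the ALL STATES row and the sum of the per-state rows comes from counting distinct visitors. The sketch below uses made-up visitor IDs; the panel summary itself does not expose them. It shows that a set union across states counts a multi-state visitor once, while summing per-state counts counts that visitor once per state.

```python
# Hypothetical visitor IDs seen in each state during one month.
visitors_by_state = {
    "CA": {"v1", "v2", "v3"},
    "NV": {"v2", "v4"},
    "AZ": {"v3"},
}


def sum_of_state_counts(by_state):
    """Add up per-state distinct counts (multi-state visitors counted repeatedly)."""
    return sum(len(ids) for ids in by_state.values())


def all_states_count(by_state):
    """Distinct visitors across all states, each counted once."""
    return len(set().union(*by_state.values()))


if __name__ == "__main__":
    print(sum_of_state_counts(visitors_by_state))  # v2 and v3 counted twice
    print(all_states_count(visitors_by_state))
```

Because a union can never be larger than the sum of its parts, the ALL STATES figure is at most the sum of the state rows.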

Once these changes to Patterns are finalized on shop.safegraph.com over the next few days, we will update the official schema documentation for Patterns.