April-2019 Release Notes (v2019-03-29)

We know these improvements are amazing, and we'd like to assure you this is not an April Fool's joke.

Core Places and Brands

Enhancements - Core Places and Brands

  • Last month SG Places had 4,774,401 places. This month SG Places has 4,779,045 places (net +4,644 places). 📈

  • We've added 381 new brands 🎊 including:

    • ampm (ampm.com, SG_BRAND_db57f3767efde48f) with 1,103 places.
    • Hardee's Red Burrito (hardees.com/redburrito, SG_BRAND_a9fd3c6e57be83ce) with 296 places.
    • Rosati's Chicago Pizza (rosatispizza.com, SG_BRAND_944d5e24cfc93afa) with 193 places.
    • Goodyear Commercial Tire & Service Center (goodyearctsc.com, SG_BRAND_7bc3d0016a136896) with 173 places.
    • and 377 more!!
  • Significantly improved entity resolution and de-duplication 🔀

    • Impact: This improved technology led to the discovery and removal of ~160,000 duplicate POI from our dataset. We also discovered ~120,000 POI that were being incorrectly merged but are actually distinct places.
    • Details: Figuring out whether two data records from two different sources refer to the same place is one of the core challenges we work on at SafeGraph. We’ve made multiple improvements to our de-duplication technology in this release. The largest improvements in accuracy came from better feature engineering, especially for comparison features between POI names. For example, our original model struggled with POI names that look similar at the front but different at the back, e.g. “AT&T” and “AT&T Authorized Retailer”. Now our features recognize that these POI names are relatively similar, even though the latter includes significantly more letters than the former.
  • Brand Names are now case-smart and canonical
    As all fastidious SafeGraph customers know (and that is almost all SafeGraph customers), SafeGraph brand names and branded-location names have historically not always been formatted in a standardized manner. (We've all seen a location for SG_BRAND_f116acfe9147494063e58da666d1d57e starbucks right next to a location for SG_BRAND_8e66c99aa833dd0ced592ee5ba50e743 EILEEN FISHER and wondered... why is one lower case and the other upper case?) Now all brand names are case-smart and canonical (i.e. the name and casing a consumer would expect). 131 brands changed their names; the full list of changes is documented here. The following query samples six of the renamed brands:

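-- Sample six brands whose name changed between the March and April releases.
SELECT
  M.safegraph_brand_id,
  M.brand_name AS brand_name_March2019,
  A.brand_name AS brand_name_April2019
FROM brand_info_march2019 M
-- The WHERE clause drops unmatched rows, so an inner join gives the same result as the LEFT JOIN.
JOIN brand_info_april2019 A
  ON M.safegraph_brand_id = A.safegraph_brand_id
WHERE M.brand_name <> A.brand_name
ORDER BY RAND()
LIMIT 6;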

Results:

safegraph_brand_id | brand_name_March2019 | brand_name_April2019
SG_BRAND_64a77880c7f7c1d3133d10e574c97a8b | kohls | Kohl's
SG_BRAND_962f9b1d1de0bf5b87f4782eafcfd5e5 | wendys | Wendy's
SG_BRAND_b581ece69c7ca08c57e57d8aa919224d | L'OCCITANE | L'Occitane
SG_BRAND_2c9fcf03e737a9c4f882534ef6a57b8c | BALLSTON SPA NATIONAL BANK (BSNB) | Ballston Spa National Bank (BSNB)
SG_BRAND_24fdc423822298896dcd7ae0548f1498 | UNIQLO US | Uniqlo
SG_BRAND_3d459942728f7a636ce726527858d8f8 | SAINT LAURENT PARIS | Saint Laurent
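
Returning to the de-duplication improvement described earlier in this section: the sketch below illustrates, in Python, why a prefix-aware name feature can treat “AT&T” and “AT&T Authorized Retailer” as similar when a plain character-level ratio does not. This is not SafeGraph's actual model or feature set. The function names (`tokenize`, `plain_ratio`, `prefix_token_similarity`) and the whitespace tokenization are assumptions made purely for illustration.

```python
from difflib import SequenceMatcher


def tokenize(name: str) -> list[str]:
    """Lowercase a POI name and split it on whitespace (assumed tokenization)."""
    return name.lower().split()


def plain_ratio(a: str, b: str) -> float:
    """Character-level similarity; a long trailing suffix drags this score down."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def prefix_token_similarity(a: str, b: str) -> float:
    """Share of the shorter name's tokens that match, in order, the front of the longer name."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a or not tokens_b:
        return 0.0
    shorter, longer = (tokens_a, tokens_b) if len(tokens_a) <= len(tokens_b) else (tokens_b, tokens_a)
    matched = 0
    for short_tok, long_tok in zip(shorter, longer):
        if short_tok != long_tok:
            break  # stop at the first token that differs
        matched += 1
    return matched / len(shorter)


# The pair from the release notes: low character ratio, full prefix match.
char_score = plain_ratio("AT&T", "AT&T Authorized Retailer")
prefix_score = prefix_token_similarity("AT&T", "AT&T Authorized Retailer")
```

A real matcher would likely combine several such features (plus address and location signals) in a trained model; this only shows how one feature can be insensitive to extra trailing words.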

Bugs and Known issues - Core Places and Brands

  • Bad SGPID Churn -- Bad sgpid churn is the undesired failure to maintain consistent safegraph_place_ids (sgpids) between releases (see the discussion in the March 2019 release notes). We internally track and estimate our performance in this domain and share these numbers in our release notes for maximum transparency. In the April-2019 release:
    • We dropped 295,132 sgpids (127,945 branded and 167,187 non-branded).
    • We added 299,776 sgpids (89,542 branded and 210,234 non-branded).
    • Some fraction of these are true openings and closings; the remainder are bad sgpid churn. We are working on better metrics for distinguishing the two cases.
    • NB: These numbers are much higher than last month due to (a) the improved de-duplication described above and (b) an internal overhaul and refactoring of some of our most important sources of data for branded POI. Despite our best efforts, this refactoring caused more instability than our average release.
  • Category Fill Rate -- We monitor category fill rate with 3 metrics: (1) category fill rate across the entire dataset, (2) category fill rate for branded POI, (3) category fill rate in the brand_info file (brand-level categories). We want all of these numbers to be 100%.
    • (1) All POI category fill rate. Last month 91%. This month 91%.
    • (2) Branded POI category fill rate. Last month 98%. This month 100%. 💯
    • (3) Brand-level category fill rate (brand_info file). Last month 84%. This month 99%. 📈
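
As a rough illustration of the sgpid churn counts above, dropped and added ids can be computed as set differences between two releases. The Python sketch below is an assumption about how one might do this, not SafeGraph's internal tooling; `sgpid_churn` and the example ids are hypothetical. It also reconciles the counts reported in these notes.

```python
def sgpid_churn(previous: set[str], current: set[str]) -> dict[str, int]:
    """Count sgpids dropped, added, and the net change between two releases."""
    dropped = previous - current  # ids present last release but missing now
    added = current - previous    # ids that are new in this release
    return {
        "dropped": len(dropped),
        "added": len(added),
        "net": len(added) - len(dropped),
    }


# Tiny made-up example; these ids are hypothetical, not real sgpids.
march_ids = {"sg:aaa", "sg:bbb", "sg:ccc"}
april_ids = {"sg:bbb", "sg:ccc", "sg:ddd", "sg:eee"}
example = sgpid_churn(march_ids, april_ids)

# Reconcile the branded and non-branded counts reported in these notes.
dropped_total = 127_945 + 167_187
added_total = 89_542 + 210_234
net_change = added_total - dropped_total
```

The reported numbers are internally consistent: the branded and non-branded counts sum to 295,132 dropped and 299,776 added, and the net change of 4,644 matches the increase in total places from 4,774,401 to 4,779,045. Note that set differences alone cannot separate true openings and closings from bad churn.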

Geometry

Enhancements - Geometries

  • Improved and additional cartography and polygons. New or improved polygon geometries for over 10,000 POI, including many new- and used-auto dealerships. The goal is for auto dealerships to consistently include the outdoor lot areas, since this is more representative of the place of interest than just the indoor building (and this is the preferred polygon for visit attribution use cases). We are not finished with dealerships, but we made significant progress this month.

For example, Auto Ranch (311 G Street North West, Ardmore, OK 73401; sg:c76ee3bfa5a8440b804250d1f0fe52c0) is a small used-car dealership. The SafeGraph Places polygon_wkt now includes the entire lot for this business instead of just the smaller building.

[Image: the polygon_wkt for Auto Ranch, a used-car dealership (sg:c76ee3bfa5a8440b804250d1f0fe52c0), redrawn to include the lot.]
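
To compare a building-only polygon with a redrawn lot polygon, one option is to parse each polygon_wkt value and compare areas. The Python sketch below uses only the standard library and handles just a simple single-ring `POLYGON`. The coordinates are hypothetical planar units chosen for illustration and are not the actual Auto Ranch geometry, and `parse_polygon_wkt` and `ring_area` are illustrative helper names.

```python
import re


def parse_polygon_wkt(wkt: str) -> list[tuple[float, float]]:
    """Parse the outer ring of a simple 'POLYGON ((x y, x y, ...))' string."""
    match = re.match(r"\s*POLYGON\s*\(\(([^()]*)\)", wkt, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("expected a POLYGON WKT string")
    points = []
    for pair in match.group(1).split(","):
        x, y = pair.split()
        points.append((float(x), float(y)))
    return points


def ring_area(points: list[tuple[float, float]]) -> float:
    """Planar area of a closed ring (first point repeated last) via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical shapes: a 20 x 10 building inside a 60 x 40 lot.
building_wkt = "POLYGON ((0 0, 20 0, 20 10, 0 10, 0 0))"
lot_wkt = "POLYGON ((-10 -10, 50 -10, 50 30, -10 30, -10 -10))"

building_area = ring_area(parse_polygon_wkt(building_wkt))
lot_area = ring_area(parse_polygon_wkt(lot_wkt))
```

Real polygon_wkt values use longitude and latitude, so a planar area like this is in squared degrees. For areas in square meters, project the coordinates first or use a geometry library.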

Bugs and Known issues - Geometries

  • Centroid-Radius Polygons -- As discussed in the March 2019 release notes, we internally track centroid-radius polygons vs. precise polygons and strive for 100% precise polygons.
    Here is how we are tracking on that metric over the last few releases:
SafeGraph Places Version | Percent Precise Polygons
April 2019 (v2019-03-29) | 92.8
March 2019 (v2019-02-28) | 92.8
February 2019 (v2019-01-30) | 92.7
January 2019 (v2018-12-20) | 90.9