July-2023 Release Notes (07/05/23)
Welcome to the July 2023 Release Notes!
Fireworks all around! And it feels so right for this release. We've got millions of new POI and we're introducing our new accuracy framework in the US!
(2023-06-29/1688025607, shipped on 2023-07-05)
Highlights
- +2.1M new POIs across the globe
- Mostly growth in the US. Happy 4th! 🇺🇸
- +479 new brands across 129 countries
- Improved standards for POI validation
- New Accuracy framework implemented
- `website` available for 4.2M non-branded POI
SafeGraph prioritizes data quality above all else. It's what we strive to be known for. We can always get better, and with this release, we confidently say that we have!
In working with so many of our customers, we are constantly attempting to help them understand the validity and accuracy of their data. And in doing so, we've developed a framework we are using to track and communicate our work. We hope this will help our customers better understand what aspects of accuracy matter most to their use case and make the most of all of their data, not just ours.
In the spirit of transparency, we're sharing it all. Our Accuracy Docs provide detailed write-ups on how we approach the difficult problem of staying accurate while growing at high speed. You can see our first blog post on the topic here.
We've actively begun applying this framework in the US - our home market. See below for some highlights on the early returns!
Recall: US Highlights
We're aggressively looking to expand our dataset in all categories. So many workflows require "complete" coverage in order to be valuable. And while no perfect dataset exists, we're striving to make ours as complete as possible.
This month we added over 4M new POI in the US.
Here's a look at the categories where we grew coverage the most this month:
Healthcare
We added over 3.1M POI in Healthcare and Social Assistance (the 62 NAICS family).
Here's a breakdown of the largest subcategories:
- +994,577 in Offices of Physicians (except Mental Health Specialists) (621111)
- +993,679 in Offices of All Other Miscellaneous Health Practitioners (621399)
- +713,942 in Offices of Physicians, Mental Health Specialists (621112)
- +221,731 in Offices of Dentists (621210)
- +87,477 in Offices of Chiropractors (621310)
- +53,306 in Offices of Optometrists (621320)
Additional Categories:
- +53,718 in Restaurants and Other Eating Places (7225)
- +46,900 in Offices of Real Estate Agents and Brokers (531210)
- +23,905 in Convenience Stores (445120)
- +23,600 in Beauty Salons (812112)
Precision: US Highlights
With aggressive growth, we need to be equally aggressive in monitoring the validity of all that data. We've introduced new verification methods, added new priority sources, and trained new models. The output of that work is included in this release, and we have meaningfully raised the threshold a place must meet before we consider it valid.
Companies rarely celebrate what gets removed from a places dataset. But we do! In this release, we removed ~2M POI in the US that no longer meet our criteria for validity. Some of these POI represented places that didn't exist. Some were merely duplicates. Many were businesses that had closed a long time ago.
To help better illustrate this, here are a few examples of unconfirmed POI that were removed:
location_name | street_address | postal_code |
---|---|---|
Phantom of the Opera | 222 W 45th St | 10036 |
999 Junk | 12460 E 79th St | 46236 |
Ryland Homes | 10110 Oak Motte Dr | 77494 |
Dish Network 24 Hr Sales | 2006 N Meade St | 54911 |
We'll host an Accuracy Metrics page on our Docs site, where we will share our assessments externally. It will be updated each month with metrics derived from sampling random zip codes that are representative of different pockets of our data.
SafeGraph evaluates the quality and completeness of our US POI data using two key metrics:
- Coverage Rate: The percentage of real and open places in the industry-standard dataset that are also present in SafeGraph. In the US, the industry standard is Google.
- Real Open Rate: The percentage of POIs that SafeGraph lists as real and currently open that actually are real and open.
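As a rough illustration of the two metrics, here is a minimal sketch that treats each one as a simple ratio of counts from a sampled zip code. The function names, parameter names, and counts are all hypothetical, and the ratio reading is our interpretation of the definitions above; the Accuracy Metric Methodologies remain the authoritative description.

```python
def coverage_rate(sg_real_open_matched: int, benchmark_real_open: int) -> float:
    """Share of the benchmark's real, open places that SafeGraph also has.

    Assumption: the benchmark count comes from the industry-standard dataset
    for the sampled area, and the numerator counts SafeGraph POIs matched to it.
    """
    if benchmark_real_open <= 0:
        raise ValueError("benchmark_real_open must be positive")
    return sg_real_open_matched / benchmark_real_open


def real_open_rate(sg_verified_real_open: int, sg_claimed_real_open: int) -> float:
    """Share of POIs listed as real and open that were verified as real and open."""
    if sg_claimed_real_open <= 0:
        raise ValueError("sg_claimed_real_open must be positive")
    return sg_verified_real_open / sg_claimed_real_open


# Hypothetical counts for one sampled zip code (not real SafeGraph figures).
coverage = coverage_rate(sg_real_open_matched=150, benchmark_real_open=200)
real_open = real_open_rate(sg_verified_real_open=132, sg_claimed_real_open=220)

print(f"Coverage Rate: {coverage:.0%}")
print(f"Real Open Rate: {real_open:.0%}")
```

Note that the two metrics use different denominators: Coverage Rate is measured against the benchmark, while Real Open Rate is measured against SafeGraph's own listings.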
This month, we kept it close to home with the two sampled zips because we like to think that accuracy runs deep in our roots.
- 94103: San Francisco, CA (the zip code of the first ever SafeGraph office, somewhat ironic now that we are fully remote) - Representative of Urban
- 98110: Bainbridge Island, WA (the lowest population density zip code that a SafeGrapher calls home) - Representative of Rural
Here's the summary (full breakdown on the Metrics page):
Geo Aggregation | Coverage Rate | Real Open Rate |
---|---|---|
Total Sample | 79% | 66% |
94103 (Urban) | 86% | 62% |
98110 (Rural) | 60% | 79% |
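One thing to note when reading this table: the Total Sample row is not the simple average of the two zip code rows. The sketch below checks that using the published percentages and estimates the weight the urban zip would need to carry if the total were computed from pooled counts. Pooling is an assumption on our part (the Metrics page has the actual breakdown), and the weights are approximate because the published rates are rounded.

```python
# Published rates from the table above, expressed as fractions.
zip_rates = {
    "94103": {"coverage": 0.86, "real_open": 0.62},  # Urban
    "98110": {"coverage": 0.60, "real_open": 0.79},  # Rural
}
published_total = {"coverage": 0.79, "real_open": 0.66}


def simple_mean(metric: str) -> float:
    """Unweighted average of the per-zip rates for one metric."""
    return sum(rates[metric] for rates in zip_rates.values()) / len(zip_rates)


def implied_weight(total: float, first: float, second: float) -> float:
    """Weight w on `first` such that w * first + (1 - w) * second == total."""
    return (total - second) / (first - second)


for metric in ("coverage", "real_open"):
    urban = zip_rates["94103"][metric]
    rural = zip_rates["98110"][metric]
    weight = implied_weight(published_total[metric], urban, rural)
    print(
        f"{metric}: simple mean {simple_mean(metric):.1%}, "
        f"published {published_total[metric]:.0%}, "
        f"implied urban weight {weight:.2f}"
    )
```

Under the pooling assumption, both metrics imply that the urban zip contributes roughly three quarters of the sampled places, which would be consistent with its higher density.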
Check out our Accuracy Metric Methodologies for more detailed info on how we arrived at these calculations.
Interested in a particular zip code for a future month? We're taking suggestions!
Growth
Enhancements
SafeGraph is delivering more POI each month in countries around the world. This month, SG Places has a grand total of 49,199,271 POI, including POI with or without geometry, closed POI, and parking lots. This is a net increase of 2,190,325 places from last month, including:
- +1,887,656 in US 🇺🇸
- +52,914 in DE 🇩🇪
- +45,449 in GB 🇬🇧
- +40,733 in PL 🇵🇱
- +26,463 in JP 🇯🇵
Of course, you can always visit our Places Summary Stats to find more details on our continued growth.
`website` attribute available for non-branded POI
Last month we introduced a `website` column for branded POI to provide more granular URLs for individual locations. This month, we have begun rolling out that column for non-branded POIs. In this release, 4.2M non-branded POIs have a validated, non-null value for `website` (approximately 11% of all non-branded POIs). This fill rate will continue to rise steadily over subsequent months.
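If you want to track this fill rate in your own copy of the data, a minimal sketch might look like the following. It assumes each POI is a dict-like record in which non-branded POIs have an empty or missing `brands` value; that field name, the helper name, and the sample rows are illustrative assumptions, not a description of the delivery format.

```python
from typing import Iterable, Mapping, Optional


def website_fill_rate(pois: Iterable[Mapping[str, Optional[str]]]) -> float:
    """Fraction of non-branded POIs with a non-null, non-empty `website`."""
    # Assumption: a POI is non-branded when its `brands` value is missing or empty.
    non_branded = [poi for poi in pois if not poi.get("brands")]
    if not non_branded:
        return 0.0
    filled = sum(1 for poi in non_branded if (poi.get("website") or "").strip())
    return filled / len(non_branded)


# Hypothetical rows for illustration only.
rows = [
    {"location_name": "Corner Cafe", "brands": None, "website": "https://example.com/cafe"},
    {"location_name": "Main St Dental", "brands": None, "website": None},
    {"location_name": "Oak Barber", "brands": "", "website": "https://example.com/barber"},
    {"location_name": "Hill Auto Repair", "brands": None, "website": "  "},
    {"location_name": "Branded Store", "brands": "Some Brand", "website": "https://example.com/store"},
]

print(f"Non-branded website fill rate: {website_fill_rate(rows):.0%}")

# Back-of-envelope check on the figures above: 4.2M filled at roughly 11%
# implies on the order of 38M non-branded POIs (both inputs are rounded).
implied_non_branded = 4_200_000 / 0.11
print(f"Implied non-branded POI count: about {implied_non_branded / 1e6:.0f}M")
```

Whitespace-only values are treated as empty here, which is a design choice for the sketch rather than a documented rule.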
Here's a look at the top categories receiving new websites this month:
- +286,704 in Full-Service Restaurants (722511)
- +146,512 in Services to Buildings and Dwellings (5617)
- +138,434 in Offices of Physicians (except Mental Health Specialists) (621111)
- +135,435 in Hair, Nail, and Skin Care Services (81211)
- +123,862 in Automotive Repair and Maintenance (8111)
European Warehouses
We're also working to add more non-traditional POIs around the globe. This month, we added 83k warehouses across several European countries in response to a specific customer request. These include:
- +38,298 in DE 🇩🇪
- +25,844 in GB 🇬🇧
- +17,069 in PL 🇵🇱
Brands
We've added a grand total of 479 brands across 129 countries. Some of the highlights:
Finance Brands
We added over 50 brands related to Financial Services while also expanding the footprint of some of our prior brands into new geographies. Here are a few of the new brands we added:
- Crédit Agricole Banque Privée (SG_BRAND_ced4c4a0c4f6a14e)
- AXA (SG_BRAND_789cf3f3921aec0f)
- Krungthai Bank (SG_BRAND_f52a163aa216b9e4)
- Garanti BBVA (SG_BRAND_7c6b248b7292b272)
- ANZ (SG_BRAND_831fc47ae53aebb0)
Are we missing a brand or country? Please let us know here!
Brand Openings and Closings
- We rely on POI metadata to track store openings and closings, and we are especially interested in understanding open/close dates for branded POIs. It can take more than a month to infer open/close dates, so we report brand open/close metrics on a one-month delay.
- In this release, we flagged 1,857 brands with at least one store closure in May 2023 and 2,067 brands with at least one store opening in May 2023.
Learn more about our open/close columns here.
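To reproduce this kind of brand-level count yourself, one approach is to collect the distinct brands that have at least one POI opening or closing in the target month. The sketch below assumes columns named `brands`, `opened_on`, and `closed_on` holding `YYYY-MM` strings; those names and formats, the helper name, and the sample rows are assumptions for illustration, so check the open/close column docs for the actual schema.

```python
from typing import Iterable, Mapping, Optional, Set


def brands_with_event(
    pois: Iterable[Mapping[str, Optional[str]]], month: str, column: str
) -> Set[str]:
    """Distinct brands with at least one POI whose `column` falls in `month`."""
    brands: Set[str] = set()
    for poi in pois:
        brand = poi.get("brands")
        value = poi.get(column)
        # Skip non-branded POIs and POIs with no inferred date.
        if brand and value and value.startswith(month):
            brands.add(brand)
    return brands


# Hypothetical rows for illustration only.
rows = [
    {"brands": "Brand A", "opened_on": "2023-05", "closed_on": None},
    {"brands": "Brand A", "opened_on": "2023-05", "closed_on": None},
    {"brands": "Brand B", "opened_on": "2021-11", "closed_on": "2023-05"},
    {"brands": "Brand C", "opened_on": "2023-04", "closed_on": None},
    {"brands": None, "opened_on": "2023-05", "closed_on": None},
]

opened = brands_with_event(rows, "2023-05", "opened_on")
closed = brands_with_event(rows, "2023-05", "closed_on")
print(f"Brands with an opening in 2023-05: {len(opened)}")
print(f"Brands with a closure in 2023-05: {len(closed)}")
```

A brand with several openings in the same month is counted once, which matches the "at least one" wording above.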
Drops
- We ingest many sources, and Placekeys do drop over time as those sources and our processing change. In this release, we dropped 1,897,058 Placekeys (40,467 branded and 1,856,591 non-branded).
- Nearly all of these are due to our precision enhancements to verification and the subsequent removal of now-unconfirmed POI.
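To see which Placekeys were added or dropped between two releases in your own environment, a set comparison is usually enough. The sketch below is a minimal example under the assumption that you can load the Placekey column of each release into a Python set; the function name and the sample keys are made up for illustration.

```python
from typing import Dict, Set


def diff_releases(previous: Set[str], current: Set[str]) -> Dict[str, int]:
    """Count Placekeys added, dropped, and the net change between two releases."""
    added = current - previous
    dropped = previous - current
    return {
        "added": len(added),
        "dropped": len(dropped),
        "net": len(added) - len(dropped),
    }


# Hypothetical Placekey-like identifiers, not real keys.
june = {"key-aaa", "key-bbb", "key-ccc", "key-ddd"}
july = {"key-bbb", "key-ccc", "key-eee", "key-fff", "key-ggg"}

summary = diff_releases(june, july)
print(summary)

# Sanity check on the figures reported above: branded plus non-branded
# drops should equal the total number of dropped Placekeys.
branded_drops, non_branded_drops, total_drops = 40_467, 1_856_591, 1_897_058
print(branded_drops + non_branded_drops == total_drops)
```

The net change is additions minus drops, so a month with heavy removals can still show strong net growth if additions are larger.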