The SafeGraph Developer Hub

Welcome to the SafeGraph developer hub. You'll find comprehensive guides and documentation to help you start working with SafeGraph as quickly as possible, as well as support if you get stuck. Let's jump right in!

Get Started    

Data Science Resources

This is a repository of tutorials and demonstrations to help the data scientist get value faster with SafeGraph data.

See also, the Awesome SafeGraph Data Science Github list for technical resources for working with SafeGraph data provided by the broader data science community.

Panel Normalization for Longitudinal Analysis, Sampling Bias Corrections, and Extrapolation

Common Use Cases with SafeGraph Data

Topic
Blog Resource
Demo Resource

Calculate Point-Of-Interest Square Feet From SafeGraph Geometry

Get Started with Core Places in Python

De-clutter Your Maps With Simple Outlier Filtering

Correlate 1st-Party Store Data with SafeGraph Patterns

Quickly Visualize a Census Block Group On a Map

Visualize Retail Catchment Areas with SafeGraph Patterns

Quickly Visualize SafeGraph Data On A Map

Use polygons from Geometry to do Visit Attribution

Integrating SafeGraph with Data Science Platforms

Topic
Blog Resource
Demo Resource

Building Reliable Data Pipelines for Machine Learning (Webinar)

Calculating Diffs

At SafeGraph, we are constantly working to improve the veracity of our data. This includes broad, sweeping changes with high product impact as well as small, piecemeal changes that may only improve a few rows or columns at a time.

To identify the specific records that have been added, deleted, or changed from one release to the next, we've compiled a series of SQL commands to assist.

For each SQL code block, the "old release" file is named r1, and the new release that r1 is compared against is named r2. As long the files you wish to compare match this naming convention in your work environment, then you should be able to copy these commands directly and run.

Which rows were added?

select r2.safegraph_place_id as added_records
from r1
right join r2
on r1.safegraph_place_id = r2.safegraph_place_id
where r1.safegraph_place_id is null

--The output is all new safegraph_place_ids that did not exist in the
--previous file

Count of total records added:

select count(*) 
from r1
right join r2
on r1.safegraph_place_id = r2.safegraph_place_id
where r1.safegraph_place_id is null

--The output is the total number of records (safegraph_place_id) 
--that did not exist in the previous file

Which rows were deleted?

select r1.safegraph_place_id as deleted_records
from r1
left join r2
on r1.safegraph_place_id = r2.safegraph_place_id
where r2.safegraph_place_id is null

--The output is all of the safegraph_place_ids that were in the 
--previous file but do not exist in the new file

Count of total records deleted:

select count(*)
from r1
left join r2
on r1.safegraph_place_id = r2.safegraph_place_id
where r2.safegraph_place_id is null

--The output is the total number of records (safegraph_place_id)
-- that do not exist in the current file but existed previously

Which rows had a column value change?

Please note that some columns are much more likely to experience high volume change than others. For example, millions of latitude and longitude columns could change from one release to the next simply due to a geocoding precision update. However, if iso_country_code is changing by the hundreds or thousand, then there are bigger issues at hand.

select r2.safegraph_place_id, r1.safegraph_subcategory as old_attribute_value,  r2.safegraph_subcategory as new_attribute_value
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and r1.safegraph_subcategory <> r2.safegraph_subcategory

--This outputs rows with a specific column change. This example shows
--changes to the sub_category column. To view other column changes, 
--replace every instance of "sub_category" with the other attribute you 
--would like to view changes for.

Here's a longer code block returning all rows that saw a column value change across all Core and Geometry columns (excluding opened_on closed_on tracking_opened_since tracking_closed_since):

select r1.safegraph_place_id as old_safegraph_place_id, r2.safegraph_place_id as new_safegraph_place_id, r1.parent_safegraph_place_id as old_parent_safegraph_place_id, r2.parent_safegraph_place_id as new_parent_safegraph_place_id, r1.location_name as old_location_name, r2.location_name as new_location_name, r1.safegraph_brand_ids as old_safegraph_brand_ids, r2.safegraph_brand_ids as new_safegraph_brand_ids, r1.brands as old_brands, r2.brands as new_brands, r1.top_category as old_top_category, r2.top_category as new_top_category, r1.sub_category as old_sub_category, r2.sub_category as new_sub_category, r1.naics_code as old_naics_code, r2.naics_code as new_naics_code, r1.latitude as old_latitude, r2.latitude as new_latitude, r1.longitude as old_longitude, r2.longitude as new_longitude, r1.street_address as old_street_address, r2.street_address as new_street_address, r1.city as old_city, r2.city as new_city, r1.region as old_region, r2.region as new_region, r1.postal_code as old_postal_code, r2.postal_code as new_postal_code, r1.iso_country_code as old_iso_country_code, r2.iso_country_code as new_iso_country_code, r1.phone_number as old_phone_number, r2.phone_number as new_phone_number, r1.open_hours as old_open_hours, r2.open_hours as new_open_hours, r1.category_tags as old_category_tags, r2.category_tags as new_category_tags, r1.wkt as old_wkt, r2.wkt as new_wkt, r1.polygon_class as old_polygon_class, r2.polygon_class as new_polygon_class, r1.includes_parking_lot as old_includes_parking_lot, r2.includes_parking_lot as new_includes_parking_lot, r1.is_synthetic as old_is_synthetic, r2.is_synthetic as new_is_synthetic
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and (r1.parent_safegraph_place_id <> r2.parent_safegraph_place_id 
or r1.location_name <> r2.location_name 
or r1.safegraph_brand_ids <> r2.safegraph_brand_ids 
or r1.brands <> r2.brands
or r1.top_category <> r2.top_category 
or r1.sub_category <> r2.sub_category
or r1.naics_code <> r2.naics_code 
or r1.latitude <> r2.latitude 
or r1.longitude <> r2.longitude 
or r1.street_address <> r2.street_address
or r1.city <> r2.city 
or r1.region <> r2.region
or r1.postal_code <> r2.postal_code 
or r1.iso_country_code <> r2.iso_country_code 
or r1.phone_number <> r2.phone_number
or r1.open_hours <> r2.open_hours 
or r1.category_tags <> r2.category_tags 
or r1.wkt <> r2.wkt 
or r1.polygon_class <> r2.polygon_class 
or r1.includes_parking_lot <> r2.includes_parking_lot 
or r1.is_synthetic <> r2.is_synthetic)

Count of all column value changes (excluding opened_on closed_on tracking_opened_since tracking_closed_since):

select count(*) as number_of_changed_records, 'parent_safegraph_place_id' as column_name
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and r1.parent_safegraph_place_id <> r2.parent_safegraph_place_id
union
select count(*) as number_of_changed_records, 'location_name' as column_name
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and r1.location_name <> r2.location_name
union
select count(*) as number_of_changed_records, 'safegraph_brand_ids' as column_name
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and r1.safegraph_brand_ids <> r2.safegraph_brand_ids
union
select count(*) as number_of_changed_records, 'brands' as column_name
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and r1.brands <> r2.brands
union
select count(*) as number_of_changed_records, 'latitude' as column_name
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and r1.latitude <> r2.latitude
union
select count(*) as number_of_changed_records, 'longitude' as column_name
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and r1.longitude <> r2.longitude
union
select count(*) as number_of_changed_records, 'street_adress' as column_name
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and r1.street_address <> r2.street_address
union
select count(*) as number_of_changed_records, 'city' as column_name
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and r1.city <> r2.city
union
select count(*) as number_of_changed_records, 'region' as column_name
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and r1.region<> r2.region
union
select count(*) as number_of_changed_records, 'postal_code' as column_name
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and r1.postal_code <> r2.postal_code
union
select count(*) as number_of_changed_records, 'iso_country_code' as column_name
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and r1.iso_country_code <> r2.iso_country_code
union
select count(*) as number_of_changed_records, 'top_category' as column_name
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and r1.top_category <> r2.top_category
union
select count(*) as number_of_changed_records, 'sub_category' as column_name
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and r1.sub_category <> r2.sub_category
union
select count(*) as number_of_changed_records, 'naics_code' as column_name
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and r1.naics_code <> r2.naics_code
union
select count(*) as number_of_changed_records, 'category_tags' as column_name
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and r1.category_tags <> r2.category_tags
union
select count(*) as number_of_changed_records, 'phone_number' as column_name
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and r1.phone_number <> r2.phone_number
union
select count(*) as number_of_changed_records, 'open_hours' as column_name
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and r1.open_hours <> r2.open_hours
union
select count(*) as number_of_changed_records, 'wkt' as column_name
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and r1.wkt <> r2.wkt
union
select count(*) as number_of_changed_records, 'polygon_class' as column_name
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and r1.polygon_class <> r2.polygon_class
union
select count(*) as number_of_changed_records, 'includes_parking_lot' as column_name
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and r1.includes_parking_lot <> r2.includes_parking_lot
union
select count(*) as number_of_changed_records, 'is_synthetic' as column_name
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and r1.is_synthetic <> r2.is_synthetic 

Which rows remained constant?

(excluding opened_on closed_on tracking_opened_since tracking_closed_since):

select r2.safegraph_place_id as safegraph_place_id 
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and (r1.parent_safegraph_place_id = r2.parent_safegraph_place_id 
and r1.location_name = r2.location_name 
and r1.safegraph_brand_ids = r2.safegraph_brand_ids 
and r1.brands = r2.brands
and r1.top_category = r2.top_category 
and r1.sub_category = r2.sub_category
and r1.naics_code = r2.naics_code 
and r1.latitude = r2.latitude 
and r1.longitude = r2.longitude 
and r1.street_address = r2.street_address
and r1.city = r2.city 
and r1.region = r2.region
and r1.postal_code = r2.postal_code 
and r1.iso_country_code = r2.iso_country_code 
and r1.phone_number = r2.phone_number
and r1.open_hours = r2.open_hours 
and r1.category_tags = r2.category_tags 
and r1.wkt = r2.wkt 
and r1.polygon_class = r2.polygon_class 
and r1.includes_parking_lot = r2.includes_parking_lot 
and r1.is_synthetic = r2.is_synthetic)

--Outputs all safegraph_place_ids that saw no column value change

Count of records (safegraph_place_id) with no column value change (excluding opened_on closed_on tracking_opened_since tracking_closed_since):

select count(*) as total_no_change 
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and (r1.parent_safegraph_place_id = r2.parent_safegraph_place_id 
and r1.location_name = r2.location_name 
and r1.safegraph_brand_ids = r2.safegraph_brand_ids 
and r1.brands = r2.brands
and r1.top_category = r2.top_category 
and r1.sub_category = r2.sub_category
and r1.naics_code = r2.naics_code 
and r1.latitude = r2.latitude 
and r1.longitude = r2.longitude 
and r1.street_address = r2.street_address
and r1.city = r2.city 
and r1.region = r2.region
and r1.postal_code = r2.postal_code 
and r1.iso_country_code = r2.iso_country_code 
and r1.phone_number = r2.phone_number
and r1.open_hours = r2.open_hours 
and r1.category_tags = r2.category_tags 
and r1.wkt = r2.wkt 
and r1.polygon_class = r2.polygon_class 
and r1.includes_parking_lot = r2.includes_parking_lot 
and r1.is_synthetic = r2.is_synthetic)

Master Matrix

Below is a command that returns all records from both r1 and r2 with an indication if the record was added (new to r2), deleted (not in r2), saw a column value change, or saw no changes (excluding opened_on closed_on tracking_opened_since tracking_closed_since):

WITH Adds AS (
select 'Add' as Delta, r1.safegraph_place_id as old_safegraph_place_id, r2.safegraph_place_id as new_safegraph_place_id, r1.parent_safegraph_place_id as old_parent_safegraph_place_id, r2.parent_safegraph_place_id as new_parent_safegraph_place_id, r1.location_name as old_location_name, r2.location_name as new_location_name, r1.safegraph_brand_ids as old_safegraph_brand_ids, r2.safegraph_brand_ids as new_safegraph_brand_ids, r1.brands as old_brands, r2.brands as new_brands, r1.top_category as old_top_category, r2.top_category as new_top_category, r1.sub_category as old_sub_category, r2.sub_category as new_sub_category, r1.naics_code as old_naics_code, r2.naics_code as new_naics_code, r1.latitude as old_latitude, r2.latitude as new_latitude, r1.longitude as old_longitude, r2.longitude as new_longitude, r1.street_address as old_street_address, r2.street_address as new_street_address, r1.city as old_city, r2.city as new_city, r1.region as old_region, r2.region as new_region, r1.postal_code as old_postal_code, r2.postal_code as new_postal_code, r1.iso_country_code as old_iso_country_code, r2.iso_country_code as new_iso_country_code, r1.phone_number as old_phone_number, r2.phone_number as new_phone_number, r1.open_hours as old_open_hours, r2.open_hours as new_open_hours, r1.category_tags as old_category_tags, r2.category_tags as new_category_tags, r1.wkt as old_wkt, r2.wkt as new_wkt, r1.polygon_class as old_polygon_class, r2.polygon_class as new_polygon_class, r1.includes_parking_lot as old_includes_parking_lot, r2.includes_parking_lot as new_includes_parking_lot, r1.is_synthetic as old_is_synthetic, r2.is_synthetic as new_is_synthetic
from r1
right join r2
on r1.safegraph_place_id = r2.safegraph_place_id
where r1.safegraph_place_id is null),

Deletes AS (
select 'Delete' as Delta, r1.safegraph_place_id as old_safegraph_place_id, r2.safegraph_place_id as new_safegraph_place_id, r1.parent_safegraph_place_id as old_parent_safegraph_place_id, r2.parent_safegraph_place_id as new_parent_safegraph_place_id, r1.location_name as old_location_name, r2.location_name as new_location_name, r1.safegraph_brand_ids as old_safegraph_brand_ids, r2.safegraph_brand_ids as new_safegraph_brand_ids, r1.brands as old_brands, r2.brands as new_brands, r1.top_category as old_top_category, r2.top_category as new_top_category, r1.sub_category as old_sub_category, r2.sub_category as new_sub_category, r1.naics_code as old_naics_code, r2.naics_code as new_naics_code, r1.latitude as old_latitude, r2.latitude as new_latitude, r1.longitude as old_longitude, r2.longitude as new_longitude, r1.street_address as old_street_address, r2.street_address as new_street_address, r1.city as old_city, r2.city as new_city, r1.region as old_region, r2.region as new_region, r1.postal_code as old_postal_code, r2.postal_code as new_postal_code, r1.iso_country_code as old_iso_country_code, r2.iso_country_code as new_iso_country_code, r1.phone_number as old_phone_number, r2.phone_number as new_phone_number, r1.open_hours as old_open_hours, r2.open_hours as new_open_hours, r1.category_tags as old_category_tags, r2.category_tags as new_category_tags, r1.wkt as old_wkt, r2.wkt as new_wkt, r1.polygon_class as old_polygon_class, r2.polygon_class as new_polygon_class, r1.includes_parking_lot as old_includes_parking_lot, r2.includes_parking_lot as new_includes_parking_lot, r1.is_synthetic as old_is_synthetic, r2.is_synthetic as new_is_synthetic
from r1
left join r2
on r1.safegraph_place_id = r2.safegraph_place_id
where r2.safegraph_place_id is null),

No_change AS (
select 'No Change' as Delta, r1.safegraph_place_id as old_safegraph_place_id, r2.safegraph_place_id as new_safegraph_place_id, r1.parent_safegraph_place_id as old_parent_safegraph_place_id, r2.parent_safegraph_place_id as new_parent_safegraph_place_id, r1.location_name as old_location_name, r2.location_name as new_location_name, r1.safegraph_brand_ids as old_safegraph_brand_ids, r2.safegraph_brand_ids as new_safegraph_brand_ids, r1.brands as old_brands, r2.brands as new_brands, r1.top_category as old_top_category, r2.top_category as new_top_category, r1.sub_category as old_sub_category, r2.sub_category as new_sub_category, r1.naics_code as old_naics_code, r2.naics_code as new_naics_code, r1.latitude as old_latitude, r2.latitude as new_latitude, r1.longitude as old_longitude, r2.longitude as new_longitude, r1.street_address as old_street_address, r2.street_address as new_street_address, r1.city as old_city, r2.city as new_city, r1.region as old_region, r2.region as new_region, r1.postal_code as old_postal_code, r2.postal_code as new_postal_code, r1.iso_country_code as old_iso_country_code, r2.iso_country_code as new_iso_country_code, r1.phone_number as old_phone_number, r2.phone_number as new_phone_number, r1.open_hours as old_open_hours, r2.open_hours as new_open_hours, r1.category_tags as old_category_tags, r2.category_tags as new_category_tags, r1.wkt as old_wkt, r2.wkt as new_wkt, r1.polygon_class as old_polygon_class, r2.polygon_class as new_polygon_class, r1.includes_parking_lot as old_includes_parking_lot, r2.includes_parking_lot as new_includes_parking_lot, r1.is_synthetic as old_is_synthetic, r2.is_synthetic as new_is_synthetic
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and (r1.parent_safegraph_place_id = r2.parent_safegraph_place_id 
and r1.location_name = r2.location_name 
and r1.safegraph_brand_ids = r2.safegraph_brand_ids 
and r1.brands = r2.brands
and r1.top_category = r2.top_category 
and r1.sub_category = r2.sub_category
and r1.naics_code = r2.naics_code 
and r1.latitude = r2.latitude 
and r1.longitude = r2.longitude 
and r1.street_address = r2.street_address
and r1.city = r2.city 
and r1.region = r2.region
and r1.postal_code = r2.postal_code 
and r1.iso_country_code = r2.iso_country_code 
and r1.phone_number = r2.phone_number
and r1.open_hours = r2.open_hours 
and r1.category_tags = r2.category_tags 
and r1.wkt = r2.wkt 
and r1.polygon_class = r2.polygon_class 
and r1.includes_parking_lot = r2.includes_parking_lot 
and r1.is_synthetic = r2.is_synthetic)
)

select 'Change' as Delta, r1.safegraph_place_id as old_safegraph_place_id, r2.safegraph_place_id as new_safegraph_place_id, r1.parent_safegraph_place_id as old_parent_safegraph_place_id, r2.parent_safegraph_place_id as new_parent_safegraph_place_id, r1.location_name as old_location_name, r2.location_name as new_location_name, r1.safegraph_brand_ids as old_safegraph_brand_ids, r2.safegraph_brand_ids as new_safegraph_brand_ids, r1.brands as old_brands, r2.brands as new_brands, r1.top_category as old_top_category, r2.top_category as new_top_category, r1.sub_category as old_sub_category, r2.sub_category as new_sub_category, r1.naics_code as old_naics_code, r2.naics_code as new_naics_code, r1.latitude as old_latitude, r2.latitude as new_latitude, r1.longitude as old_longitude, r2.longitude as new_longitude, r1.street_address as old_street_address, r2.street_address as new_street_address, r1.city as old_city, r2.city as new_city, r1.region as old_region, r2.region as new_region, r1.postal_code as old_postal_code, r2.postal_code as new_postal_code, r1.iso_country_code as old_iso_country_code, r2.iso_country_code as new_iso_country_code, r1.phone_number as old_phone_number, r2.phone_number as new_phone_number, r1.open_hours as old_open_hours, r2.open_hours as new_open_hours, r1.category_tags as old_category_tags, r2.category_tags as new_category_tags, r1.wkt as old_wkt, r2.wkt as new_wkt, r1.polygon_class as old_polygon_class, r2.polygon_class as new_polygon_class, r1.includes_parking_lot as old_includes_parking_lot, r2.includes_parking_lot as new_includes_parking_lot, r1.is_synthetic as old_is_synthetic, r2.is_synthetic as new_is_synthetic
from r1, r2
where r1.safegraph_place_id = r2.safegraph_place_id
and (r1.parent_safegraph_place_id <> r2.parent_safegraph_place_id 
or r1.location_name <> r2.location_name 
or r1.safegraph_brand_ids <> r2.safegraph_brand_ids 
or r1.brands <> r2.brands
or r1.top_category <> r2.top_category 
or r1.sub_category <> r2.sub_category
or r1.naics_code <> r2.naics_code 
or r1.latitude <> r2.latitude 
or r1.longitude <> r2.longitude 
or r1.street_address <> r2.street_address
or r1.city <> r2.city 
or r1.region <> r2.region
or r1.postal_code <> r2.postal_code 
or r1.iso_country_code <> r2.iso_country_code 
or r1.phone_number <> r2.phone_number
or r1.open_hours <> r2.open_hours 
or r1.category_tags <> r2.category_tags 
or r1.wkt <> r2.wkt 
or r1.polygon_class <> r2.polygon_class 
or r1.includes_parking_lot <> r2.includes_parking_lot 
or r1.is_synthetic <> r2.is_synthetic)
Union
select *
from
Adds
Union
select *
from
Deletes
Union
select *
from No_change

Updated 9 days ago


Data Science Resources


Suggested Edits are limited on API Reference Pages

You can only suggest edits to Markdown body content, but not to the API spec.