The SafeGraph Developer Hub

Welcome to the SafeGraph developer hub. You'll find comprehensive guides and documentation to help you start working with SafeGraph as quickly as possible, as well as support if you get stuck. Let's jump right in!


How often is SafeGraph Places updated?

  • SafeGraph issues updates to Places once per month, which is much more frequent than other POI vendors, who may update once every 3-6 months.
  • We can do this because we work with more sources of data, and are much more efficient at combining those sources of data. During each month, some subset of our sources will send us their updates, and we ensure that we onboard and integrate those changes quickly and easily.
  • This enables us to quickly reflect store openings and closings in our Places database.
    • The time between a store opening or closing and its appearance in our Places database is approximately the time it takes one of our sources to see the update plus the time it takes SafeGraph to reflect it in our data.
    • The latter is typically within the month, which is very fast compared to competitors, whose turnaround might be within 3 months.
    • The former is hard to predict, but we work with sources that generally receive updates very quickly.

Which records were added, deleted, or changed since last release?

  • See the "Calculating Diffs" section of Data Science Resources for plug and play code to answer all of your burning diff questions.
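
As a rough illustration of what such a diff involves (this is not the code from Data Science Resources), the sketch below compares two releases keyed on safegraph_place_id using only the Python standard library. The two tiny releases and the helper names `load_release` and `diff_releases` are made up for the example.

```python
import csv
import io

def load_release(csv_text, key="safegraph_place_id"):
    """Index the rows of one release by their place ID."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}

def diff_releases(old, new):
    """Return the IDs that were added, deleted, or changed between two releases."""
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    # A record counts as "changed" if any column value differs.
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, deleted, changed

# Tiny made-up releases; real files would be read from disk instead.
old_csv = "safegraph_place_id,location_name\nsg:a,Cafe One\nsg:b,Book Shop\nsg:c,Gym\n"
new_csv = "safegraph_place_id,location_name\nsg:a,Cafe One\nsg:b,Book Store\nsg:d,Bakery\n"

added, deleted, changed = diff_releases(load_release(old_csv), load_release(new_csv))
print("added:", added)
print("deleted:", deleted)
print("changed:", changed)
```

Holding both releases in dictionaries is fine for samples; for full releases a DataFrame or Spark join on the same key is likely a better fit.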

How should I match SafeGraph Places with existing internal POI data?

  • Matching place data is very difficult. Some places will match immediately (i.e. store name, address, zipcode etc. are exactly the same), but the majority of places will not match. Is "peets coffee" at "345 5th street" in our database the same as "Peet’s Coffee & Tea" at "357 fifth st." in another database? Basic exact matching will not match these two, so your team will need to have built out advanced deduplication logic or else you will notice significant discrepancies.
  • SafeGraph offers a matching service that we recommend utilizing for this purpose. Please contact us if you're interested!
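
To show why exact matching falls short, here is a minimal sketch of normalizing names and addresses before comparing them. It is not SafeGraph's matching service: the abbreviation table `TOKEN_MAP` and the helpers `normalize` and `similarity` are hypothetical, and the score comes from Python's standard `difflib`.

```python
import re
from difflib import SequenceMatcher

# Illustrative abbreviation table (an assumption); a real one would be far larger.
TOKEN_MAP = {"st": "street", "ave": "avenue", "fifth": "5th", "&": "and"}

def normalize(text):
    """Lowercase, drop punctuation, and expand a few common abbreviations."""
    text = text.lower().replace("’", "").replace("'", "")
    tokens = re.findall(r"[a-z0-9&]+", text)
    return " ".join(TOKEN_MAP.get(tok, tok) for tok in tokens)

def similarity(a, b):
    """Similarity in [0, 1] between the normalized forms of two strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

print(normalize("357 fifth st."))
print(similarity("peets coffee", "Peet’s Coffee & Tea"))
print(similarity("345 5th street", "357 fifth st."))
```

A score like this is only a signal for ranking candidate pairs. Deciding whether two nearby street numbers refer to the same store still needs extra rules or review, which is where dedicated deduplication logic comes in.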

How do I work with the Patterns columns that contain JSON?

  • We have a simple web app for exploding the JSON here. You can explode it horizontally (into more columns) or vertically (into more rows). Just upload your file and pick which columns you want exploded. This is a quick and easy solution if you have a file with 1k or fewer rows (about 1MB) and do not want to explode beyond 20k rows.
  • If you ❤️ Excel, we have an add-in that you can install to parse the JSON columns. The add-in can be downloaded here. See video demo of installation and usage. Written instructions are here. ⚠️ This is only recommended for small samples of the data (100 rows or so)!
  • Want more control?
  • To horizontally explode the JSON into more columns programmatically, see an example using pandas here.
  • To vertically explode the JSON into more rows programmatically, here are some code examples using PySpark, Scala Spark, pandas, and R:

PySpark:

# This code takes SG Patterns data as a PySpark DataFrame
# and vertically explodes
# the `visitor_home_cbgs` column into many rows.
# The resulting dataset has 3 columns:
# safegraph_place_id, visitor_count, visitor_home_cbg.

import json

from pyspark.sql.functions import udf, explode
from pyspark.sql.types import MapType, StringType, IntegerType

def parser(element):
  # json.loads takes only the JSON string; the return type is declared on the UDF below
  return json.loads(element)

jsonudf = udf(parser, MapType(StringType(), IntegerType()))

visitor_home_cbgs_parsed = df.withColumn("parsed_visitor_home_cbgs", jsonudf("visitor_home_cbgs"))
# exploding a map column yields one row per entry, with columns `key` and `value`
visitor_home_cbgs_exploded = visitor_home_cbgs_parsed.select("safegraph_place_id", explode("parsed_visitor_home_cbgs"))

# display() is a Databricks notebook helper; use .show() elsewhere
display(visitor_home_cbgs_exploded.selectExpr("safegraph_place_id", "key as visitor_home_cbg", "value as visitor_count"))

Scala Spark:

import org.apache.spark.sql.functions._
import play.api.libs.json._

def parser(element: String) = {
  Json.parse(element).as[Map[String, Int]]
}

val jsonudf = udf(parser _)

// explode related_same_day_brand
val converted = df.withColumn("parsed_related_same_day_brand", jsonudf($"related_same_day_brand"))
display(converted.select($"safegraph_place_id", explode($"parsed_related_same_day_brand" as "exploded_related_same_day_brand")))

// explode visitor_home_cbgs
val visitor_home_cbgs_parsed = df.withColumn("parsed_visitor_home_cbgs", jsonudf($"visitor_home_cbgs"))
display(visitor_home_cbgs_parsed.select($"safegraph_place_id", explode($"parsed_visitor_home_cbgs" as "exploded_visitor_home_cbgs")))

pandas:

# If you are working with large datasets (i.e. > 20,000 POI at a time),
# then you should consider the PySpark solution above;
# it is much more efficient.
#
# This code takes SG Patterns data as a pandas DataFrame
# and vertically explodes
# the `visitor_home_cbgs` column into many rows.
# The resulting dataset has 3 columns:
# safegraph_place_id, visitor_count, visitor_home_cbg.

import pandas as pd
import json

patterns_df = pd.read_csv("safegraph_patterns_data.csv")

# convert jsons to dicts
patterns_df = patterns_df.dropna(subset = ['visitor_home_cbgs'])
patterns_df['visitor_home_cbgs_dict'] = [json.loads(cbg_json) for cbg_json in patterns_df.visitor_home_cbgs]

# extract each key:value inside each visitor_home_cbg dict (2 nested loops) 
all_sgpid_cbg_data = [] # each cbg data point will be one element in this list
for index, row in patterns_df.iterrows():
  this_sgpid_cbg_data = [ {'safegraph_place_id' : row['safegraph_place_id'], 'visitor_home_cbg' : key, 'visitor_count' : value} for key,value in row['visitor_home_cbgs_dict'].items() ]
  # append this POI's records to the running list (in place, avoids re-copying the list)
  all_sgpid_cbg_data.extend(this_sgpid_cbg_data)

home_cbg_data_df = pd.DataFrame(all_sgpid_cbg_data)

# note: home_cbg_data_df has 3 columns: safegraph_place_id, visitor_count, visitor_home_cbg

# sort the result:
home_cbg_data_df = home_cbg_data_df.sort_values(by=['safegraph_place_id', 'visitor_count'], ascending = False)

R:

# This code takes SG Patterns data as a
# data.frame (or, even better, a data.table)
# and vertically explodes the `visitor_home_cbgs`
# column, or the `visits_by_day` column, into many rows.
# This results in one column for safegraph_place_id,
# one for visitor_count,
# and one for visitor_home_cbg/day.

# if you don't have the SafeGraphR package:
# install.packages('remotes')
# remotes::install_github('SafeGraphInc/SafeGraphR')
library(SafeGraphR)


# Generally, data.table::fread is preferred to read.csv
# but this is fine for small files
patterns_df <- read.csv('safegraph_patterns_data.csv')

# expand_cat_json expands categorical JSON variables like visitor_home_cbg

home_cbg_data_df <- expand_cat_json(patterns_df,
                                    expand = 'visitor_home_cbgs',
                                    index = 'origin_cbg',
                                    by = 'safegraph_place_id')
# Fix variable names
names(home_cbg_data_df)[names(home_cbg_data_df) == 'visitor_home_cbgs'] <- 'visitor_count'
names(home_cbg_data_df)[names(home_cbg_data_df) == 'origin_cbg'] <- 'visitor_home_cbgs'

# expand_int_json expands integer JSON variables like visits_by_day

day_data_df <- expand_int_json(patterns_df,
                               expand = 'visits_by_day',
                               index = 'day',
                               by = 'safegraph_place_id')
# Fix variable names
names(day_data_df)[names(day_data_df) == 'visits_by_day'] <- 'visitor_count'
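
For the horizontal case mentioned above (exploding a JSON column into more columns), a minimal standard-library sketch might look like the following. The two-row sample and the helper name `explode_horizontally` are made up; `related_same_day_brand` is used because it holds a JSON object of brand names to counts.

```python
import csv
import io
import json

# Made-up two-row sample. In the CSV, the JSON column is quoted and its inner
# double quotes are doubled, which Python's csv module handles by default.
sample = (
    'safegraph_place_id,related_same_day_brand\n'
    'sg:a,"{""Starbucks"":12,""Target"":5}"\n'
    'sg:b,"{""Target"":7}"\n'
)

def explode_horizontally(rows, column):
    """Turn each key of a JSON-object column into its own output column."""
    parsed = [(row, json.loads(row[column])) for row in rows]
    keys = sorted({k for _, d in parsed for k in d})
    out = []
    for row, d in parsed:
        flat = {k: v for k, v in row.items() if k != column}
        for k in keys:
            # Keys absent for this POI are left as None rather than guessed.
            flat[f"{column}.{k}"] = d.get(k)
        out.append(flat)
    return out

rows = list(csv.DictReader(io.StringIO(sample)))
wide = explode_horizontally(rows, "related_same_day_brand")
for record in wide:
    print(record)
```

Columns with many distinct keys, such as visitor_home_cbgs, produce very wide output this way, so the vertical explode is usually the better fit for them.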

How do I work with SafeGraph data in Spark?

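val df = spark.read.option("header", "True")
  .option("escape", "\"")
  .csv("path/to/your/safegraph/files")  // placeholder path; point this at your own SafeGraph CSV files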

How do I use SafeGraph Places in ESRI?

First, a friendly reminder that Patterns does not have any geospatial data on its own. If you want to do geospatial analysis, you should augment Patterns with Core Places, which contains a latitude and longitude coordinate for every POI.

Visualizing POI as point data in Esri

Let's say your goal is to visualize a point for each POI on a map and have the Patterns data available in the pop-up in ArcGIS Online (AGOL).

You have a few options for how to bring the data into AGOL.

1st, you can have ESRI geocode the POI for you by address. When you upload the Patterns data (as an unzipped csv), select "Locate by Address or Places" on the upload screen and map the appropriate columns: location_name > "place or address", street_address > "place or address", city > city, and state > state. At large scale (many rows) ESRI will charge you for this, but for small numbers of POI it should be trivial. The resulting feature service will show the Patterns data as points on a map as you would expect.

2nd, alternatively, you can use SafeGraph geospatial data. This is probably more accurate than having ESRI geocode for you, but it may not be worth the effort depending on your needs.

  • First load the CORE csv into AGOL. Instead of "Locate by Address or Places" select "Coordinates" and make sure latitude and longitude are mapped correctly (it should auto-detect this). This should load successfully.
  • Then load the Patterns data as a table (Locate Feature By > "None, add as table").
  • Open the CORE data in a map in AGOL and ADD the Patterns Table to the map.
  • Join the Patterns table to the CORE layer using Analysis > Summarize > Join Features. The shared key is the safegraph_place_id, so you would join on this key one-to-one.
  • The resulting layer has both the geospatial data and the patterns data.

3rd, alternatively, you could join the CORE csv and PATTERNS csv yourself in Excel BEFORE loading into AGOL, joining on safegraph_place_id (e.g. via VLOOKUP()). Then just load this file as described for the CORE file in the 2nd option above.
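
If a spreadsheet is not practical, the same pre-join can be sketched in a few lines of Python. This is only an illustration: the sample rows are made up, the helper `join_on_place_id` is hypothetical, and `raw_visit_counts` is assumed to be one of the Patterns columns you want to carry over.

```python
import csv
import io

# Made-up samples standing in for the CORE and PATTERNS csv files.
core_csv = (
    "safegraph_place_id,location_name,latitude,longitude\n"
    "sg:a,Cafe One,37.77,-122.42\n"
    "sg:b,Gym,40.71,-74.01\n"
)
patterns_csv = (
    "safegraph_place_id,raw_visit_counts\n"
    "sg:a,120\n"
    "sg:c,45\n"
)

def join_on_place_id(core_rows, patterns_rows, key="safegraph_place_id"):
    """Inner join: keep Patterns rows that have a matching Core Places row."""
    core_by_id = {row[key]: row for row in core_rows}
    joined = []
    for pattern_row in patterns_rows:
        core_row = core_by_id.get(pattern_row[key])
        if core_row is not None:
            joined.append({**core_row, **pattern_row})
    return joined

core_rows = list(csv.DictReader(io.StringIO(core_csv)))
patterns_rows = list(csv.DictReader(io.StringIO(patterns_csv)))
joined = join_on_place_id(core_rows, patterns_rows)

# Write the joined rows back out as CSV text, ready to upload with "Coordinates".
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=list(joined[0].keys()))
writer.writeheader()
writer.writerows(joined)
print(output.getvalue())
```

Rows without a match on either side are dropped here, so it is worth comparing row counts before and after the join.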

4th, FYI SafeGraph has made available all of our core places data (plus latitude and longitude centroids) for free via the ESRI Marketplace. This workflow would be similar to the 2nd workflow described above, except instead of loading the CORE file you would just use the feature layer from the marketplace listing. Then you could join the patterns data table onto that feature layer joining on safegraph_place_id one-to-one.

Visualizing POI as polygons in Esri

There are a few ways to visualize the polygon_wkt data in SafeGraph Geometry. Unfortunately, ArcGIS Online cannot natively read polygon_wkt, so you will have to convert it.
If you convert your CSV into a shapefile, then ArcGIS Online will correctly read the shapefile. If your Geometry data is small, you can select "download as shape file" during Checkout on the SafeGraph Data Bar. Some customers have had success using 3rd party conversion tools like MyGeoCloud to convert CSVs into shapefiles. Other customers have used QGIS (open source GIS software) to convert across data formats. Once your WKT polygons are loaded into a QGIS 3.10 project, right click the data layer and select "Export" ➡️ "Save Feature As..." and then choose "ESRI Shapefile" from the format dropdown. For visual learners, see this video example.
Alternatively, if you are working with arcpy, you can convert a WKT string to an ArcGIS Polygon geometry using the FromWKT() function. If none of these meet your workflow needs, we recommend contacting Esri support to develop a workflow solution. See Also: Visualizing WKT
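
For a sense of what such a conversion does, here is a minimal sketch that turns a plain POLYGON WKT string into a GeoJSON-style geometry using only the standard library. It is an illustration, not a replacement for the tools above: the helper `polygon_wkt_to_geojson` is hypothetical, the sample coordinates are made up, and anything other than a plain POLYGON (for example MULTIPOLYGON) is rejected.

```python
import re

def polygon_wkt_to_geojson(wkt):
    """Convert a simple 'POLYGON ((x y, ...), (...))' string to a GeoJSON-style dict."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\s*(.*)\s*\)\s*", wkt,
                         flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("only plain POLYGON WKT is handled in this sketch")
    rings = []
    # Each parenthesized group is one ring: the outer boundary first, then any holes.
    for ring_text in re.findall(r"\(([^()]*)\)", match.group(1)):
        ring = [[float(value) for value in point.split()]
                for point in ring_text.split(",")]
        rings.append(ring)
    return {"type": "Polygon", "coordinates": rings}

wkt = "POLYGON ((-122.4 37.7, -122.3 37.7, -122.3 37.8, -122.4 37.7))"
geometry = polygon_wkt_to_geojson(wkt)
print(geometry)
```

For real workloads, a maintained geometry library or the GIS tools mentioned above will handle the full WKT grammar and validation.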

How can I visualize the polygon_wkt from SafeGraph Geometry?

If you are proficient with Esri tools, then you have some options in Esri.
If you are not familiar with any GIS tools and are just looking for some quick and easy visualizations, we recommend Kepler. You can upload a SafeGraph CSV directly into Kepler and see points and polygons within seconds.

How does SafeGraph assign NAICS code to points-of-interest?

  • We strive to assign each point-of-interest the most reasonable and appropriate NAICS code, using a multi-pronged approach. Human experts have labeled brands with NAICS codes. We use the business name as an indication of its category. We have also crawled additional open-source information about points-of-interest to infer the most correct NAICS code. We use a deep neural network model to match long tail POI to NAICS based on name and other data points we have crawled.
  • Note that most data that SafeGraph curates and reports have objective truth, like zip_code or visits_by_day. In contrast, there is no objective truth for NAICS code. NAICS are detailed descriptive categories created by governments, but they do not perfectly describe every business. There are many examples of a point-of-interest that reasonably fits into multiple NAICS codes or does not fit into any NAICS code very well. In these cases we strive for the "most correct" answer.
  • If you see a NAICS code that doesn't make sense to you -- let us know!

BigQuery does not like my polygons?

  • We have found that running the ST_GEOGFROMTEXT function in Google's BigQuery on our full dataset returns an error: "ST_GeogFromText failed: Invalid polygon loop". This is caused by only a handful of our polygons (under 20) not playing well with BigQuery. We have not encountered this issue with other geo libraries.
  • To keep this error from blocking your query, use SAFE.ST_GEOGFROMTEXT(wkt). The function will then run, and the few problematic polygons will just return NULL.
  • We are looking into a solution so that this error does not occur at all.
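
As a sketch of how the SAFE. prefix might be used in practice, the snippet below builds two BigQuery Standard SQL strings: one that converts every polygon and one that lists the rows where conversion failed. The helper `build_geog_queries` and the table name are hypothetical, and the column names `polygon_wkt` and `safegraph_place_id` are assumed to match your Geometry table.

```python
def build_geog_queries(table, wkt_column="polygon_wkt", id_column="safegraph_place_id"):
    """Return (conversion_query, failures_query) as BigQuery Standard SQL strings."""
    convert = (
        f"SELECT {id_column}, SAFE.ST_GEOGFROMTEXT({wkt_column}) AS geog\n"
        f"FROM `{table}`"
    )
    # Rows that had WKT but came back NULL are the polygons BigQuery rejected.
    failures = (
        f"SELECT {id_column}, {wkt_column}\n"
        f"FROM `{table}`\n"
        f"WHERE {wkt_column} IS NOT NULL\n"
        f"  AND SAFE.ST_GEOGFROMTEXT({wkt_column}) IS NULL"
    )
    return convert, failures

# Hypothetical table name; replace it with your own project, dataset, and table.
convert_sql, failures_sql = build_geog_queries("my_project.my_dataset.geometry")
print(convert_sql)
print(failures_sql)
```

Running the second query once tells you which place IDs to exclude or repair before any downstream spatial joins.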

What are you using for MSAs in the Data Bar?

  • You might have noticed that you can order data by Metropolitan Statistical Area in the Data Bar.
  • The MSAs are defined here.

Where does the device data used in Patterns come from?

We partner with mobile applications that obtain opt-in consent from their users to collect anonymous location data. This data is not associated with any name or email address. This data includes the latitude and longitude of a device at a given point in time. We take this latitude/longitude information and determine visits to points of interest. We then aggregate these anonymous visits to create our Patterns product.

Do you have historical Patterns data?

  • Yes! We have Patterns data going back to January 1st, 2018. The previous 3 months are available in the Data Bar. Beyond that, please contact us.
  • In order to successfully compare the data over time, we encourage normalizing based on our panel size over time. Each monthly delivery of Patterns includes the Panel Overview Data to enable this normalization. Please see our Data Science Resources for guidance on how to go about doing this.
  • Please note that the underlying Places data used to create Patterns changes over time due to the history of how we built and updated the product. Below is a chronological breakdown of the Places release used to backfill Patterns for a given time period:
    • Historical Patterns activity from October 2016 through and including December 2016 was generated using the April 2019 release of Places. We no longer externally provide this data.
    • Patterns provided/delivered between November 2019 and April 2020:
      -- Activity from January 2017 through and including October 2019 was generated using the November 2019 release of Places.
      -- Activity from November 2019 through and including April 2020 was based on the Places release of the same month as the activity (so December 2019 activity will use the December 2019 Places release).
    • Patterns provided/delivered beginning May 2020:
      -- Activity from January 2018 through and including May 2020 was generated using the May 2020 release of Places. This means that the POI counts during this time period will be the same since we projected the same Places release back in time. The upside of this is it gives a stable view over time for comparisons. The downside is it overlooks the change in businesses that might have occurred. We are looking to improve that in our next backfill.
      -- Activity from May 2020 going forward is based on the Places release of the same month as the activity (so June 2020 activity will use the June 2020 Places release).
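
To illustrate the panel-size normalization mentioned above, here is a minimal sketch that divides each month's raw visits by that month's panel device count. This is an assumed, simplified approach rather than the method documented in Data Science Resources; the numbers are made up and the helper `normalize_visits` is hypothetical.

```python
def normalize_visits(raw_visits_by_month, panel_devices_by_month):
    """Visits per panel device for each month that has a non-zero panel size."""
    return {
        month: raw_visits_by_month[month] / panel_devices_by_month[month]
        for month in raw_visits_by_month
        if panel_devices_by_month.get(month)
    }

# Made-up numbers: raw visits rise by 25%, but so does the panel.
raw = {"2020-01": 1200, "2020-02": 1500}
panel = {"2020-01": 40000, "2020-02": 50000}

normalized = normalize_visits(raw, panel)
for month, value in normalized.items():
    print(month, value)
```

In this made-up case the apparent growth in raw visits disappears after normalization, which is the kind of panel effect the Panel Overview Data is meant to help you separate from real change.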
