
Which records were added, deleted, or changed since the last release?

  • See the "Calculating Diffs" section of Data Science Resources for plug-and-play code to answer all of your burning diff questions.
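The following minimal sketch shows what such a diff computes. It uses only the Python standard library and is not the Data Science Resources code. It assumes each release is a CSV with one unique ID per row. The ID column name "placekey" and the sample rows are assumptions for illustration only.

```python
import csv
import io


def load_release(csv_text, id_column="placekey"):
    """Index one release's rows by ID ("placekey" is an assumed column name)."""
    return {row[id_column]: row for row in csv.DictReader(io.StringIO(csv_text))}


def diff_releases(old_rows, new_rows):
    """Return (added, deleted, changed) sets of IDs between two indexed releases."""
    old_ids, new_ids = set(old_rows), set(new_rows)
    added = new_ids - old_ids
    deleted = old_ids - new_ids
    # An ID present in both releases counts as changed if any field value differs
    changed = {pid for pid in old_ids & new_ids if old_rows[pid] != new_rows[pid]}
    return added, deleted, changed


# Hypothetical sample releases
old_csv = "placekey,location_name\nA,Peets Coffee\nB,Corner Deli\nC,Book Nook\n"
new_csv = "placekey,location_name\nA,Peet's Coffee & Tea\nC,Book Nook\nD,Taco Stand\n"

added, deleted, changed = diff_releases(load_release(old_csv), load_release(new_csv))
print(sorted(added), sorted(deleted), sorted(changed))  # ['D'] ['B'] ['A']
```

Rows are compared as whole dictionaries. If a release adds or renames a column, every shared ID will therefore be flagged as changed. In that case, compare only the columns you care about.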

How do I work with SafeGraph data in Spark?

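  • If using Scala, read in the data as follows:
```scala
// escape = "\"" lets Spark read doubled quotes ("") inside quoted fields,
// such as those in SafeGraph's JSON-formatted columns
val df = spark.read
  .option("header", "true")
  .option("escape", "\"")
  .csv("/PUT YOUR PATH HERE")
```
  • If using Python/PySpark, read in the data as follows:
```python
df = spark.read.option("header", True).option("escape", "\"").csv("/PUT YOUR PATH HERE")
```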

How often is SafeGraph Places updated?

  • SafeGraph updates Places once per month, which is much more frequent than other POI vendors, which may update only every 3-6 months.
  • We can do this because we work with more sources of data and are much more efficient at combining them. Each month, some subset of our sources sends us their updates, and we onboard and integrate those changes quickly.
  • This lets us quickly reflect store openings and closings in our Places database.
    • The time between a store opening or closing and its appearance in Places is approximately the time until one of our sources sees the change plus the time SafeGraph takes to reflect it in our data.
    • The second part is typically within the month, which is very fast compared with competitors, who might take up to 3 months.
    • The first part is hard to predict, but we work with sources that generally receive updates very quickly.

How should I match SafeGraph Places with existing internal POI data?

  • Matching place data is very difficult. Some places will match immediately (e.g., store name, address, and ZIP code are exactly the same), but the majority will not. Is "peets coffee" at "345 5th street" in our database the same as "Peet’s Coffee & Tea" at "357 fifth st." in another database? Basic exact matching will not match these two. Your team will need advanced deduplication logic; otherwise, you will see significant discrepancies.
  • SafeGraph offers a Matching Service that we recommend utilizing for this purpose. Please contact us if you're interested!
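This toy Python sketch shows why exact matching fails and what "advanced" logic starts to involve. It normalizes names and addresses, then fuzzy-compares them with difflib from the standard library. It is not how SafeGraph's Matching Service works. The abbreviation table is a hypothetical, tiny example, and any score threshold you choose is also an assumption.

```python
import re
from difflib import SequenceMatcher

# Hypothetical token replacements; a real system needs a much larger table
ABBREVIATIONS = {"street": "st", "avenue": "ave", "fifth": "5th"}


def normalize(text):
    """Lowercase, drop apostrophes, replace other non-alphanumeric chars with spaces, abbreviate."""
    text = text.lower().replace("\u2019", "").replace("'", "")
    # Anything other than ASCII letters, digits, and spaces becomes a space
    text = re.sub(r"[^a-z0-9 ]", " ", text)
    return " ".join(ABBREVIATIONS.get(token, token) for token in text.split())


def similarity(a, b):
    """Return a 0-1 similarity score between the normalized forms of two strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


ours = ("peets coffee", "345 5th street")
theirs = ("Peet\u2019s Coffee & Tea", "357 fifth st.")

print(normalize(theirs[0]), "|", normalize(theirs[1]))  # peets coffee tea | 357 5th st
print(round(similarity(ours[0], theirs[0]), 2), round(similarity(ours[1], theirs[1]), 2))  # 0.86 0.9
```

Even after normalization, the street numbers differ (345 vs. 357). A high score alone cannot tell you whether these are the same store. That ambiguity is what makes matching hard.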

How do I use SafeGraph Places in Esri?

First, a friendly reminder: Patterns does not contain any geospatial data on its own. If you want to do geospatial analysis, augment it with Geometry, which contains a latitude and longitude coordinate for every POI.
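This minimal standard-library Python sketch shows that augmentation step. It assumes both files share an ID column. The column names "placekey", "latitude", "longitude", and "raw_visit_counts" and the sample rows are assumptions; check the headers of your own data cut.

```python
import csv
import io


def attach_coordinates(patterns_csv, geometry_csv, id_column="placekey"):
    """Left-join latitude/longitude from a Geometry CSV onto Patterns rows (assumed column names)."""
    coords = {
        row[id_column]: (row["latitude"], row["longitude"])
        for row in csv.DictReader(io.StringIO(geometry_csv))
    }
    joined = []
    for row in csv.DictReader(io.StringIO(patterns_csv)):
        # POIs missing from Geometry keep empty coordinates instead of being dropped
        lat, lon = coords.get(row[id_column], ("", ""))
        joined.append({**row, "latitude": lat, "longitude": lon})
    return joined


# Hypothetical sample data
patterns = "placekey,raw_visit_counts\nA,120\nB,45\n"
geometry = "placekey,latitude,longitude\nA,37.7749,-122.4194\n"

for row in attach_coordinates(patterns, geometry):
    print(row)
```

You can write the joined rows back out with csv.DictWriter and upload that file to AGOL using the "Coordinates" option described below.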

Visualizing POI as point data in Esri

Let's say your goal is to visualize a point for each POI on a map and have the Patterns data available in the pop-up in ArcGIS Online (AGOL).

  • First, load the SafeGraph CSV file into AGOL. Make sure your data includes latitude and longitude (any data cut that includes Places or Geometry will). Instead of "Locate by Address or Places", select "Coordinates" and make sure latitude and longitude are mapped to the correct fields (AGOL should auto-detect this). The data should then load successfully.
  • Next, open the SafeGraph data in a map in AGOL.

Visualizing POI as polygons in Esri

There are a few ways to visualize the polygon_wkt column from SafeGraph Geometry. Unfortunately, ArcGIS Online cannot natively read WKT, so you will have to convert it.

This Google Colab notebook illustrates a best practice for converting SafeGraph Geometry files to Esri shapefiles (.shp) using geopandas.

Alternatively, if you are working with arcpy, you can convert a WKT string to an ArcGIS Polygon geometry using the arcpy.FromWKT() function. If none of these options meet your workflow needs, we recommend contacting Esri support to develop a solution. See also: Visualizing WKT.
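Another route, if you cannot install geopandas, is to convert polygon_wkt to GeoJSON, which tools such as Kepler.gl can load. Below is a minimal standard-library sketch of our own, not the notebook's method. It handles only 2D POLYGON and MULTIPOLYGON strings with plain decimal coordinates. It assumes WGS84 longitude/latitude order, which matches GeoJSON's convention. For anything more complex, a real WKT parser such as shapely is the safer choice.

```python
import json
import re

# Matches one "x y" coordinate pair, e.g. "-122.41 37.77"
_PAIR = re.compile(r"(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s+(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def wkt_to_geojson_geometry(wkt):
    """Convert a 2D POLYGON or MULTIPOLYGON WKT string to a GeoJSON geometry dict."""
    wkt = wkt.strip()
    upper = wkt.upper()
    if upper.startswith("MULTIPOLYGON"):
        geom_type = "MultiPolygon"
    elif upper.startswith("POLYGON"):
        geom_type = "Polygon"
    else:
        raise ValueError("only POLYGON and MULTIPOLYGON are supported")
    body = wkt[wkt.index("("):]  # raises ValueError for e.g. "POLYGON EMPTY"
    # Turn each "x y" pair into "[x,y]", then WKT parentheses into JSON brackets
    body = _PAIR.sub(r"[\1,\2]", body)
    body = body.replace("(", "[").replace(")", "]")
    return {"type": geom_type, "coordinates": json.loads(body)}


def to_feature_collection(rows):
    """Build a GeoJSON FeatureCollection; every column except polygon_wkt becomes a property."""
    features = []
    for row in rows:
        properties = {k: v for k, v in row.items() if k != "polygon_wkt"}
        features.append({
            "type": "Feature",
            "geometry": wkt_to_geojson_geometry(row["polygon_wkt"]),
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample row, as csv.DictReader would produce it
sample = [{"placekey": "A", "polygon_wkt": "POLYGON ((-122.42 37.77, -122.41 37.77, -122.41 37.78, -122.42 37.77))"}]
print(json.dumps(to_feature_collection(sample)))
```

To produce a file you can upload, write the FeatureCollection to a .geojson file with json.dump. If csv.DictReader raises a "field larger than field limit" error on very large polygons, raise the limit with csv.field_size_limit().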

What coordinate reference system does SafeGraph use for its centroid and polygon coordinates?

WGS84, also referred to as EPSG:4326.

How does SafeGraph assign NAICS codes to points-of-interest?

  • We strive to assign each point-of-interest the most reasonable and appropriate NAICS code, using a multi-pronged approach. Human experts label brands with NAICS codes. We use the business name as an indication of its category. We also crawl additional open-source information about a point-of-interest to infer the most correct NAICS code. We use a deep neural network model to match long-tail POI to NAICS codes based on name and other data points we have crawled.
  • Note that most data SafeGraph curates and reports has an objective truth, like zip_code or visits_by_day. In contrast, there is no objective truth for a NAICS code. NAICS codes are detailed descriptive categories created by governments, but they do not perfectly describe every business. Many points-of-interest reasonably fit into multiple NAICS codes or fit none of them very well. In these cases we strive for the "most correct" answer.
  • If you see a NAICS code that doesn't make sense to you, let us know!

How can I visualize the polygon_wkt from SafeGraph Geometry?

If you are proficient with Esri tools, the Esri options described above are available to you.
If you are not familiar with GIS tools and are just looking for quick and easy visualizations, we recommend Kepler.gl. You can upload a SafeGraph CSV directly into Kepler and see points and polygons within seconds.

Why doesn't BigQuery like my polygons?

  • We have found that running BigQuery's ST_GEOGFROMTEXT function on our full dataset returns the error "ST_GeogFromText failed: Invalid polygon loop". This is caused by only a handful of our polygons (fewer than 20) that BigQuery rejects. We have not encountered this issue with other geo libraries.
  • So that these few polygons do not stop the function from running on the rest, use SAFE.ST_GEOGFROMTEXT(wkt) instead. The query will then run, and the few problematic polygons will simply return NULL.
  • We are looking into a solution so that this error does not occur at all.

What version of census block groups does SafeGraph use for the Patterns products?

  • SafeGraph uses the 2010-2019 version of the census block groups for the U.S., specifically the 2016 vintage.
  • You can find more information and a link to download the U.S. census block group geometries on our Open Census Data page!

How does SafeGraph apply the census block group concept to Canada?

  • For Canadian entries in any CBG column (e.g., poi_cbg or visitor_home_cbgs), we use Canadian Dissemination Area designations instead. Canadian units carry the prefix CA:.
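The Python sketch below uses that prefix to split a visitor_home_cbgs value into U.S. and Canadian visitor counts. It assumes the column holds a JSON object mapping each home area ID to a visitor count. It also assumes every ID without the CA: prefix is a U.S. block group. The example IDs and counts are made up.

```python
import json


def split_by_country(visitor_home_cbgs):
    """Sum visitor counts for U.S. block groups vs. Canadian dissemination areas ("CA:" prefix)."""
    counts = json.loads(visitor_home_cbgs)
    totals = {"US": 0, "CA": 0}
    for area_id, visitors in counts.items():
        # Assumption: anything without the CA: prefix is a U.S. census block group
        country = "CA" if area_id.startswith("CA:") else "US"
        totals[country] += visitors
    return totals


# Hypothetical visitor_home_cbgs value
example = '{"060750201001": 12, "360610001001": 4, "CA:35200456": 7}'
print(split_by_country(example))  # {'US': 16, 'CA': 7}
```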

How do I aggregate census block groups to zip codes?

  • Check out the data science resources on our GitHub page. If you search "zip" on that page, you'll find examples in Python and in R.
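Block groups and ZIP codes do not nest neatly, so one common approach is a weighted crosswalk. Each CBG's count is split across the ZIPs it overlaps according to a share. The minimal Python sketch below shows that pattern. The crosswalk contents and shares are hypothetical placeholders for whichever crosswalk you choose. This is not necessarily how the GitHub examples do it.

```python
from collections import defaultdict

# Hypothetical crosswalk: CBG -> list of (zip_code, share of that CBG assigned to the ZIP).
# Shares for each CBG should sum to 1.0.
CROSSWALK = {
    "060750201001": [("94107", 0.75), ("94103", 0.25)],
    "060750202001": [("94103", 1.0)],
}


def cbg_counts_to_zip(cbg_counts, crosswalk):
    """Apportion per-CBG counts to ZIP codes using crosswalk shares."""
    zip_totals = defaultdict(float)
    unmatched = 0
    for cbg, count in cbg_counts.items():
        if cbg not in crosswalk:
            unmatched += count  # e.g. Canadian "CA:" areas or CBGs missing from the crosswalk
            continue
        for zip_code, share in crosswalk[cbg]:
            zip_totals[zip_code] += count * share
    return dict(zip_totals), unmatched


totals, unmatched = cbg_counts_to_zip({"060750201001": 8, "060750202001": 3, "CA:35200456": 5}, CROSSWALK)
print(totals, unmatched)  # {'94107': 6.0, '94103': 5.0} 5
```

Tracking the unmatched count makes it easy to check how much of your total was not assigned to any ZIP.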

Is the Spend Transactions Panel the same as the Patterns Mobile Device Panel?

  • No, they come from completely different sources and therefore are unrelated.
  • The transactions data do not come from mobile devices; this is what allows us to have robust Spend data for indoor locations which are challenging for mobile GPS signals.

Which kinds of cards are included in the Spend dataset? Is it only one type of card?

  • We can't say exactly which card brands are included, but the panel includes both debit cards (i.e., bank cards) and credit cards.
  • The panel is also not all from one particular brand, e.g., not all Mastercard or Visa.
  • This question is usually asked to understand how representative our panel is. If that is a concern, please see our material on Quantifying Geographic Bias, which compares our panel to the census (average bias < 1%, with a maximum of +/-4% per state).

How do I unzip .csv.gz files?

  • .gz files are not regular .zip archives; they are gzip-compressed.
  • If you use a Mac, you can decompress them with the gunzip utility in Terminal (e.g., gunzip yourfile.csv.gz).
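On any platform, Python's standard gzip module can also read these files directly without decompressing them to disk first. A minimal sketch follows; the file name in the comment is a hypothetical placeholder.

```python
import csv
import gzip


def read_gzipped_csv(path):
    """Yield rows from a .csv.gz file as dicts, decompressing on the fly."""
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as f:
        yield from csv.DictReader(f)


# Example (hypothetical file name):
# for row in read_gzipped_csv("core_poi.csv.gz"):
#     print(row)
```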