The SafeGraph Developer Hub

Social Distancing Metrics

Due to the COVID-19 pandemic, people are currently engaging in social distancing. In order to understand what is actually occurring at a census block group level, SafeGraph is offering a temporary Social Distancing Metrics product.

This product is delivered daily, 3 days after the date it describes. Daily data is available going back to January 1, 2019. We used v2.1 to create the historical data for Jan 1, 2019 - Dec 31, 2019 (the backfill) as well as the data from May 10, 2020 onward. However, the Jan 1 - May 9, 2020 data is on v2.0. We apologize for any inconvenience; please see the Release Notes below for more information.
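Because two methodology versions cover different date ranges, it can help to tag each day with its version before comparing periods. The following is a minimal sketch based only on the date ranges stated above; the function name `methodology_version` is hypothetical and not part of any SafeGraph tooling.

```python
from datetime import date

# Date boundaries taken from the paragraph above.
FIRST_DAY = date(2019, 1, 1)
V20_START = date(2020, 1, 1)
V20_END = date(2020, 5, 9)


def methodology_version(day: date) -> str:
    """Return the version ("v2.0" or "v2.1") that produced the data for `day`."""
    if day < FIRST_DAY:
        raise ValueError("daily data is only available from January 1, 2019")
    if V20_START <= day <= V20_END:
        return "v2.0"
    # The 2019 backfill and everything from May 10, 2020 onward.
    return "v2.1"


def same_version(a: date, b: date) -> bool:
    """True when two days were produced with the same methodology version."""
    return methodology_version(a) == methodology_version(b)
```

A check such as `same_version(date(2019, 1, 15), date(2020, 1, 15))` returns `False`, which flags a year-over-year comparison that mixes v2.1 and v2.0 data.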

The data was generated using a panel of GPS pings from anonymous mobile devices. We determine the common nighttime location of each mobile device over a 6-week period to a Geohash-7 granularity (~153m x ~153m). For ease of reference, we call this common nighttime location the device's "home". We then aggregate the devices by home census block group and provide the metrics set out below for each census block group.
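To make the idea of a "common nighttime location" concrete, here is a rough sketch. It is not the production algorithm: it assumes "common" means the most frequently observed geohash-7, that the caller passes only pings from the relevant 6-week window, and that nighttime is 6 pm - 7 am local time (the definition given in the schema). The function names and geohash values are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Optional, Tuple


def is_nighttime(ts: datetime) -> bool:
    """Nighttime is taken to be 6 pm - 7 am in the device's local time."""
    return ts.hour >= 18 or ts.hour < 7


def infer_home(pings: Iterable[Tuple[datetime, str]]) -> Optional[str]:
    """Return the most frequent nighttime geohash-7, or None if there is none.

    `pings` holds (local timestamp, geohash-7) pairs from a 6-week window.
    Using the most frequent cell is an assumption made for this sketch.
    """
    counts = Counter(geohash for ts, geohash in pings if is_nighttime(ts))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Made-up pings: two nighttime pings in one cell, daytime pings elsewhere.
example_pings = [
    (datetime(2020, 3, 1, 23, 0), "dn5bpsb"),
    (datetime(2020, 3, 2, 2, 0), "dn5bpsb"),
    (datetime(2020, 3, 2, 12, 0), "dn5bpsz"),
    (datetime(2020, 3, 2, 13, 0), "dn5bpsz"),
    (datetime(2020, 3, 2, 19, 0), "dn5bpsz"),
]
home = infer_home(example_pings)
```

Daytime pings are ignored entirely, so a device that spends most of its day elsewhere is still assigned to the cell where it is seen at night.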

To preserve privacy, we apply differential privacy to all of the device count metrics other than device_count. As a result, the sum of the device counts in these columns may not exactly equal device_count, especially for a sparsely populated origin_census_block_group. Differential privacy is applied to all of the following columns: completely_home_device_count, part_time_work_behavior_devices, full_time_work_behavior_devices, delivery_behavior_devices, at_home_by_each_hour, bucketed_away_from_home_time, bucketed_distance_traveled, bucketed_home_dwell_time, bucketed_percentage_time_home.

If as a result of the differential privacy applied:

  • device_count < part_time_work_behavior_devices + full_time_work_behavior_devices + completely_home_device_count or
  • device_count < sum(counts in bucketed_distance_traveled) or
  • device_count < sum(counts in bucketed_home_dwell_time),

we then increase the device_count to the applicable sum (this only occurs in census_block_groups with small device_counts).
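The adjustment rule above can be sketched as follows. This is an illustration under stated assumptions, not the production code: it assumes the third condition refers to the bucketed_home_dwell_time column, and that when more than one sum exceeds device_count the largest one is used. The function name `adjusted_device_count` is hypothetical.

```python
from typing import Dict


def adjusted_device_count(
    device_count: int,
    part_time_work_behavior_devices: int,
    full_time_work_behavior_devices: int,
    completely_home_device_count: int,
    bucketed_distance_traveled: Dict[str, int],
    bucketed_home_dwell_time: Dict[str, int],
) -> int:
    """Raise device_count to the largest noisy sum that exceeds it."""
    behavior_sum = (
        part_time_work_behavior_devices
        + full_time_work_behavior_devices
        + completely_home_device_count
    )
    distance_sum = sum(bucketed_distance_traveled.values())
    dwell_sum = sum(bucketed_home_dwell_time.values())
    # Taking the maximum is an assumption for the case where several
    # sums exceed device_count at once.
    return max(device_count, behavior_sum, distance_sum, dwell_sum)
```

For a small block group with device_count = 10 and noisy behavior counts of 3, 4 and 5, the behavior sum is 12, so the sketch returns 12 even if the two bucketed sums are 9 and 10.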

Schema

| Column Name | Description | Type | Example |
| --- | --- | --- | --- |
| origin_census_block_group | The unique 12-digit FIPS code for the Census Block Group. Please note that some CBGs have leading zeros. | String | 131000000000 |
| date_range_start | Start time for the measurement period in ISO 8601 format, YYYY-MM-DDTHH:mm:SS±hh:mm (local time with offset from GMT). The start time will be 12 a.m. of any day. | String | 2020-03-01T00:00:00-06:00 |
| date_range_end | End time for the measurement period in ISO 8601 format, YYYY-MM-DDTHH:mm:SS±hh:mm (local time with offset from GMT). The end time will be the following 12 a.m. | String | 2020-03-02T00:00:00-06:00 |
| device_count | Number of devices seen in our panel during the date range whose home is in this census_block_group. Home is defined as the common nighttime location for the device over a 6-week period, where nighttime is 6 pm - 7 am. Note that we do not include any census_block_groups where the count is < 5. | Integer | 100 |
| distance_traveled_from_home | Median distance (in meters) traveled from the geohash-7 of the home by the devices included in the device_count during the time period (excluding any distances of 0). We first find the median for each device and then find the median across all of the devices. | Integer | 200 |
| bucketed_distance_traveled | Key is a range of meters (from the geohash-7 of home) and value is the device count. If a device made multiple trips, we use the median distance for the device. | JSON {String: Integer} | {"0": 100, "1-1000": 40, "1001-2000": 45, "2001-8000": 15, "8001-16000": 0, "16001-50000": 0, ">50000": 0} |
| median_dwell_at_bucketed_distance_traveled | Key is a range of meters and value is the median dwell time in minutes of the devices that traveled the given distance from the geohash-7 of the home. | JSON {String: Integer} | {"<1000": 300, "1001-2000": 60, "2001-8000": 120, "8001-16000": 5, "16001-50000": 5, ">50000": 60} |
| completely_home_device_count | Out of the device_count, the number of devices that did not leave the geohash-7 in which their home is located during the time period. | Integer | 40 |
| median_home_dwell_time | Median dwell time at home geohash-7 ("home") in minutes for all devices in the device_count during the time period. For each device, we summed the observed minutes at home across the day (whether or not these were contiguous) to get the total minutes for each device. Then we calculate the median of all these devices. Beginning in v2, we include the portion of any stop within the time range regardless of whether the stop start time was in the time period. | Integer | 1200 |
| bucketed_home_dwell_time | Key is a range of minutes and value is the count of devices that dwelled at the geohash-7 of home for the given time period. For each device, we summed the observed minutes at home across the day (whether or not these were contiguous) to get the total minutes for each device this day. Then we count how many devices are in each bucket. Beginning in v2, we include the portion of any stop within the time range regardless of whether the stop start time was in the time period. | JSON {String: Integer} | {"<60": 0, "61-360": 0, "361-720": 10, "721-1080": 40, ">1081": 50} |
| at_home_by_each_hour | An array giving the number of devices at geohash-7 home in each hour over the course of the day, in local time. The first element in the array corresponds to the hour from midnight to 1 am. | JSON [Integer] | [ 90, 90, 90, 80, 80, 70, 70, ...] |
| part_time_work_behavior_devices | Out of the device_count, the number of devices that spent one period of between 3 and 6 hours at one location other than their geohash-7 home during the period of 8 am - 6 pm in local time. This does not include any device that spent 6 or more hours at a location other than home. | Integer | 10 |
| full_time_work_behavior_devices | Out of the device_count, the number of devices that spent more than 6 hours at a location other than their home geohash-7 during the period of 8 am - 6 pm in local time. | Integer | 10 |
| * destination_cbgs | Key is a destination census block group and value is the number of devices with a home in census_block_group that stopped in the given destination census block group for > 1 minute during the time period. The destination census block groups also include the origin_census_block_group (so this captures any device that stayed completely at home or was at least seen at some point in the time period within the origin_census_block_group). This means that the difference between the device_count and the count of devices with a destination_cbg that is the same as the origin_census_block_group represents the number of devices that originate from the origin_census_block_group but are staying completely outside of it. | JSON {String: Integer} | {"130890212162":91,"131210101101":22,"131350502123":20} |
| * delivery_behavior_devices | Out of the device_count, the number of devices that stopped for < 20 minutes at > 3 locations outside of their geohash-7 home. | Integer | 10 |
| * median_non_home_dwell_time | Median dwell time at places outside of geohash-7 home in minutes for all devices in the device_count during the time period. For each device, we summed the observed minutes outside of home across the day (whether or not these were contiguous) to get the total minutes for each device. Then we calculate the median of all these devices. | Integer | 60 |
| * candidate_device_count | Number of devices in our panel whose home is in this census_block_group regardless of whether we saw any activity for them in the time range. Home is defined as the common nighttime location for the device over a 6-week period, where nighttime is 6 pm - 7 am. | Integer | 100 |
| * bucketed_away_from_home_time | Key is a range of minutes and value is the count of devices that dwelled anywhere outside of the geohash-7 of home for the given time period. For each device, we summed the observed minutes away from home across the day (whether or not these were contiguous) to get the total minutes for each device this day. Then we count how many devices are in each bucket. Beginning in v2, we include the portion of any stop within the time range regardless of whether the stop start time was in the time period. | JSON {String: Integer} | {"0- 20": 5, "21-45": 4, "46-60": 5, "61-120": 4, "121-180": 5, "181-240": 10, "241-300": 4, "301-360": 8, "361-420": 10, "421-480": 8, "481-540": 4, "541-600": 2, "601-660": 3, "661-720": 4, "721-840": 3, "841-960": 5, "961-1080": 2, "1081-1200": 2, "1201-1320": 2, "1321-1440": 1} |
| * median_percentage_time_home | Median percentage of time we observed devices at home versus observed at all during the time period. | Integer | 72 |
| * bucketed_percentage_time_home | Key is a range of percentage of time a device was observed at home (numerator) out of total hours observed that day at any location (denominator). Value is the number of devices observed in this range. | JSON {String: Integer} | {"0-25": 6, "26-50": 5, "51-75": 10, "76-100": 100} |
| ** mean_home_dwell_time | Mean dwell time at home geohash-7 ("home") in minutes for all devices in the device_count during the time period. For each device, we summed the observed minutes at home across the day (whether or not these were contiguous) to get the total minutes for each device. Then we calculate the mean of all these devices. | Integer | 1200 |
| ** mean_non_home_dwell_time | Mean dwell time at places outside of geohash-7 home in minutes for all devices in the device_count during the time period. For each device, we summed the observed minutes outside of home across the day (whether or not these were contiguous) to get the total minutes for each device. Then we calculate the mean of all these devices. | Integer | 60 |
| ** mean_distance_traveled_from_home | Mean distance (in meters) traveled from the geohash-7 of the home by the devices included in the device_count during the time period. We first find all the distances traveled for each device. We filter out any distances that are > 1.5 x the interquartile range for that device. We do NOT filter on the low range since we were only concerned with long-distance outliers. We then take the mean for each device using the distances that were not filtered out. Then we find the mean across all of the devices. | Integer | 200 |

* New in v2
** New in v2.1
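When loading this schema, the main pitfalls are the leading zeros in origin_census_block_group and the JSON-encoded columns. The sketch below assumes the data arrives as CSV text with JSON stored as strings in the relevant cells; that delivery format, the helper names, and the sample values are assumptions for illustration. Only a subset of columns is shown.

```python
import csv
import io
import json
from datetime import datetime

# Columns whose cells hold JSON text (subset chosen for this sketch).
JSON_COLUMNS = ("bucketed_distance_traveled", "at_home_by_each_hour")
# Columns holding plain integers (subset chosen for this sketch).
INT_COLUMNS = ("device_count", "completely_home_device_count")


def parse_row(row):
    """Convert one raw CSV row (all strings) into typed Python values."""
    parsed = dict(row)
    # origin_census_block_group stays a string so leading zeros survive.
    for col in ("date_range_start", "date_range_end"):
        parsed[col] = datetime.fromisoformat(row[col])
    for col in INT_COLUMNS:
        parsed[col] = int(row[col])
    for col in JSON_COLUMNS:
        parsed[col] = json.loads(row[col])
    return parsed


def make_sample_csv():
    """Build a one-row CSV in memory; the values are made up."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([
        "origin_census_block_group", "date_range_start", "date_range_end",
        "device_count", "completely_home_device_count",
        "bucketed_distance_traveled", "at_home_by_each_hour",
    ])
    writer.writerow([
        "010010201001", "2020-03-01T00:00:00-06:00", "2020-03-02T00:00:00-06:00",
        "100", "40",
        json.dumps({"0": 40, "1-1000": 30, "1001-2000": 30}),
        json.dumps([90] * 24),
    ])
    return buf.getvalue()


rows = [parse_row(r) for r in csv.DictReader(io.StringIO(make_sample_csv()))]
first = rows[0]
# Share of devices that never left their home geohash-7 that day.
share_home = first["completely_home_device_count"] / first["device_count"]
print(first["origin_census_block_group"], share_home)
```

Reading the block group code as an integer would silently drop the leading zero and break joins against other census data, which is why the sketch never converts that column.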

Release Notes

  • For v1, beginning with the 4/3/2020 delivery, we are excluding this census block group in Iowa: 190570010001. It likely contains a sink (a location on a map which sees a disproportionate number of location pings that usually aren’t actually accurate).
  • In v1, we discovered a bug in full_time_work_behavior_devices. We included visits that had a start time within the day but did not truncate the ones that crossed the day boundary. As a result, some visits had start_hour > end_hour, which fooled the condition for work hours: isWorkHours = startHour >= 7 && endHour <= 17 (e.g., start_hour = 20 and end_hour = 8 satisfy that condition). This was fixed in v2.
  • v2 adds new columns: destination_cbgs, delivery_behavior_devices, median_non_home_dwell_time, candidate_device_count, bucketed_away_from_home_time, median_percentage_time_home, bucketed_percentage_time_home
  • v2 adds a 0 bucket to bucketed_distance_traveled
  • Beginning in v2, for calculating dwell times, we include the portion of any stop (i.e., a dwell event) within the time range regardless of whether the stop's start time was contained in the time period. Previously we only included dwell events with a start time inside the time period. As an example of the impact of this change, consider a dwell event that starts at 11 pm on day N and lasts until 6 am on day N+1. In v1 this dwell event would only be counted on day N, but in v2 it is counted for both day N and day N+1. As a result, overall device counts increased slightly because these day-boundary events are now included in both days.
  • v2.1 begins with the 5/10/2020 data under the v2 path (delivered on 5/13/2020):
    -- Fix implemented so that differential privacy applied to bucketed_percentage_time_home would not cause the sum of these counts to exceed the device_count.
    -- Fix implemented so that distance_traveled_from_home is never blank. If all devices remained home, 0 is entered.
    -- New column mean_home_dwell_time added.
    -- New column mean_non_home_dwell_time added.
    -- New column mean_distance_traveled_from_home added.
  • We used v2.1 to create the historical data for Jan 1, 2019 - Dec 31, 2019 (the backfill) as well as the data from May 10, 2020 onward. However, the Jan 1 - May 9, 2020 data is on v2.0. We apologize for any inconvenience. There are some methodological differences between v2.0 and v2.1 that likely explain some significant differences seen between these two time periods. For example, you will see more people staying at home in Jan 2019 than in Jan 2020, but we believe this is due to a methodological difference between v2.0 and v2.1 rather than a real-world difference in behavior. Ideally we would provide a backfill of all historical data on the same version; however, this is not available at this time and we are not sure when (if ever) it will be available.
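The v1 work-hours bug and the v2 day-boundary change described in the notes above both come down to truncating a stop at midnight. The sketch below is an illustration only: `is_work_hours` mirrors the condition quoted in the notes, while `minutes_per_day` and the convention of treating a truncated end at midnight as hour 24 are assumptions made for this example.

```python
from datetime import datetime, time, timedelta
from typing import Dict


def is_work_hours(start_hour: int, end_hour: int) -> bool:
    """The condition quoted in the release notes."""
    return start_hour >= 7 and end_hour <= 17


def minutes_per_day(start: datetime, end: datetime) -> Dict[str, int]:
    """Split one stop into the minutes that fall within each calendar day."""
    result: Dict[str, int] = {}
    cursor = start
    while cursor < end:
        # Midnight that ends the calendar day containing `cursor`.
        next_midnight = datetime.combine(
            cursor.date() + timedelta(days=1), time.min, tzinfo=cursor.tzinfo
        )
        segment_end = min(end, next_midnight)
        minutes = int((segment_end - cursor).total_seconds() // 60)
        result[cursor.date().isoformat()] = minutes
        cursor = segment_end
    return result


# Untruncated overnight visit from the notes: start_hour = 20, end_hour = 8.
bug_case = is_work_hours(20, 8)
# After truncation at midnight the visit becomes two pieces (20 to 24, 0 to 8).
fixed_first_piece = is_work_hours(20, 24)
fixed_second_piece = is_work_hours(0, 8)

# Dwell event from 11 pm on day N until 6 am on day N+1.
split = minutes_per_day(datetime(2020, 3, 1, 23, 0), datetime(2020, 3, 2, 6, 0))
```

In this sketch the untruncated visit wrongly passes the work-hours test, while neither truncated piece does, and the overnight dwell event contributes 60 minutes to the first day and 360 minutes to the second.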

Updated about a month ago

