Social Distancing Metrics
Update
This data was live during the peak of the COVID-19 pandemic and is no longer supported. We've created this Notebook for those who are interested in using Weekly Patterns as a proxy for Social Distancing Metrics.
This product is delivered daily (3 days delayed from actual). Daily data is available going back to January 1, 2019. We used v2.1 to create the historical data from Jan 1, 2019 - Dec 31, 2019 (the backfill) as well as the data from May 10, 2020 onward. However, the Jan 1-May 9, 2020 data is on v2.0. Apologies for any inconvenience; please see the Release Notes below for more information.
The data was generated using a panel of GPS pings from anonymous mobile devices. We determine the common nighttime location of each mobile device over a 6-week period to a Geohash-7 granularity (~153m x ~153m). For ease of reference, we call this common nighttime location the device's "home". We then aggregate the devices by home census block group and provide the metrics set out below for each census block group.
To preserve privacy, we apply differential privacy to all of the device count metrics other than the device_count. This may cause the exact sum of devices to not equal device_count, especially for sparsely populated origin_census_block_group. Differential privacy is applied to all of the following columns: completely_home_device_count, part_time_work_behavior_devices, full_time_work_behavior_devices, delivery_behavior_devices, at_home_by_each_hour, bucketed_away_from_home_time, bucketed_distance_traveled, bucketed_home_dwell_time, bucketed_percentage_time_home. Note that differential privacy here means adding the same Laplacian noise as in our Patterns product but without the rounding up to 4 that occurs there.
If, as a result of the differential privacy applied, device_count < part_time_work_behavior_devices + full_time_work_behavior_devices + completely_home_device_count, or device_count < sum(counts in bucketed_distance_traveled), or device_count < sum(counts in bucketed_home_dwell_time), we then increase the device_count to the applicable sum (this only occurs in census_block_groups with small device_counts).
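The adjustment rule above can be sketched in a few lines of Python. This is only an illustration, not the production logic: the function name `adjusted_device_count` is hypothetical, the third sum is taken over the schema's bucketed_home_dwell_time column, and taking the largest sum when more than one condition holds is an assumption.

```python
def adjusted_device_count(row: dict) -> int:
    """Return device_count, raised to the largest noised sum that exceeds it.

    `row` is assumed to be one already-parsed record: integer columns are
    ints and the bucketed columns are dicts of {bucket label: device count}.
    """
    behavior_sum = (
        row["part_time_work_behavior_devices"]
        + row["full_time_work_behavior_devices"]
        + row["completely_home_device_count"]
    )
    distance_sum = sum(row["bucketed_distance_traveled"].values())
    dwell_sum = sum(row["bucketed_home_dwell_time"].values())
    # Assumption: if several sums exceed device_count, use the largest one.
    return max(row["device_count"], behavior_sum, distance_sum, dwell_sum)


# Illustrative values only, not real data.
noisy_row = {
    "device_count": 5,
    "part_time_work_behavior_devices": 2,
    "full_time_work_behavior_devices": 1,
    "completely_home_device_count": 3,
    "bucketed_distance_traveled": {"0": 3, "1-1000": 2},
    "bucketed_home_dwell_time": {"<60": 1, "721-1080": 6},
}
print(adjusted_device_count(noisy_row))  # 7
```

Because the noise is small relative to large counts, a block group with a large device_count would normally pass through this function unchanged.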
Schema
Column Name | Description | Type | Example |
---|---|---|---|
origin_census_block_group | The unique 12-digit FIPS code for the Census Block Group. Please note that some CBGs have leading zeros. | String | 131000000000 |
date_range_start | Start time for measurement period in ISO 8601 format of YYYY-MM-DDTHH:MM:SS±hh:mm (local time with offset from GMT). The start time will be 12 a.m. of any day. | String | 2020-03-01T00:00:00-06:00 |
date_range_end | End time for measurement period in ISO 8601 format of YYYY-MM-DDTHH:MM:SS±hh:mm (local time with offset from GMT). The end time will be the following 12 a.m. | String | 2020-03-02T00:00:00-06:00 |
device_count | Number of devices seen in our panel during the date range whose home is in this census_block_group. Home is defined as the common nighttime location for the device over a 6 week period where nighttime is 6 pm - 7 am. Note that we do not include any census_block_groups where the count <5. | Integer | 100 |
distance_traveled_from_home | Median distance (in meters) traveled from the geohash-7 of the home by the devices included in the device_count during the time period (excluding any distances of 0). We first find the median for each device and then find the median across all of the devices. | Integer | 200 |
bucketed_distance_traveled | Key is range of meters (from geohash-7 of home) and value is device count. If a device made multiple trips, we use the median distance for the device. | JSON {String: Integer} | {"0": 100, "1-1000": 40, "1001-2000": 45, "2001-8000": 15, "8001-16000": 0, "16001-50000": 0, ">50000": 0} |
median_dwell_at_bucketed_distance_traveled | Key is range of meters and value is the median dwell time in minutes of the devices that traveled the given distance from the geohash-7 of the home. | JSON {String: Integer} | {"<1000": 300, "1001-2000": 60, "2001-8000": 120, "8001-16000": 5, "16001-50000": 5, ">50000": 60} |
completely_home_device_count | Out of the device_count, the number of devices which did not leave the geohash-7 in which their home is located during the time period. | Integer | 40 |
median_home_dwell_time | Median dwell time at home geohash-7 ("home") in minutes for all devices in the device_count during the time period. For each device, we summed the observed minutes at home across the day (whether or not these were contiguous) to get the total minutes for each device. Then we calculate the median of all these devices. Beginning in v2, we include the portion of any stop within the time range regardless of whether the stop start time was in the time period. | Integer | 1200 |
bucketed_home_dwell_time | Key is range of minutes and value is device count of devices that dwelled at geohash-7 of home for the given time period. For each device, we summed the observed minutes at home across the day (whether or not these were contiguous) to get the total minutes for each device this day. Then we count how many devices are in each bucket. Beginning in v2, we include the portion of any stop within the time range regardless of whether the stop start time was in the time period. | JSON {String: Integer} | {"<60": 0, "61-360": 0, "361-720": 10, "721-1080": 40, ">1081": 50} |
at_home_by_each_hour | A mapping of hour of day to the number of devices at geohash-7 home in each hour over the course of the day in local time. First element in the array corresponds to the hour of midnight to 1am. | JSON [Integer] | [ 90, 90, 90, 80, 80, 70, 70, ...] |
part_time_work_behavior_devices | Out of the device_count, the number of devices that spent one period of between 3 and 6 hours at one location other than their geohash-7 home during the period of 8 am - 6 pm in local time. This does not include any device that spent 6 or more hours at a location other than home. | Integer | 10 |
full_time_work_behavior_devices | Out of the device_count, the number of devices that spent greater than 6 hours at a location other than their home geohash-7 during the period of 8 am - 6 pm in local time. | Integer | 10 |
* destination_cbgs | Key is a destination census block group and value is the number of devices with a home in census_block_group that stopped in the given destination census block group for >1 minute during the time period. Destination census block group will also include the origin_census_block_group (so it would capture any device that stayed completely at home or was at least seen at some point in the time period within the origin_census_block_group). This means that the difference between the device_count and the count of devices with a destination_cbg that is the same as the origin_census_block_group represents the number of devices that originate from the origin_census_block_group but are staying completely outside of it. | JSON {String: Integer} | {"130890212162":91,"131210101101":22,"131350502123":20} |
* delivery_behavior_devices | Out of the device_count, the number of devices that stopped for < 20 minutes at > 3 locations outside of their geohash-7 home. | Integer | 10 |
* median_non_home_dwell_time | Median dwell time at places outside of geohash-7 home in minutes for all devices in the device_count during the time period. For each device, we summed the observed minutes outside of home across the day (whether or not these were contiguous) to get the total minutes for each device. Then we calculate the median of all these devices. | Integer | 60 |
* candidate_device_count | Number of devices in our panel whose home is in this census_block_group regardless of whether we saw any activity for them in the time range. Home is defined as the common nighttime location for the device over a 6 week period where nighttime is 6 pm - 7 am. | Integer | 100 |
* bucketed_away_from_home_time | Key is range of minutes and value is device count of devices that dwelled anywhere outside of the geohash-7 of home for the given time period. For each device, we summed the observed minutes away from home across the day (whether or not these were contiguous) to get the total minutes for each device this day. Then we count how many devices are in each bucket. Beginning in v2, we include the portion of any stop within the time range regardless of whether the stop start time was in the time period. | JSON {String: Integer} | {"0- 20": 5, "21-45": 4, "46-60": 5, "61-120": 4, "121-180": 5, "181-240": 10, "241-300": 4, "301-360": 8, "361-420": 10, "421-480": 8, "481-540": 4, "541-600": 2, "601-660": 3, "661-720": 4, "721-840": 3, "841-960": 5, "961-1080": 2, "1081-1200": 2, "1201-1320": 2, "1321-1440": 1} |
* median_percentage_time_home | Median percentage of time we observed devices home versus observed at all during the time period. | Integer | 72 |
* bucketed_percentage_time_home | Key is a range of percentage of time a device was observed at home (numerator) out of total hours observed that day at any location (denominator). Value is the number of devices observed in this range. | JSON {String: Integer} | {"0-25": 6, "26-50": 5, "51-75": 10, "76-100": 100} |
** mean_home_dwell_time | Mean dwell time at home geohash-7 ("home") in minutes for all devices in the device_count during the time period. For each device, we summed the observed minutes at home across the day (whether or not these were contiguous) to get the total minutes for each device. Then we calculate the mean of all these devices. | Integer | 1200 |
** mean_non_home_dwell_time | Mean dwell time at places outside of geohash-7 home in minutes for all devices in the device_count during the time period. For each device, we summed the observed minutes outside of home across the day (whether or not these were contiguous) to get the total minutes for each device. Then we calculate the mean of all these devices. | Integer | 60 |
** mean_distance_traveled_from_home | Mean distance (in meters) traveled from the geohash-7 of the home by the devices included in the device_count during the time period. We first find all the distances traveled for each device. We filter out any distances that are > 1.5 x the interquartile range for that device. We do NOT filter on the low range since we were only concerned with long distance outliers. We then take the mean for each device using the non-filtered distances. Then we find the mean across all of the devices. | Integer | 200 |
* New in v2
** New in v2.1
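As a rough illustration of working with the schema above, the sketch below parses one record and derives two simple figures from it. It assumes each record arrives as a dict of strings (as `csv.DictReader` would produce) with the JSON columns stored as JSON text. The helper names `parse_row`, `share_completely_home`, and `devices_outside_origin` are hypothetical, and the sample values are made up.

```python
import json
from datetime import datetime

# Columns treated as JSON text and as integers, per the schema table.
JSON_COLUMNS = ("destination_cbgs", "at_home_by_each_hour")
INT_COLUMNS = ("device_count", "completely_home_device_count")


def parse_row(raw: dict) -> dict:
    """Convert one record of string values into typed values."""
    # origin_census_block_group stays a string so leading zeros survive.
    row = dict(raw)
    for name in INT_COLUMNS:
        row[name] = int(raw[name])
    for name in JSON_COLUMNS:
        row[name] = json.loads(raw[name])
    # ISO 8601 with a UTC offset parses directly on Python 3.7+.
    row["date_range_start"] = datetime.fromisoformat(raw["date_range_start"])
    return row


def share_completely_home(row: dict) -> float:
    """Fraction of observed devices that never left their home geohash-7."""
    return row["completely_home_device_count"] / row["device_count"]


def devices_outside_origin(row: dict) -> int:
    """device_count minus devices seen in their own origin block group."""
    origin = row["origin_census_block_group"]
    return row["device_count"] - row["destination_cbgs"].get(origin, 0)


# Illustrative values only, not real data.
sample = parse_row({
    "origin_census_block_group": "010010201001",
    "date_range_start": "2020-03-01T00:00:00-06:00",
    "device_count": "100",
    "completely_home_device_count": "40",
    "destination_cbgs": '{"010010201001": 91, "131210101101": 22}',
    "at_home_by_each_hour": "[90, 90, 90, 80]",
})
print(share_completely_home(sample), devices_outside_origin(sample))
```

Reading origin_census_block_group as text matters in practice: tools that infer numeric types would drop the leading zero and break joins against other census data.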
Release Notes
- For v1, beginning with the 4/3/2020 delivery, we are excluding this census block group in Iowa: 190570010001. It likely contains a sink (a location on a map which sees a disproportionate number of location pings that usually aren’t actually accurate).
- In v1, we discovered a bug in full_time_work_behavior_devices. In v1 we included visits that had a start time within the day and did not truncate the ones that went over the day boundary. This made it so that some visits had start_hour > end_hour, which fooled the condition for work hours: isWorkHours = startHour >= 7 && endHour <= 17 (e.g. start_hour = 20 and end_hour = 8 satisfy that condition). This was fixed in v2.
- v2 adds new columns: destination_cbgs, delivery_behavior_devices, median_non_home_dwell_time, candidate_device_count, bucketed_away_from_home_time, median_percentage_time_home, bucketed_percentage_time_home.
- v2 adds a 0 bucket to bucketed_distance_traveled.
- Beginning in v2, for calculating dwell times, we include the portion of any stop (i.e., a dwell event) within the time range regardless of whether the stop's start time was contained in the time period. Previously we only included dwell events with a start time inside the time period. As an example of the impact of this change, consider a dwell event that starts at 11pm on day N and lasts until 6am on day N+1. In v1 this dwell event would only be counted on day N, but in v2 this dwell event is counted both for day N and day N+1. As a result, overall device counts increased slightly because these day-boundary events are now included in both days.
- v2.1 begins with the 5/10/2020 data under the v2 path (delivered on 5/13/2020):
-- Fix implemented so that differential privacy applied to bucketed_percentage_time_home would not cause the sum of these counts to exceed the device_count.
-- Fix implemented so that distance_traveled_from_home is never blank. If all devices remained home, 0 is entered.
-- New column mean_home_dwell_time added.
-- New column mean_non_home_dwell_time added.
-- New column mean_distance_traveled_from_home added.
- We used v2.1 to create the historical data from Jan 1, 2019 - Dec 31, 2019 (the backfill) as well as the data from May 10, 2020 forwards. However, the Jan 1-May 9, 2020 data is on v2.0. Apologies for any inconvenience. There are some methodological differences between v2.0 and v2.1 that likely explain some significant differences seen between these two time periods. For example, you will see more people staying at home in Jan 2019 than in Jan 2020, but we believe this is due to a methodological difference between v2.0 and v2.1 rather than a real world difference in behavior. Ideally we would provide a backfill of all historical data on the same version, but this is not available at this time and we are not sure when (if ever) it will be available.
- Due to an ingestion issue on our side, v2/2021/01/27/2021-01-27-social-distancing.csv.gz was incomplete. We regenerated v2/2021/01/27/2021-01-27-social-distancing-rewritten.csv.gz, which should be used instead.
- We had a processing issue on 3/8/2021 which resulted in an influx in devices. This explains the sharp increase in devices seen and completely home devices on this date.
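The day-boundary behavior described in the v1 and v2 notes above can be illustrated with a short sketch. Only the work-hours condition is taken from the release notes. The function names are hypothetical, and the clipping logic is an assumption about how "the portion of any stop within the time range" might be computed, using naive local datetimes and 24-hour days.

```python
from datetime import datetime, timedelta


def is_work_hours_v1(start_hour: int, end_hour: int) -> bool:
    """The work-hours condition quoted in the release notes."""
    return start_hour >= 7 and end_hour <= 17


def minutes_in_day(start: datetime, end: datetime, day_start: datetime) -> int:
    """Minutes of the stop [start, end) that fall inside the given day."""
    day_end = day_start + timedelta(days=1)
    clipped_start = max(start, day_start)
    clipped_end = min(end, day_end)
    if clipped_start >= clipped_end:
        return 0  # the stop does not overlap this day at all
    return int((clipped_end - clipped_start).total_seconds() // 60)


# An overnight stop from 8pm to 8am is not work-hours behavior, yet the
# unclipped hours pass the v1 condition because start_hour > end_hour.
print(is_work_hours_v1(20, 8))  # True

# The example stop from the notes: 11pm on day N until 6am on day N+1.
day_n = datetime(2020, 3, 1)
stop_start = datetime(2020, 3, 1, 23, 0)
stop_end = datetime(2020, 3, 2, 6, 0)
print(minutes_in_day(stop_start, stop_end, day_n))                      # 60
print(minutes_in_day(stop_start, stop_end, day_n + timedelta(days=1)))  # 360
```

Once a stop is clipped to the day, its start can no longer fall after its end within that day, which is the situation that fooled the v1 condition.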