Spend
SafeGraph’s Spend dataset contains anonymized debit and credit card transaction data aggregated to individual places in the U.S.
Gain insights on how spend behavior is changing over time at specific points of interest (POI), including the average transaction size, spend in-person vs. online, spend by customer demographics such as income, and spend by customer loyalty.
This data is ideal for:
- Competitor Analysis (e.g., In which markets is store sales growth outperforming competitors?)
- Site Selection (e.g., What co-tenants drive sales volume for individual stores within a brand?)
- Impact Measurement (e.g., Where did sales go up the most for our new product launch?)
and any other use-cases where dynamic spend insights at individual locations are paramount.
Spend is aggregated at a monthly time interval, and delivered on the 20th after each month's end. Historic Spend data is available back to January 2019.
Check out our Spend Summary Statistics page for detailed information on our coverage.
Contents:
Spend Schema
spend.csv [reference file name for enterprise deliveries]
Spend contains many columns of JSON type (e.g., bucketed_customer_frequency). Please see our FAQs page for guidance on how to work with these columns.
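As a rough illustration of working with the JSON-typed columns, the sketch below decodes one such column using only the Python standard library. It assumes the CSV delivery stores each JSON value as a quoted string cell; the two-row sample text and the `parse_spend_rows` helper are hypothetical and not part of any SafeGraph tooling.

```python
import csv
import io
import json

# Hypothetical excerpt of spend.csv; real files contain many more columns.
# Assumption: JSON values are delivered as quoted string cells.
SAMPLE_CSV = '''placekey,raw_total_spend,bucketed_customer_frequency
222-222@222-222-222,76050.12,"{""1"": 500, ""2"": 302, ""3"": 101, ""4"": 20, ""5-10"": 90, "">10"": 5}"
'''

JSON_COLUMNS = ["bucketed_customer_frequency"]


def parse_spend_rows(text, json_columns=JSON_COLUMNS):
    """Yield each CSV row as a dict, decoding the JSON-typed columns."""
    for row in csv.DictReader(io.StringIO(text)):
        for col in json_columns:
            raw = row.get(col)
            # Empty cells are treated as missing rather than as empty objects.
            row[col] = json.loads(raw) if raw else None
        yield row


rows = list(parse_spend_rows(SAMPLE_CSV))
freq = rows[0]["bucketed_customer_frequency"]

# Customers with more than one transaction: every bucket except "1".
repeat_customers = sum(n for bucket, n in freq.items() if bucket != "1")
print(repeat_customers)
```

Once decoded, each JSON column behaves like an ordinary dictionary or list, so the bucket keys shown in the schema examples can be used directly as lookup keys.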
Note that Spend is bundled with Places which provides address information, geo-coordinates, industry categorizations and more.
Column Name | Description | Type | Example |
---|---|---|---|
placekey | Unique and persistent ID tied to this POI. See the Placekey Concept for details on placekey design. | String | 222-222@222-222-222 |
safegraph_brand_ids | Unique and consistent ID(s) that represents this specific brand. | List | SG_BRAND_59dcabd7cd2395a2 |
brands | If this POI is an instance of a larger brand that we have explicitly identified, this column will contain that brand name. See: brands. | List | Target |
spend_date_range_start | Start time for measurement period in ISO 8601 format of YYYY-MM-DDTHH:MM:SS±hh:mm (local time with offset from GMT). The start time will be 12 a.m. local time on the first day of the month. See Date Granularity. | String | 2020-03-01T00:00:00-06:00 |
spend_date_range_end | End time for measurement period in ISO 8601 format of YYYY-MM-DDTHH:MM:SS±hh:mm (local time with offset from GMT). The end time will be 12 a.m. local time on the first day of the following month. See Date Granularity. | String | 2020-04-01T00:00:00-06:00 |
raw_total_spend | Total amount spent at this POI in transactions captured by our panel during the date range. | Float | 76050.12 |
raw_num_transactions | Number of transactions at this POI captured by our panel during the date range. | Integer | 1521 |
raw_num_customers | Number of unique customers with at least one transaction at this POI captured by our panel during the date range. POI rows with fewer than 4 customers in the time period are excluded. | Integer | 435 |
median_spend_per_transaction | Median amount spent in each transaction at this POI. | Float | 42.00 |
median_spend_per_customer | Median amount spent by each customer at this POI. This value takes into account customers that have made multiple transactions at this POI. | Float | 125.83 |
spend_per_transaction_percentiles | The 25th and 75th percentiles of spend_per_transaction at this POI. | JSON {String: Float} | {"25": 23.11, "75": 80.99} |
spend_by_day | Total amount spent at this POI each day over the covered time period. See Date Granularity. | JSON [Float] | [2535.34, 5214.11, … ] |
spend_per_transaction_by_day | Median transaction size at this POI each day over the covered time period. Values will be null for days with no transactions. See Date Granularity. | JSON [Float] | [20.33, 70.22, … ] |
spend_by_day_of_week | Total amount spent at this POI on each day of the week over the covered time period. See Date Granularity. | JSON {String: Float} | {"Monday": 10864.11, "Tuesday": 15200.10, … } |
day_counts | The number of times each day of the week (e.g., Monday, Tuesday, etc.) occurred in the measurement period. See Date Granularity for why we include this. | JSON {String: Integer} | {"Monday": 4, "Tuesday": 5, "Wednesday": 5, "Thursday": 5, "Friday": 4, "Saturday": 4, "Sunday": 4 } |
spend_pct_change_vs_prev_month | Percent difference between last month’s raw_total_spend and this month’s. Value will be null where reference month does not exist. | Integer | 5 |
spend_pct_change_vs_prev_year | Percent difference between last year’s same-month raw_total_spend and this month’s. Value will be null where reference month does not exist. | Integer | -10 |
online_transactions | The number of online transactions at this POI during the date range. The remaining transactions were in-person. See Online vs In-person Transactions. | Integer | 310 |
online_spend | The amount spent at this POI through online methods during the date range. The remaining spend was in-person. See Online vs In-person Transactions. | Float | 7512.22 |
transaction_intermediary | The number of transactions at this POI based on the intermediary through which the transaction was made, if any. See Transaction Intermediaries. | JSON {String: Integer} | {"No Intermediary": 900, "Apple Pay": 215, "DoorDash": 155, "Square": 32} |
spend_by_transaction_intermediary | Total amount spent among transactions by intermediary, including no intermediary. For each POI, will have the same keys as transaction_intermediary. | JSON {String: Float} | {"No Intermediary": 10400.12, "Apple Pay": 2015.00, "DoorDash": 1502.33, "Square": 320.00} |
bucketed_customer_frequency | The distribution of customer repeat frequencies based on pre-specified buckets. Key is the number of transactions per customer and value is the number of customers that were within that range. | JSON {String: Integer} | { "1": 500, "2": 302, "3": 101, "4": 20, "5-10": 90, ">10": 5} |
mean_spend_per_customer_by_frequency | Mean amount spent per customer at this POI based on customer frequency. Key is the number of transactions per customer and value is mean spend by customers that were within that range. | JSON {String: Float} | { "1": 10000.10, "2": 31000.32, "3": 999.01, "4": 200, "5-10": 805.00, ">10": 90.89} |
🛡 bucketed_customer_incomes | The distribution of estimated customer incomes based on pre-specified buckets. Key is the range of customer income in dollars per year and value is number of customers that were within that range. Only includes keys where values are non-zero. See Customer Information. | JSON {String: Integer} | {"<25k": 135, "25-45k": 225, "45-60k": 500, "60-75k": 252, "75-100k": 220, "100-150k": 111, ">150k": 12} |
mean_spend_per_customer_by_income | Mean amount spent per customer at this POI based on pre-specified customer income buckets. Key is the range of customer income in dollars per year and values represent the mean spend by customers in that income range. Only includes keys where values are non-zero. | JSON {String: Float} | {"<25k": 1700.10, "25-45k": 2221.51, "45-60k": 5000.00, "60-75k": 2593.12, "75-100k": 124.00, "100-150k": 999.19, ">150k": 120.25} |
🛡 customer_home_city | The number of customers to the POI based on the customer’s estimated home location. Homes are indicated by unique city and state pairs. See Customer Information. | JSON {String: Integer} | {"Palo Alto, CA": 22, "Redwood City, CA": 308, "Mountain View, CA": 152, ...} |
🛡 We do not report data unless at least 2 customers are observed from that group. Differential privacy is also applied to these columns for further anonymization. See Privacy for more.
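The two percent-change columns in the schema above can be approximated from raw_total_spend values of consecutive periods. The sketch below is only an illustration: the `pct_change` helper is hypothetical, and rounding to the nearest integer is an assumption (the schema states only that the type is Integer and that the value is null when the reference month does not exist).

```python
def pct_change(current, reference):
    """Percent change from the reference period to the current one.

    Returns None when the reference value is missing or zero, mirroring the
    null described in the schema. Rounding to the nearest integer is an
    assumption, not documented behavior.
    """
    if reference is None or reference == 0:
        return None
    return round((current - reference) / reference * 100)


# Hypothetical raw_total_spend values for one POI.
this_month = 76050.12
last_month = 72428.69

vs_prev_month = pct_change(this_month, last_month)
vs_missing_month = pct_change(this_month, None)
print(vs_prev_month, vs_missing_month)
```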
Cross Shopping Columns
[New additional columns added in July 2022]
Column Name | Description | Type | Example |
---|---|---|---|
related_cross_shopping_physical_brands_pct | Other brands that customers to this POI also spent money with, in-person, this month. The value is the percent of POI customers that spent money at the other brand in the same month. Limited to the Top 20 brands. | JSON {String: Integer} | {"Burger King": 50, "McDonalds": 7, "AMC": 5, "Target": 3, ...} |
related_cross_shopping_online_merchants_pct | Other merchants that customers to this POI also spent money with, online, this month. The value is the percent of POI customers that spent money at the other merchant in the same month. Note a broader list of merchants is used here rather than only SafeGraph brands. Limited to the Top 20 merchants. | JSON {String: Integer} | {"Amazon": 50, "Apple": 31, "Target": 20, "Spotify": 3, …} |
related_cross_shopping_same_category_brands_pct | Same as related_cross_shopping_physical_brands_pct but filtered only to brands within the same 4-digit naics_code. | JSON {String: Integer} | {"Burger King": 50, "Mcdonalds": 7, "Shake Shack": 2} |
related_cross_shopping_local_brands_pct | Same as related_cross_shopping_physical_brands_pct but filtered only to brands with matched transactions in the same zip code. | JSON {String: Integer} | {"Burger King": 50} |
related_wireless_carrier_pct | Percent of customers that also spent money with specific wireless carriers during the month. | JSON {String: Integer} | {"Verizon Wireless": 60, "AT&T": 25, "T-Mobile": 10, "Cricket Wireless": 20} |
related_streaming_cable_pct | Percent of customers that also spent money on specific streaming or cable services during the month. | JSON {String: Integer} | {"Netflix": 90, "YouTube": 90, "DirecTV": 30, "Amazon Prime Video": 20, …} |
related_delivery_service_pct | Percent of customers that also spent money on specific online delivery services during the month. | JSON {String: Integer} | {"Uber Eats": 25, "Instacart": 20, "Postmates": 19, …} |
related_rideshare_service_pct | Percent of customers that also spent money on specific rideshare services during the month. | JSON {String: Integer} | {"Uber": 75, "Lyft": 30} |
related_buynowpaylater_service_pct | Percent of customers that also spent money on specific Buy Now Pay Later (BNPL) services elsewhere during the month. | JSON {String: Integer} | {"Afterpay": 10, "Sezzle": 5} |
related_payment_platform_pct | Percent of customers that also used specific payment platforms elsewhere during the month. | JSON {String: Integer} | {"Venmo": 89, "Cash App": 50, "Apple Pay": 24} |
See additional notes below on how brand and merchant names are handled in these columns and which carriers/services are allowable for the related_wireless_carrier_pct, related_streaming_cable_pct, related_delivery_service_pct, related_rideshare_service_pct, related_buynowpaylater_service_pct, and related_payment_platform_pct columns.
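To illustrate how the cross shopping columns might be consumed, the sketch below decodes one value and ranks the related brands by the percent of POI customers who also spent there. The `top_related` helper is hypothetical, and the input string reuses the illustrative example from the table above with its trailing ellipsis removed.

```python
import json


def top_related(pct_by_name, n=3):
    """Return the n (name, percent) pairs with the highest percent values."""
    ranked = sorted(pct_by_name.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Example value for related_cross_shopping_physical_brands_pct.
raw_value = '{"Burger King": 50, "McDonalds": 7, "AMC": 5, "Target": 3}'
related = json.loads(raw_value)

top_two = top_related(related, n=2)
print(top_two)
```

Because each value is a percent of the POI's customers and a customer can shop at several brands, the percentages in one cell are not expected to sum to 100.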
Panel Overview Data
Along with the Spend file, we also deliver Panel Overview Data (see table below) to help you better understand the context of the data appearing in Spend.
Transaction and Customer Distributions by State
[transaction_panel_summary.csv]
Column Name | Description | Type | Example |
---|---|---|---|
date_range_start | Start time for measurement period in ISO 8601 format of YYYY-MM-DDTHH:mm:SS. | String | 2020-03-01T00:00:00-00:00 |
date_range_end | End time for measurement period in ISO 8601 format of YYYY-MM-DDTHH:mm:SS. | String | 2020-04-01T00:00:00-00:00 |
region | Uppercase abbreviation of U.S. state or territory. | String | NY |
total_transactions | Total transactions to all POIs reported in this dataset. | Integer | 28123456 |
total_customers | Total unique customers to all POIs reported in this dataset. | Integer | 15123456 |
transaction_type | The number of transactions in this dataset, based on the type of transaction. Key options are “bank” (referring to debit cards) and “card” (referring to credit cards). | JSON {String: Integer} | {“bank”: 12161232, “card”: 3514064} |
Key Concepts
Geographic Bias
Small geographic bias exists in our panel based on our understanding of the home locations of the customers in the panel. SafeGraph tested for geographic bias by comparing its determination of the state-by-state numbers of home location of the customers in the panel to the true proportions reported by the 2019 US Census. Based on that analysis, SafeGraph panel density closely mirrors true population density. The overall average percentage point difference is < 1% with a maximum of +/-4% per state. For a deep dive on geographic bias in the panel, see Quantifying Sampling Bias in SafeGraph Spend.
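The comparison described above can be sketched as a difference in shares, measured in percentage points, between the panel and a reference population. The state counts below are toy numbers chosen only to make the arithmetic easy to follow; they are not SafeGraph or Census figures, and `share_differences` is a hypothetical helper.

```python
def share_differences(panel_counts, reference_counts):
    """Percentage-point difference between panel share and reference share per state."""
    panel_total = sum(panel_counts.values())
    reference_total = sum(reference_counts.values())
    return {
        state: 100 * panel_counts.get(state, 0) / panel_total
        - 100 * reference_counts[state] / reference_total
        for state in reference_counts
    }


# Toy inputs (NOT real panel or Census numbers).
panel = {"CA": 120, "NY": 60, "TX": 20}
census = {"CA": 100, "NY": 50, "TX": 50}

diffs = share_differences(panel, census)
mean_abs_diff = sum(abs(d) for d in diffs.values()) / len(diffs)
print(diffs, mean_abs_diff)
```

A positive value means the state is over-represented in the panel relative to the reference population, and a negative value means it is under-represented.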
Correlations with Quarterly Revenue
When rolled up to the parent brand, SafeGraph Spend data can be compared against financial indicators of companies (e.g., quarterly revenue). SafeGraph uses such tests as a benchmark even though the use cases of Spend are far more varied than aggregating by brand. Based on one such analysis, SafeGraph data track with quarterly revenue from major brands like McDonald's, Chipotle, and Target, including cases where companies report online sales separately from overall revenue (e.g., Chipotle). Read more about that analysis in our blog.
Date Granularity
- The underlying transaction data being aggregated are only resolvable at the daily level. Columns such as date_range_start and date_range_end are therefore provided down to the hour level only to facilitate consistent joining to other SafeGraph datasets; they do not reflect the actual granularity of the transaction timing.
- Furthermore, whenever possible, the transaction dates used in spend_by_day, spend_per_transaction_by_day, and spend_by_day_of_week reflect the date of the actual transaction. However, for some transactions, the date reported is instead the date processed by the financial institution, which is typically the next business day.
  - This means that Saturday and Sunday spend will appear lower in the data and Monday spend will appear higher (i.e., Sat/Sun spend being attributed to Mon), but this only affects these three columns.
  - Debit (a.k.a. bank) card transactions are also more likely than credit card transactions to have this bias, so weekend numbers are more likely to reflect credit card transactions.
- Note that we have provided a column called day_counts, which is simply a count of how many times each day of the week occurred in the given month (e.g., there were 4 Tuesdays in the month). You can use this column to determine whether an increase in spend in a given month is due to a real phenomenon or to the fact that there were more Mondays in the given month.
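One way to apply day_counts is to divide spend_by_day_of_week by the number of times each weekday occurred, which gives a per-occurrence average that is comparable across months. The sketch below is a minimal illustration using the first two entries of the schema examples; `mean_spend_per_weekday` is a hypothetical helper name.

```python
def mean_spend_per_weekday(spend_by_day_of_week, day_counts):
    """Average spend per occurrence of each weekday in the measurement period."""
    return {
        day: total / day_counts[day]
        for day, total in spend_by_day_of_week.items()
        # Skip weekdays that are missing from day_counts or occurred zero times.
        if day_counts.get(day)
    }


# Values taken from the schema examples, truncated to two weekdays.
spend_by_day_of_week = {"Monday": 10864.11, "Tuesday": 15200.10}
day_counts = {"Monday": 4, "Tuesday": 5}

per_weekday = mean_spend_per_weekday(spend_by_day_of_week, day_counts)
print(per_weekday)
```

In this example Tuesday has the larger monthly total partly because it occurred five times; the per-occurrence averages are much closer to each other than the raw totals are.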
Online vs In-Person Transactions
- Prior to being aggregated to POIs, individual transactions are classified by origin as online or in-person based on a proprietary model leveraging information about the transaction, the merchant, the customer, and other factors.
- This allows us to understand what proportion of transactions attributed to a POI (and their corresponding spend) were made physically versus online.
- Certain POIs lend themselves more to online versus in-person transactions. For example, self-storage POIs are more likely to have online transactions where payment is not made at the physical location. On the other end of the spectrum, gas station POIs are more likely to have in-person transactions where payment is made at the physical location.
- Note that online transactions that cannot be tied to an individual physical location will not be included in columns such as raw_total_spend, raw_num_transactions, etc. For example, purchases made online and shipped directly to a residence may not reference a specific store because they might be filled from a warehouse or distribution center, whereas a "buy online, pick up in store" purchase presents a connection to a physical store.
  - There is one exception to this general rule: transactions that cannot be tied to a physical location (whether online or offline) are included in cross-shopping columns (e.g., related_cross_shopping_physical_brands_pct, related_cross_shopping_online_merchants_pct, etc.).
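Since the schema states that transactions not counted as online were made in person, the in-person figures can be derived by subtraction. The sketch below shows this with the example values from the schema table; `split_online_in_person` is a hypothetical helper, not part of the dataset.

```python
def split_online_in_person(raw_num_transactions, raw_total_spend,
                           online_transactions, online_spend):
    """Derive in-person counts and the online share from the reported columns."""
    return {
        "in_person_transactions": raw_num_transactions - online_transactions,
        "in_person_spend": raw_total_spend - online_spend,
        # Guard against POIs with no transactions in the period.
        "online_transaction_share": (
            online_transactions / raw_num_transactions
            if raw_num_transactions else None
        ),
        "online_spend_share": (
            online_spend / raw_total_spend if raw_total_spend else None
        ),
    }


# Example values from the schema table.
split = split_online_in_person(
    raw_num_transactions=1521,
    raw_total_spend=76050.12,
    online_transactions=310,
    online_spend=7512.22,
)
print(split)
```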
Transaction Intermediaries
- Transaction intermediaries can be apps that facilitate the transaction between the POI and the customer (e.g., DoorDash for restaurant POIs).
- They can also be payment processors through which the transaction takes place (e.g., Apple Pay).
- Transactions can also have multiple intermediaries. Paying for a DoorDash order through Apple Pay would add 1 to the count for Apple Pay and also 1 to the count for DoorDash.
- There is also some nuance with specific values that show up in this column:
  - No Intermediary does not mean that the transaction was made with cash. It means that no intermediary metadata was available and/or that it was a direct bank or credit card charge.
  - Similarly, Visa as an intermediary does not mean the customer used a Visa card. Visa has a shared checkout option similar to PayPal; that is what the "Visa" intermediary means in this context.
  - Similarly with Square: mostly this means the store has a Square POS system, but there are Square intermediaries that aren't necessarily POS (e.g., Square Online), so "Square" covers that payment processing method as well.
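Because transaction_intermediary and spend_by_transaction_intermediary share the same keys, the two can be combined to estimate a mean transaction size per intermediary. The sketch below uses the example values from the schema table; `mean_spend_by_intermediary` is a hypothetical helper.

```python
def mean_spend_by_intermediary(counts, spend):
    """Mean spend per transaction for each intermediary present in both columns."""
    return {
        name: spend[name] / n
        for name, n in counts.items()
        # Skip zero counts and keys missing from the spend column.
        if n and name in spend
    }


# Example values from the schema table.
transaction_intermediary = {
    "No Intermediary": 900, "Apple Pay": 215, "DoorDash": 155, "Square": 32,
}
spend_by_transaction_intermediary = {
    "No Intermediary": 10400.12, "Apple Pay": 2015.00,
    "DoorDash": 1502.33, "Square": 320.00,
}

means = mean_spend_by_intermediary(
    transaction_intermediary, spend_by_transaction_intermediary
)
print(means)
```

Since a single transaction can be counted under more than one intermediary, the counts and spend values across keys should not be summed and compared directly with raw_num_transactions or raw_total_spend.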
Customer Information
- Each customer in the panel is classified into an income class using a proprietary model based on his or her transactions and spending data.
- Similarly, each customer's home city and state are estimated using a proprietary model based on where the user makes the majority of their transactions.
- Note that we do not provide any individual-level data in this dataset, and these models are used solely for aggregating demographic information about customers to points of interest. A reminder that both of these columns are subject to differential privacy, implemented specifically to remove the possibility of identifying individuals with this data. See Privacy for more.
Brand and Merchant Keys in Cross Shopping Columns
- For all columns with brands in the column name, the keys in the JSON indicate SafeGraph brands with a corresponding entry in the Brand info file. See also Places > Brands.
- All other cross shopping columns use a different set of names (distinguished from SafeGraph brands by referencing them as merchants rather than brands) to accommodate entities without physical locations such as Netflix and Spotify.
  - Almost all of the time, these names are identical to the SafeGraph brand name where there is overlap; however, please note there are some notable differences (e.g., Apple is the online merchant name while Apple Retail Store is the corresponding SafeGraph brand).
Allowable Merchants in Certain Cross Shopping Columns
- Certain cross shopping columns also only allow a fixed set of merchants, as indicated by the table below. These merchants were selected based on popularity in the transaction dataset, meaning less ubiquitous brands are less likely to be included.
- If you think a certain merchant should be added to a particular column, please let us know.
Cross Shopping Column | Allowable Merchants |
---|---|
related_wireless_carrier_pct | Verizon Wireless, AT&T, T-Mobile, Sprint, Straight Talk, MetroPCS, Cricket Wireless, Boost Mobile, Disney Mobile, U.S. Cellular, Consumer Cellular Inc, Google Fi, TracFone Wireless |
related_streaming_cable_pct | Netflix, Hulu, Disney Plus, Amazon Prime, Youtube, YouTube TV, Roku, HBO Max, Vudu, Redbox, Dish Network, Youtube Premium, Peacock, DirecTV, Sling TV, Fubo, Amazon Prime Video, Time Warner Cable, Crunchyroll, Funimation, ESPN Plus, Starz |
related_delivery_service_pct | Uber Eats, Instacart, DoorDash, HelloFresh, Postmates, Grubhub, Shipt, Go Puff, Dashpass |
related_rideshare_service_pct | Uber, Lyft |
related_buynowpaylater_service_pct | Afterpay, Klarna, Affirm.com, Progressive Leasing, Sezzle, Zip Company, Acima |
related_payment_platform_pct | Venmo, PayPal, Zelle, Cash App, Apple Cash, Apple Card, Xoom, Visa Direct, Square Cash |
Privacy
To preserve privacy, we apply differential privacy techniques to the following columns: bucketed_customer_incomes and customer_home_city.
We have added Laplacian noise to the values in these columns. After adding noise, only attributes (e.g., a city) with at least two customers are included in the data.
We take the added precaution of ensuring no city can appear in customer_home_city if fewer than 4 panelists have that home city assigned. This is to prevent re-identifying panelists who come from rare or unique cities.
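For intuition only, the general pattern of adding Laplacian noise to counts and then dropping small groups can be sketched as follows. This is not SafeGraph's implementation: the noise scale, the rounding step, and the helper names are all assumptions, and the panel-wide check on home cities with fewer than 4 panelists is not modeled here.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def privatize_counts(counts, scale=1.0, min_count=2, rng=None):
    """Add Laplace noise to each count, then keep only groups at or above min_count.

    The scale value and the rounding to whole customers are illustrative
    assumptions; the actual parameters are not given in this document.
    """
    rng = rng or random.Random()
    noisy = {}
    for key, n in counts.items():
        value = round(n + laplace_noise(scale, rng))
        if value >= min_count:
            noisy[key] = value
    return noisy


# Hypothetical customer_home_city counts before noise is applied.
sample = {"Palo Alto, CA": 22, "Redwood City, CA": 308, "Rare Town, CA": 1}
noisy = privatize_counts(sample, rng=random.Random(0))
print(noisy)
```

A consequence of this pattern is that reported counts are approximate, and small groups may be absent from a cell even when a few customers from that group were observed.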