SafeGraph’s Spend dataset contains anonymized debit and credit card transaction data aggregated to individual places in the U.S.

Gain insights on how spend behavior is changing over time at specific points of interest (POI), including the average transaction size, spend in-person vs. online, spend by customer demographics such as income, and spend by customer loyalty.

This data is ideal for:

  • Competitor Analysis (e.g., In which markets is store sales growth outperforming competitors?)
  • Site Selection (e.g., What co-tenants drive sales volume for individual stores within a brand?)
  • Impact Measurement (e.g., Where did sales go up the most for our new product launch?)

and any other use cases where dynamic spend insights at individual locations are paramount.

Spend is aggregated at a monthly time interval, and delivered on the 15th after each month's end.

Check out our Spend Summary Statistics page for detailed information on our coverage.

Spend Schema

spend.csv [reference file name for enterprise deliveries]

Spend contains many columns which are of JSON type (e.g., bucketed_customer_frequency). Please see our FAQs page here for guidance on how to work with these columns.
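
As a minimal sketch of working with these JSON columns, the snippet below decodes one of them using only the Python standard library. It assumes the JSON columns arrive as JSON-encoded strings inside the CSV; the sample row, the placeholder placekey, and the helper name load_spend_rows are illustrative and not part of any SafeGraph tooling.

```python
import csv
import io
import json

# Illustrative one-row sample; values are made up and only mimic the schema.
SAMPLE_CSV = '''placekey,raw_total_spend,bucketed_customer_frequency
example-placekey,76050.12,"{""1"": 500, ""2"": 302, ""3"": 101, ""4"": 20, ""5-10"": 90, "">10"": 5}"
'''

# Columns assumed to hold JSON-encoded strings (extend as needed).
JSON_COLUMNS = ["bucketed_customer_frequency"]


def load_spend_rows(handle, json_columns=JSON_COLUMNS):
    """Read Spend rows and decode the listed JSON columns into Python objects."""
    rows = []
    for row in csv.DictReader(handle):
        for col in json_columns:
            raw = row.get(col)
            # Empty cells are kept as None rather than parsed.
            row[col] = json.loads(raw) if raw else None
        rows.append(row)
    return rows


rows = load_spend_rows(io.StringIO(SAMPLE_CSV))
frequency = rows[0]["bucketed_customer_frequency"]

# Customers in any bucket other than "1" made more than one transaction.
repeat_customers = sum(n for bucket, n in frequency.items() if bucket != "1")
print(repeat_customers)
```

For a real delivery, the same function could be pointed at an open file handle for spend.csv instead of the in-memory sample.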

Note that Spend is bundled with Places which provides address information, geo-coordinates, industry categorizations and more.

Column Name | Description | Type | Example
placekey | Unique and persistent ID tied to this POI. See the Placekey Concept for details on placekey design. | String | [email protected]
safegraph_brand_ids | Unique and consistent ID(s) that represent this specific brand. | List | SG_BRAND_59dcabd7cd2395a2
brands | If this POI is an instance of a larger brand that we have explicitly identified, this column will contain that brand name. See: brands. | List | Target
spend_date_range_start | Start time for measurement period in ISO 8601 format of YYYY-MM-DDTHH:MM:SS±hh:mm (local time with offset from GMT). The start time will be 12 a.m. local time on the first day of the month. See Date Granularity. | String | 2020-03-01T00:00:00-06:00
spend_date_range_end | End time for measurement period in ISO 8601 format of YYYY-MM-DDTHH:MM:SS±hh:mm (local time with offset from GMT). The end time will be 12 a.m. local time on the first day of the following month. See Date Granularity. | String | 2020-04-01T00:00:00-06:00
raw_total_spend | Total amount spent at this POI in transactions captured by our panel during the date range. | Float | 76050.12
raw_num_transactions | Number of transactions at this POI captured by our panel during the date range. | Integer | 1521
raw_num_customers | Number of unique customers with at least one transaction at this POI captured by our panel during the date range. POI rows with fewer than 4 customers in the time period are excluded. | Integer | 435
median_spend_per_transaction | Median amount spent in each transaction at this POI. | Float | 42.00
median_spend_per_customer | Median amount spent by each customer at this POI. This value takes into account customers that have made multiple transactions at this POI. | Float | 125.83
spend_per_transaction_percentiles | The 25th and 75th percentiles of spend_per_transaction at this POI. | JSON {String: Float} | {"25": 23.11, "75": 80.99}
spend_by_day | Total amount spent at this POI each day over the covered time period. See Date Granularity. | JSON [Float] | [2535.34, 5214.11, … ]
spend_per_transaction_by_day | Median transaction size at this POI each day over the covered time period. Values will be null for days with no transactions. See Date Granularity. | JSON [Float] | [20.33, 70.22, … ]
spend_by_day_of_week | Total amount spent at this POI on each day of the week over the covered time period. See Date Granularity. | JSON {String: Float} | {"Monday": 10864.11, "Tuesday": 15200.10, … }
day_counts | The number of times each day of the week (e.g., Monday, Tuesday, etc.) occurred in the measurement period. See Date Granularity for why we include this. | JSON {String: Integer} | {"Monday": 4, "Tuesday": 5, "Wednesday": 5, "Thursday": 5, "Friday": 4, "Saturday": 4, "Sunday": 4}
spend_pct_change_vs_prev_month | Percent difference between last month's raw_total_spend and this month's. Value will be null where reference month does not exist. | Integer | 5
spend_pct_change_vs_prev_year | Percent difference between last year's same-month raw_total_spend and this month's. Value will be null where reference month does not exist. | Integer | -10
online_transactions | The number of online transactions at this POI during the date range. The remaining transactions were in-person. See Online vs In-Person Transactions. | Integer | 310
online_spend | The amount spent at this POI through online methods during the date range. The remaining spend was in-person. See Online vs In-Person Transactions. | Float | 7512.22
transaction_intermediary | The number of transactions at this POI based on the intermediary through which the transaction was made, if any. See Transaction Intermediaries. | JSON {String: Integer} | {"No Intermediary": 900, "Apple Pay": 215, "DoorDash": 155, "Square": 32}
spend_by_transaction_intermediary | Total amount spent among transactions by intermediary, including no intermediary. For each POI, will have the same keys as transaction_intermediary. | JSON {String: Float} | {"No Intermediary": 10400.12, "Apple Pay": 2015.00, "DoorDash": 1502.33, "Square": 320.00}
bucketed_customer_frequency | The distribution of customer repeat frequencies based on pre-specified buckets. Key is the number of transactions per customer and value is the number of customers that were within that range. | JSON {String: Integer} | {"1": 500, "2": 302, "3": 101, "4": 20, "5-10": 90, ">10": 5}
mean_spend_per_customer_by_frequency | Mean amount spent per customer at this POI based on customer frequency. Key is the number of transactions per customer and value is mean spend by customers that were within that range. | JSON {String: Float} | {"1": 10000.10, "2": 31000.32, "3": 999.01, "4": 200, "5-10": 805.00, ">10": 90.89}
🛡 bucketed_customer_incomes | The distribution of estimated customer incomes based on pre-specified buckets. Key is the range of customer income in dollars per year and value is number of customers that were within that range. Only includes keys where values are non-zero. See Customer Information. | JSON {String: Integer} | {"<25k": 135, "25-45k": 225, "45-60k": 500, "60-75k": 252, "75-100k": 220, "100-150k": 111, ">150k": 12}
mean_spend_per_customer_by_income | Mean amount spent per customer at this POI based on pre-specified customer income buckets. Key is the range of customer income in dollars per year and values represent the mean spend by customers in that income range. Only includes keys where values are non-zero. | JSON {String: Float} | {"<25k": 1700.10, "25-45k": 2221.51, "45-60k": 5000.00, "60-75k": 2593.12, "75-100k": 124.00, "100-150k": 999.19, ">150k": 120.25}
🛡 customer_home_city | The number of customers to the POI based on the customer's estimated home location. Homes are indicated by unique city and state pairs. See Customer Information. | JSON {String: Integer} | {"Palo Alto, CA": 22, "Redwood City, CA": 308, "Mountain View, CA": 152, ...}

🛡 We do not report data unless at least 2 customers are observed from that group. Differential privacy is also applied to these columns for further anonymization. See more on privacy here.
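
The month-over-month and year-over-year columns in the schema are described only as a "percent difference" of raw_total_spend with an Integer type. The sketch below shows one plausible reading of that description. The formula, the rounding, and the name pct_change are assumptions, not the documented computation, and the spend values are hypothetical.

```python
def pct_change(current, reference):
    """Percent difference of current vs. reference spend.

    Returns None when the reference period is missing (or zero), mirroring the
    null described for spend_pct_change_vs_prev_month / _vs_prev_year.
    """
    if reference is None or reference == 0:
        return None
    # Assumed formula: relative change times 100, rounded to an integer.
    return round((current - reference) / reference * 100)


this_month = 79852.63  # hypothetical raw_total_spend for the current month
last_month = 76050.12  # hypothetical raw_total_spend for the previous month

change = pct_change(this_month, last_month)
print(change)
```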

Cross Shopping Columns

[Additional columns added in July 2022]

Column Name | Description | Type | Example
related_cross_shopping_physical_brands_pct | Other brands that customers to this POI also spent money with, in-person, this month. The value is the percent of POI customers that spent money at the other brand in the same month. Limited to the Top 20 brands. | JSON {String: Integer} | {"Burger King": 50, "McDonalds": 7, "AMC": 5, "Target": 3, ...}
related_cross_shopping_online_merchants_pct | Other merchants that customers to this POI also spent money with, online, this month. The value is the percent of POI customers that spent money at the other merchant in the same month. Note a broader list of merchants is used here rather than only SafeGraph brands. Limited to the Top 20 merchants. | JSON {String: Integer} | {"Amazon": 50, "Apple": 31, "Target": 20, "Spotify": 3, …}
related_cross_shopping_same_category_brands_pct | Same as related_cross_shopping_physical_brands_pct but filtered only to brands within the same 4-digit naics_code. | JSON {String: Integer} | {"Burger King": 50, "McDonalds": 7, "Shake Shack": 2}
related_cross_shopping_local_brands_pct | Same as related_cross_shopping_physical_brands_pct but filtered only to brands with matched transactions in the same zip code. | JSON {String: Integer} | {"Burger King": 50}
related_wireless_carrier_pct | Percent of customers that also spent money with specific wireless carriers during the month. | JSON {String: Integer} | {"Verizon Wireless": 60, "AT&T": 25, "T-Mobile": 10, "Cricket Wireless": 20}
related_streaming_cable_pct | Percent of customers that also spent money on specific streaming or cable services during the month. | JSON {String: Integer} | {"Netflix": 90, "YouTube": 90, "DirecTV": 30, "Amazon Prime Video": 20, …}
related_delivery_service_pct | Percent of customers that also spent money on specific online delivery services during the month. | JSON {String: Integer} | {"Uber Eats": 25, "Instacart": 20, "Postmates": 19, …}
related_rideshare_service_pct | Percent of customers that also spent money on specific rideshare services during the month. | JSON {String: Integer} | {"Uber": 75, "Lyft": 30}
related_buynowpaylater_service_pct | Percent of customers that also spent money on specific Buy Now Pay Later (BNPL) services elsewhere during the month. | JSON {String: Integer} | {"Afterpay": 10, "Sezzle": 5}
related_payment_platform_pct | Percent of customers that also used specific payment platforms elsewhere during the month. | JSON {String: Integer} | {"Venmo": 89, "Cash App": 50, "Apple Pay": 24}

See additional notes below on how brand and merchant names are handled in these columns and which carriers/services are allowable for related_wireless_carrier_pct, related_streaming_cable_pct, related_delivery_service_pct, related_rideshare_service_pct, related_buynowpaylater_service_pct, and related_payment_platform_pct columns.
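
Because each value in a cross shopping column is the percent of this POI's customers who also spent at the other brand or merchant, the values need not sum to 100. A small sketch of ranking the entries in one such cell follows; the helper name top_cross_shopped is hypothetical, and the cell is assumed to be a JSON-encoded string with illustrative values.

```python
import json


def top_cross_shopped(cell, n=3):
    """Return the n highest (name, pct) pairs from a related_*_pct JSON cell."""
    shares = json.loads(cell) if cell else {}
    # Sort by percent of customers, largest first.
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Illustrative cell contents, not real data.
cell = '{"Burger King": 50, "McDonalds": 7, "AMC": 5, "Target": 3}'
top = top_cross_shopped(cell, n=2)
print(top)
```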

Panel Overview Data

Along with the Spend file, we also deliver Panel Overview Data (see table below) to help you better understand the context of the data appearing in Spend.

Transaction and Customer Distributions by State

[transaction_panel_summary.csv]

Column Name | Description | Type | Example
date_range_start | Start time for measurement period in ISO 8601 format of YYYY-MM-DDTHH:mm:SS. | String | 2020-03-01T00:00:00-00:00
date_range_end | End time for measurement period in ISO 8601 format of YYYY-MM-DDTHH:mm:SS. | String | 2020-04-01T00:00:00-00:00
region | Uppercase abbreviation of U.S. state or territory. | String | NY
total_transactions | Total transactions to all POIs reported in this dataset. | Integer | 28123456
total_customers | Total unique customers to all POIs reported in this dataset. | Integer | 15123456
transaction_type | The number of transactions in this dataset, based on the type of transaction. Key options are "bank" (referring to debit cards) and "card" (referring to credit cards). | JSON {String: Integer} | {"bank": 12161232, "card": 3514064}
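
One way to use the panel summary for context is to look at the debit versus credit mix in a region. The sketch below converts a transaction_type cell into shares; it assumes the cell is a JSON-encoded string, and the helper name transaction_type_shares is hypothetical.

```python
import json


def transaction_type_shares(cell):
    """Convert a transaction_type cell into each type's share of transactions."""
    counts = json.loads(cell)
    total = sum(counts.values())
    # Guard against an empty or all-zero cell.
    return {kind: n / total for kind, n in counts.items()} if total else {}


# "bank" refers to debit cards and "card" to credit cards.
cell = '{"bank": 12161232, "card": 3514064}'
shares = transaction_type_shares(cell)
print({kind: round(share, 2) for kind, share in shares.items()})
```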

Key Concepts

Geographic Bias

A small geographic bias exists in our panel based on our understanding of the home locations of the customers in the panel. SafeGraph tested for geographic bias by comparing its state-by-state counts of customer home locations in the panel to the true proportions reported by the 2019 US Census. Based on that analysis, SafeGraph panel density closely mirrors true population density. The overall average difference is less than 1 percentage point, with a maximum of +/-4 percentage points per state. For a deep dive on geographic bias in the panel, see Quantifying Sampling Bias in SafeGraph Spend.
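
To make the percentage point comparison concrete, here is a small sketch of the arithmetic. The state shares are hypothetical placeholders, not SafeGraph or Census figures, and the use of absolute differences for the average is an assumption about how such a summary might be computed.

```python
# Hypothetical shares (percent of total); not actual panel or Census values.
panel_share = {"CA": 12.5, "TX": 8.0, "NY": 5.5}
census_share = {"CA": 12.0, "TX": 8.8, "NY": 5.9}

# Signed percentage point difference per state (panel minus census).
diff = {state: panel_share[state] - census_share[state] for state in census_share}

# Summary statistics over states, using absolute differences.
mean_abs_diff = sum(abs(d) for d in diff.values()) / len(diff)
max_abs_diff = max(abs(d) for d in diff.values())

print(round(mean_abs_diff, 2), round(max_abs_diff, 2))
```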

Correlations with Quarterly Revenue

When rolled up to the parent brand, SafeGraph Spend data can be compared against financial indicators of companies (e.g., quarterly revenue). SafeGraph uses such tests as a benchmark even though the use cases of Spend are far more varied than aggregating by brand. Based on one such analysis, SafeGraph data track with quarterly revenue from major brands like McDonald's, Chipotle, and Target, including cases where companies report online sales separately from overall revenue (e.g., Chipotle). Read more about that analysis in our blog.

Date Granularity

  • The underlying transaction data being aggregated are only resolvable at the daily level. Columns such as date_range_start and date_range_end carry a time component only to facilitate consistent joining to other SafeGraph datasets; it does not reflect the actual granularity of the transaction timing.
  • Furthermore, whenever possible, the transaction dates used in spend_by_day, spend_per_transaction_by_day, and spend_by_day_of_week reflect the date of the actual transaction. However, for some transactions, the date reported is instead the date processed by the financial institution, which is typically the next business day.
    • This means that Saturday and Sunday spend will appear lower in the data and Monday spend will appear higher (i.e., Sat/Sun spend being attributed to Mon), but this only affects these three columns.
    • Debit (a.k.a. bank) card transactions are also more likely than credit card transactions to have this bias, so weekend numbers are more likely to reflect credit card transactions.
    • Note that we have provided a column called day_counts, which is simply a count of how many times each day of the week occurred in the given month (e.g., there were 4 Tuesdays in the month). You can use this column to determine whether an increase in spend in a given month is due to a real phenomenon or to the fact that there were more Mondays in the given month.
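
As a sketch of how day_counts can be used, the snippet below divides spend_by_day_of_week by day_counts to get average spend per occurrence of each weekday, which is comparable across months with different calendars. Both cells are assumed to be JSON-encoded strings; the values and the helper name are illustrative.

```python
import json


def spend_per_weekday_occurrence(spend_by_day_of_week, day_counts):
    """Average spend per occurrence of each weekday in the month."""
    spend = json.loads(spend_by_day_of_week)
    counts = json.loads(day_counts)
    # Skip weekdays with a missing or zero count to avoid division by zero.
    return {day: spend[day] / counts[day] for day in spend if counts.get(day)}


# Illustrative values: Tuesday's higher total partly reflects a fifth Tuesday.
spend_cell = '{"Monday": 10864.0, "Tuesday": 15200.0}'
counts_cell = '{"Monday": 4, "Tuesday": 5}'

avg = spend_per_weekday_occurrence(spend_cell, counts_cell)
print(avg)
```

Note that this normalization does not correct for the processing-date shift described above, which moves some weekend spend onto Mondays.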

Online vs In-Person Transactions

  • Prior to being aggregated to POIs, individual transactions are classified by origin as online or in-person based on a proprietary model leveraging information about the transaction, the merchant, the customer, and other factors.
  • This allows us to understand what proportion of transactions attributed to a POI (and their corresponding spend) were made physically versus online.
  • Certain POIs lend themselves more to online versus in-person transactions. For example, self-storage POIs are more likely to have online transactions where payment is not made at the physical location. On the other end of the spectrum, gas station POIs are more likely to have in-person transactions where payment is made at the physical location.
  • Note that online transactions that cannot be tied to an individual physical location will not be included in columns such as raw_total_spend, raw_num_transactions, etc. For example, purchases made online and shipped directly to a residence may not reference a specific store because they might be filled from a warehouse or distribution center. By contrast, a "buy online, pick up in store" purchase is connected to a physical store.
    • There is one exception to this general rule: transactions that cannot be tied to a physical location (whether online or offline) are included in cross-shopping columns (e.g., related_cross_shopping_physical_brands_pct, related_cross_shopping_online_merchants_pct, etc).
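
Since the schema states that transactions and spend not counted as online were in-person, the in-person figures can be derived by subtraction. A minimal sketch, using the example values from the schema table and a hypothetical helper name:

```python
def split_online_in_person(raw_total_spend, raw_num_transactions,
                           online_spend, online_transactions):
    """Derive in-person totals and the online share of transactions for one POI row."""
    online_share = (
        online_transactions / raw_num_transactions if raw_num_transactions else None
    )
    return {
        "in_person_spend": raw_total_spend - online_spend,
        "in_person_transactions": raw_num_transactions - online_transactions,
        "online_transaction_share": online_share,
    }


# Example values taken from the schema table above.
result = split_online_in_person(
    raw_total_spend=76050.12,
    raw_num_transactions=1521,
    online_spend=7512.22,
    online_transactions=310,
)
print(result)
```

These figures only cover transactions attributed to the POI; online purchases that could not be tied to a physical location are absent from both sides of the split.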

Transaction Intermediaries

  • Transaction intermediaries can be apps that facilitate the transaction between the POI and the customer (e.g., DoorDash for restaurant POIs).
  • They can also be payment processors through which the transaction takes place (e.g., Apple Pay).
  • Transactions can also have multiple intermediaries. Paying for a DoorDash order through Apple Pay means the transaction is counted once under Apple Pay and once under DoorDash.
  • There is also some nuance with specific values which show up in this column:
    • No Intermediary does not mean that the transaction was via cash or anything like that. It means either no intermediary metadata was available and/or it was a direct bank or credit card charge.
    • Similarly, Visa as an intermediary does not mean they used a Visa card. Visa has a shared checkout option similar to PayPal, and that is what the "Visa" intermediary means in that context.
    • Similarly with Square: mostly this means the store has a Square POS system, but there are Square intermediaries that aren't necessarily POS, e.g., Square Online, so "Square" would cover that payment processing method as well.
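
Because a single transaction can be counted under more than one intermediary, the counts in transaction_intermediary may add up to more than raw_num_transactions. The sketch below therefore computes each intermediary's share against raw_num_transactions rather than against the sum of the counts. The helper name and the raw_num_transactions value are hypothetical, and the cell is assumed to be a JSON-encoded string.

```python
import json


def intermediary_shares(cell, raw_num_transactions):
    """Share of the POI's transactions that involved each intermediary."""
    counts = json.loads(cell)
    return {name: n / raw_num_transactions for name, n in counts.items()}


cell = '{"No Intermediary": 900, "Apple Pay": 215, "DoorDash": 155, "Square": 32}'
raw_num_transactions = 1200  # hypothetical value for this illustration

shares = intermediary_shares(cell, raw_num_transactions)

# Counts beyond the transaction total come from multi-intermediary transactions,
# so the shares above can sum to more than 1.
extra_counts = sum(json.loads(cell).values()) - raw_num_transactions
print(shares["No Intermediary"], extra_counts)
```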

Customer Information

  • Each customer in the panel is classified into an income class using a proprietary model based on their transactions and spending data.
  • Similarly, each customer's home city and state are estimated using a proprietary model based on where the user makes the majority of their transactions.
  • Note that we do not provide any individual-level data in this dataset, and these models are used solely for aggregating demographic information about customers to points of interest. A reminder that both of these columns are subject to differential privacy, implemented specifically to remove the possibility of identifying individuals with this data. See Privacy for more.

Brand and Merchant Keys in Cross Shopping Columns

  • For all columns with brands in the column name, the keys in the JSON indicate SafeGraph brands with a corresponding entry in the Brand info file. See also Places > Brands.
  • All other cross shopping columns use a different set of names (distinguished from SafeGraph brands by referencing them as merchants rather than brands) to accommodate entities without physical locations such as Netflix and Spotify.
    • Almost all of the time, these names are identical to the SafeGraph brand name where there is overlap; however, please note there are some notable differences (e.g., Apple is the online merchant name while Apple Retail Store is the corresponding SafeGraph brand).

Allowable Merchants in Certain Cross Shopping Columns

  • Certain cross shopping columns also only allow a fixed set of merchants, as indicated by the table below. These merchants were selected based on popularity in the transaction dataset, meaning less ubiquitous brands are less likely to be included.
  • If you think a certain merchant should be added to a particular column, please let us know.

Cross Shopping Column | Allowable Merchants
related_wireless_carrier_pct | Verizon Wireless, AT&T, T-Mobile, Sprint, Straight Talk, MetroPCS, Cricket Wireless, Boost Mobile, Disney Mobile, U.S. Cellular, Consumer Cellular Inc, Google Fi, TracFone Wireless
related_streaming_cable_pct | Netflix, Hulu, Disney Plus, Amazon Prime, Youtube, YouTube TV, Roku, HBO Max, Vudu, Redbox, Dish Network, Youtube Premium, Peacock, DirecTV, Sling TV, Fubo, Amazon Prime Video, Time Warner Cable, Crunchyroll, Funimation, ESPN Plus, Starz
related_delivery_service_pct | Uber Eats, Instacart, DoorDash, HelloFresh, Postmates, Grubhub, Shipt, Go Puff, Dashpass
related_rideshare_service_pct | Uber, Lyft
related_buynowpaylater_service_pct | Afterpay, Klarna, Affirm.com, Progressive Leasing, Sezzle, Zip Company, Acima
related_payment_platform_pct | Venmo, PayPal, Zelle, Cash App, Apple Cash, Apple Card, Xoom, Visa Direct, Square Cash

Privacy

To preserve privacy, we apply differential privacy techniques to the following columns: bucketed_customer_incomes and customer_home_city.

We add Laplacian noise to the values in these columns. After adding noise, only attributes (e.g., a city) with at least two customers are included in the data; groups with fewer than two customers are not reported.

We take the added precaution of ensuring no city can appear in customer_home_city if fewer than 4 panelists have that home city assigned. This is to prevent re-identifying panelists who come from rare or unique cities.
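
The steps described in this section can be sketched roughly as follows. This is an illustration only, not SafeGraph's implementation: the noise scale, the rounding, the seed, the function names, and the idea of passing panel-wide city sizes as a separate input are all assumptions. Laplace noise is drawn here as the difference of two exponential draws from the standard library.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def privatize_home_cities(poi_counts, panel_city_sizes, scale=1.0, seed=0):
    """Apply the thresholds and noise described above to one POI's city counts.

    poi_counts: customers per home city at this POI (hypothetical input).
    panel_city_sizes: panelists per home city across the panel (hypothetical input).
    """
    rng = random.Random(seed)
    released = {}
    for city, n in poi_counts.items():
        # Cities assigned to fewer than 4 panelists never appear.
        if panel_city_sizes.get(city, 0) < 4:
            continue
        noisy = round(n + laplace_noise(scale, rng))
        # After adding noise, keep only groups with at least 2 customers.
        if noisy >= 2:
            released[city] = noisy
    return released


poi_counts = {"Redwood City, CA": 308, "Mountain View, CA": 152, "Tiny Town, CA": 3}
panel_city_sizes = {"Redwood City, CA": 5000, "Mountain View, CA": 4200, "Tiny Town, CA": 3}

released = privatize_home_cities(poi_counts, panel_city_sizes)
print(released)
```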

