SafeGraph’s Spend dataset contains anonymized debit and credit card transaction data aggregated to individual places in the U.S.

Gain insights on how spend behavior is changing over time at specific points of interest (POI), including the average transaction size, spend in-person vs. online, spend by customer demographics such as income, and spend by customer loyalty.

This data is ideal for:

  • Competitor Analysis (e.g., In which markets is store sales growth outperforming competitors?)
  • Site Selection (e.g., What co-tenants drive sales volume for individual stores within a brand?)
  • Impact Measurement (e.g., Where did sales go up the most for our new product launch?)

and any other use cases where dynamic spend insights at individual locations are paramount.

Spend is aggregated monthly and delivered on the 15th of the month following each measurement period.

Check out our Spend Summary Statistics page for detailed information on our coverage.

Spend Schema

[spend.csv]

Spend contains many columns of JSON type (e.g., bucketed_customer_frequency). See our FAQs page for guidance on how to work with these columns.
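As a minimal sketch of working with a JSON-typed column (not necessarily the approach described on the FAQs page), the example below decodes bucketed_customer_frequency with Python's standard library and derives a repeat-customer share. The inline CSV excerpt, the placekey value, and the helper names are hypothetical.

```python
import csv
import io
import json

# Hypothetical two-column excerpt of spend.csv; real files carry many more columns.
SAMPLE_CSV = '''placekey,bucketed_customer_frequency
example-placekey-1,"{""1"": 500, ""2"": 302, ""3"": 101, ""4"": 20, ""5-10"": 90, "">10"": 5}"
'''

def parse_json_column(row, column):
    """Decode a JSON-typed cell; treat empty cells as missing (None)."""
    raw = row.get(column)
    return json.loads(raw) if raw else None

def repeat_customer_share(frequency):
    """Share of customers with more than one transaction in the period."""
    total = sum(frequency.values())
    repeat = total - frequency.get("1", 0)
    return repeat / total if total else None

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
frequency = parse_json_column(rows[0], "bucketed_customer_frequency")
share = repeat_customer_share(frequency)
print(f"Repeat-customer share: {share:.3f}")
```

The same parse_json_column helper applies to any other JSON column in the schema, since each cell is a self-contained JSON object or array.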

Note that Spend is bundled with Core Places which provides address information, geo-coordinates, industry categorizations and more.

Column Name | Description | Type | Example
placekey | Unique and persistent ID tied to this POI. See the Placekey Concept for details on placekey design. | String | [email protected]
safegraph_brand_ids | Unique and consistent ID(s) that represent this specific brand. | List | SG_BRAND_59dcabd7cd2395a2
brands | If this POI is an instance of a larger brand that we have explicitly identified, this column will contain that brand name. See: brands. | List | Target
spend_date_range_start | Start time for measurement period in ISO 8601 format of YYYY-MM-DDTHH:MM:SS±hh:mm (local time with offset from GMT). The start time will be 12 a.m. local time on the first day of the month. See Date Granularity. | String | 2020-03-01T00:00:00-06:00
spend_date_range_end | End time for measurement period in ISO 8601 format of YYYY-MM-DDTHH:MM:SS±hh:mm (local time with offset from GMT). The end time will be 12 a.m. local time on the first day of the following month. See Date Granularity. | String | 2020-04-01T00:00:00-06:00
raw_total_spend | Total amount spent at this POI in transactions captured by our panel during the date range. | Float | 76050.12
raw_num_transactions | Number of transactions at this POI captured by our panel during the date range. | Integer | 1521
raw_num_customers | Number of unique customers with at least one transaction at this POI captured by our panel during the date range. POI rows with fewer than 4 customers in the time period are excluded. | Integer | 435
median_spend_per_transaction | Median amount spent in each transaction at this POI. | Float | 50.00
median_spend_per_customer | Median amount spent by each customer at this POI. | Float | 174.83
spend_per_transaction_percentiles | The 25th and 75th percentiles of spend per transaction at this POI. | JSON {String: Float} | {"25": 23.11, "75": 80.99}
spend_by_day | Total amount spent at this POI each day over the covered time period. See Date Granularity. | JSON [Float] | [2535.34, 5214.11, … ]
spend_per_transaction_by_day | Median transaction size at this POI each day over the covered time period. Values will be null for days with no transactions. See Date Granularity. | JSON [Float] | [20.33, 70.22, … ]
spend_by_day_of_week | Total amount spent at this POI on each day of the week over the covered time period. See Date Granularity. | JSON {String: Float} | {"Monday": 10864.11, "Tuesday": 15200.10, … }
day_counts | The number of times each day of the week (e.g., Monday, Tuesday, etc.) occurred in the measurement period. See Date Granularity for why we include this. | JSON {String: Integer} | {"Monday": 4, "Tuesday": 5, "Wednesday": 5, "Thursday": 5, "Friday": 4, "Saturday": 4, "Sunday": 4}
spend_pct_change_vs_prev_month | Percent difference between last month’s raw_total_spend and this month’s. Value will be null where the reference month does not exist. | Integer | 5
spend_pct_change_vs_prev_year | Percent difference between last year’s same-month raw_total_spend and this month’s. Value will be null where the reference month does not exist. | Integer | -10
online_transactions | The number of online transactions at this POI during the date range. The remaining transactions were in-person. See Online vs In-Person Transactions. | Integer | 310
online_spend | The amount spent at this POI through online methods during the date range. The remaining spend was in-person. See Online vs In-Person Transactions. | Float | 7512.22
transaction_intermediary | The number of transactions at this POI based on the intermediary through which the transaction was made, if any. Transactions can have multiple intermediaries, in which case the number of transactions will be incremented for each intermediary. | JSON {String: Integer} | {"No Intermediary": 900, "Apple Pay": 215, "DoorDash": 155, "Square": 32}
spend_by_transaction_intermediary | Total amount spent among transactions by intermediary, including no intermediary. For each POI, will have the same keys as transaction_intermediary. | JSON {String: Float} | {"No Intermediary": 10400.12, "Apple Pay": 2015.00, "DoorDash": 1502.33, "Square": 320.00}
bucketed_customer_frequency | The distribution of customer repeat frequencies based on pre-specified buckets. Key is the number of transactions per customer and value is the number of customers that were within that range. | JSON {String: Integer} | {"1": 500, "2": 302, "3": 101, "4": 20, "5-10": 90, ">10": 5}
mean_spend_per_customer_by_frequency | Mean amount spent per customer at this POI based on customer frequency. Key is the number of transactions per customer and value is the mean spend by customers that were within that range. | JSON {String: Float} | {"1": 10000.10, "2": 31000.32, "3": 999.01, "4": 200, "5-10": 805.00, ">10": 90.89}
🛡 bucketed_customer_incomes | The distribution of estimated customer incomes based on pre-specified buckets. Key is the range of customer income in dollars per year and value is the number of customers that were within that range. Only includes keys where values are non-zero. See Customer Information. | JSON {String: Integer} | {"<25k": 135, "25-45k": 225, "45-60k": 500, "60-75k": 252, "75-100k": 220, "100-150k": 111, ">150k": 12}
mean_spend_per_customer_by_income | Mean amount spent per customer at this POI based on pre-specified customer income buckets. Key is the range of customer income in dollars per year and value is the mean spend by customers in that income range. Only includes keys where values are non-zero. | JSON {String: Float} | {"<25k": 1700.10, "25-45k": 2221.51, "45-60k": 5000.00, "60-75k": 2593.12, "75-100k": 124.00, "100-150k": 999.19, ">150k": 120.25}
🛡 customer_home_city | The number of customers at the POI, grouped by the customer’s estimated home location. Homes are indicated by unique city and state pairs. See Customer Information. | JSON {String: Integer} | {"Palo Alto, CA": 22, "Redwood City, CA": 308, "Mountain View, CA": 152, ...}

🛡 We do not report data unless at least 2 customers are observed from that group. Differential privacy is also applied to these columns for further anonymization. See Privacy for more.
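The array-valued columns such as spend_by_day carry no dates of their own, so a common need is to line them up with the calendar. The sketch below does that under two assumptions that the schema implies but does not state outright: entry i corresponds to the start date plus i days, and spend_date_range_end is exclusive. The row values and the daily_spend helper are hypothetical.

```python
import json
from datetime import datetime, timedelta

# Hypothetical row for March 2020 with made-up daily values (31 entries).
row = {
    "spend_date_range_start": "2020-03-01T00:00:00-06:00",
    "spend_date_range_end": "2020-04-01T00:00:00-06:00",
    "spend_by_day": json.dumps([100.0 + i for i in range(31)]),
}

def daily_spend(row):
    """Pair each spend_by_day entry with its calendar date.

    Assumes entry i maps to start date + i days and that the end
    timestamp is exclusive (both are assumptions, not documented facts).
    """
    start = datetime.fromisoformat(row["spend_date_range_start"]).date()
    end = datetime.fromisoformat(row["spend_date_range_end"]).date()
    values = json.loads(row["spend_by_day"])
    if len(values) != (end - start).days:
        raise ValueError("spend_by_day length does not match the date range")
    return {start + timedelta(days=i): value for i, value in enumerate(values)}

by_date = daily_spend(row)
```

Only the date part of each timestamp is used, which matches the daily resolution of the underlying data described under Date Granularity.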

Panel Overview Data

Along with the Spend file, we also deliver Panel Overview Data (see table below) to help you better understand the context of the data appearing in Spend.

Transaction and Customer Distributions by State

[transaction_panel_summary.csv]

Column Name | Description | Type | Example
date_range_start | Start time for measurement period in ISO 8601 format of YYYY-MM-DDTHH:MM:SS±hh:mm. | String | 2020-03-01T00:00:00-00:00
date_range_end | End time for measurement period in ISO 8601 format of YYYY-MM-DDTHH:MM:SS±hh:mm. | String | 2020-04-01T00:00:00-00:00
region | Uppercase abbreviation of U.S. state or territory. | String | NY
total_transactions | Total transactions to all POIs reported in this dataset. | Integer | 28123456
total_customers | Total unique customers to all POIs reported in this dataset. | Integer | 15123456
transaction_type | The number of transactions in this dataset, based on the type of transaction. Key options are "bank" (referring to debit cards) and "card" (referring to credit cards). | JSON {String: Integer} | {"bank": 12161232, "card": 3514064}
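As a small illustration of how transaction_type might be used, the sketch below converts the example counts into debit and credit shares for a region. The type_shares helper is a hypothetical name, and the input is simply the example value from the table.

```python
import json

# Example value from the transaction_type column ("bank" = debit, "card" = credit).
transaction_type = json.loads('{"bank": 12161232, "card": 3514064}')

def type_shares(counts):
    """Convert per-type transaction counts into fractions of the total."""
    total = sum(counts.values())
    return {kind: count / total for kind, count in counts.items()}

panel_shares = type_shares(transaction_type)
print({kind: round(share, 3) for kind, share in panel_shares.items()})
```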

Key Concepts

Geographic Bias

Small geographic bias exists in our panel based on our understanding of the home locations of the customers in the panel. SafeGraph tested for geographic bias by comparing its determination of the state-by-state numbers of home location of the customers in the panel to the true proportions reported by the 2019 US Census. Based on that analysis, SafeGraph panel density closely mirrors true population density. The overall average percentage point difference is < 1% with a maximum of +/-4% per state. For a deep dive on geographic bias in the panel, see Quantifying Sampling Bias in SafeGraph Spend.
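A comparison of this kind can be sketched as follows. This is not SafeGraph's actual analysis: the counts are hypothetical and cover only three regions, and reading the "overall average" as the mean absolute percentage point difference is an assumption.

```python
# Hypothetical counts for three regions only; not SafeGraph or Census figures.
panel_customers = {"NY": 1200, "CA": 2300, "TX": 1500}
census_population = {"NY": 19_500_000, "CA": 39_500_000, "TX": 29_000_000}

def shares(counts):
    """Each region's fraction of the total."""
    total = sum(counts.values())
    return {region: count / total for region, count in counts.items()}

def percentage_point_diff(panel, census):
    """Panel share minus census share per region, in percentage points."""
    panel_share, census_share = shares(panel), shares(census)
    return {r: 100 * (panel_share[r] - census_share[r]) for r in panel_share}

diffs = percentage_point_diff(panel_customers, census_population)
mean_abs_diff = sum(abs(d) for d in diffs.values()) / len(diffs)
max_abs_diff = max(abs(d) for d in diffs.values())
```

A positive value means the region is over-represented in the panel relative to its population share, and a negative value means it is under-represented.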

Correlations with Quarterly Revenue

When rolled up to the parent brand, SafeGraph Spend data can be compared against financial indicators of companies (e.g., quarterly revenue). SafeGraph uses such tests as a benchmark even though the use cases of Spend are far more varied than aggregating by brand. Based on one such analysis, SafeGraph data track with quarterly revenue from major brands like McDonald's, Chipotle, and Target, including cases where companies report online sales separately from overall revenue (e.g., Chipotle). Read more about that analysis in our blog.
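The roll-up and comparison could look roughly like the sketch below, which sums monthly raw_total_spend into brand-quarter totals and computes a Pearson correlation against reported revenue. Everything here is hypothetical: the brand name, the spend and revenue figures, and the simplification of treating brands as a single string rather than a list. It is not the method used in the blog analysis.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical monthly rows already summed across POIs: (brand, "YYYY-MM", raw_total_spend).
monthly_rows = [
    ("BrandA", "2020-01", 100.0), ("BrandA", "2020-02", 120.0), ("BrandA", "2020-03", 80.0),
    ("BrandA", "2020-04", 60.0), ("BrandA", "2020-05", 90.0), ("BrandA", "2020-06", 150.0),
    ("BrandA", "2020-07", 160.0), ("BrandA", "2020-08", 170.0), ("BrandA", "2020-09", 165.0),
]
# Hypothetical reported quarterly revenue, in arbitrary units.
reported_revenue = {"2020-Q1": 3.1, "2020-Q2": 3.0, "2020-Q3": 5.2}

def quarter_of(month):
    """Map 'YYYY-MM' to 'YYYY-Qn'."""
    year, mm = month.split("-")
    return f"{year}-Q{(int(mm) - 1) // 3 + 1}"

def quarterly_spend(rows, brand):
    """Sum monthly spend into calendar quarters for one brand."""
    totals = defaultdict(float)
    for row_brand, month, spend in rows:
        if row_brand == brand:
            totals[quarter_of(month)] += spend
    return dict(totals)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

spend = quarterly_spend(monthly_rows, "BrandA")
quarters = sorted(reported_revenue)
r = pearson([spend[q] for q in quarters], [reported_revenue[q] for q in quarters])
```

Because panel spend covers only a sample of transactions, the useful signal is how the two series move together, not whether their levels match. Companies whose fiscal quarters differ from calendar quarters would need a different quarter_of mapping.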

Date Granularity

  • The underlying transaction data being aggregated are only resolvable at the daily level. Therefore, columns such as spend_date_range_start and spend_date_range_end are provided at sub-daily precision only to facilitate consistent joining to other SafeGraph datasets, and do not reflect the actual granularity of the transaction timing.
  • Furthermore, whenever possible, the transaction dates used in spend_by_day, spend_per_transaction_by_day, and spend_by_day_of_week reflect the date of the actual transaction. However, for some transactions, the date reported is instead the date processed by the financial institution, which is typically the next business day.
    • This means that Saturday and Sunday spend will appear lower in the data and Monday spend will appear higher (i.e., Sat/Sun spend being attributed to Mon), but this only affects these three columns.
    • Debit (a.k.a. bank) card transactions are also more likely than credit card transactions to have this bias, so weekend numbers are more likely to reflect credit card transactions.
    • Note that we have provided a column called day_counts, which is simply a count of how many times each day of the week occurred in the given month (e.g., there were 4 Tuesdays in the month). You can use this column to determine whether an increase in spend in a given month reflects a real phenomenon or simply the fact that there were more Mondays in that month.
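One way to apply day_counts, sketched below, is to divide each weekday's total in spend_by_day_of_week by the number of times that weekday occurred, giving an average spend per occurrence that is comparable across months. The spend figures beyond Monday and Tuesday are hypothetical, as is the helper name.

```python
import json

# Monday and Tuesday follow the schema example; the other spend values are hypothetical.
spend_by_day_of_week = json.loads(
    '{"Monday": 10864.11, "Tuesday": 15200.10, "Wednesday": 14000.00, '
    '"Thursday": 13500.00, "Friday": 12000.00, "Saturday": 9000.00, "Sunday": 8000.00}'
)
day_counts = json.loads(
    '{"Monday": 4, "Tuesday": 5, "Wednesday": 5, "Thursday": 5, '
    '"Friday": 4, "Saturday": 4, "Sunday": 4}'
)

def spend_per_occurrence(spend, counts):
    """Average spend per occurrence of each weekday in the month."""
    return {day: spend[day] / counts[day] for day in spend if counts.get(day)}

normalized = spend_per_occurrence(spend_by_day_of_week, day_counts)
```

This normalization removes the calendar effect only. It does not correct the shift of weekend spend onto Mondays caused by processing dates.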

Online vs In-Person Transactions

  • Prior to being aggregated to POIs, individual transactions are classified by origin as online or in-person based on a proprietary model leveraging information about the transaction, the merchant, the customer, and other factors.
  • This allows us to understand what proportion of transactions attributed to a POI (and their corresponding spend) were made physically versus online.
  • Certain POIs lend themselves more to online versus in-person transactions. For example, self-storage POIs are more likely to have online transactions where payment is not made at the physical location. On the other end of the spectrum, gas station POIs are more likely to have in-person transactions where payment is made at the physical location.
  • Note that online transactions that cannot be tied to an individual physical location will not be included in the calculations. For example, purchases made online and shipped directly to a residence may not reference a specific store because they might be filled from a warehouse or distribution center. By contrast, a "buy online, pick up in store" purchase has a connection to a physical store.
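Since the schema reports only the online figures and states that the remainder is in-person, the in-person side can be derived by subtraction. The sketch below does this for one row using the example values from the schema; the channel_split helper and its output keys are hypothetical names.

```python
# Hypothetical row built from the schema's example values.
row = {
    "raw_num_transactions": 1521,
    "raw_total_spend": 76050.12,
    "online_transactions": 310,
    "online_spend": 7512.22,
}

def channel_split(row):
    """Derive in-person counts and online shares from the reported totals."""
    in_person_tx = row["raw_num_transactions"] - row["online_transactions"]
    in_person_spend = row["raw_total_spend"] - row["online_spend"]
    return {
        "in_person_transactions": in_person_tx,
        "in_person_spend": round(in_person_spend, 2),
        "online_transaction_share": row["online_transactions"] / row["raw_num_transactions"],
        "online_spend_share": row["online_spend"] / row["raw_total_spend"],
    }

split = channel_split(row)
```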

Customer Information

  • Each customer in the panel is classified into an income class using a proprietary model based on the customer's transactions and spending data.
  • Similarly, each customer's home city and state are estimated using a proprietary model based on where the customer makes the majority of their transactions.
  • Note that we do not provide any individual-level data in this dataset, and these models are used solely for aggregating demographic information about customers to points of interest. A reminder that both of these columns are subject to differential privacy, implemented specifically to remove the possibility of identifying individuals with this data. See Privacy for more.

Privacy

To preserve privacy, we apply differential privacy techniques to the following columns: bucketed_customer_incomes and customer_home_city.

We add Laplacian noise to the values in these columns. After adding noise, we report only attributes (e.g., a city) with at least 2 customers.

As an added precaution, no city appears in customer_home_city if fewer than 4 panelists have that home city assigned. This prevents re-identification of panelists who come from rare or unique cities.
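To illustrate the general idea of adding Laplacian noise and then suppressing small groups, here is a minimal sketch. It is not SafeGraph's implementation: the noise scale, the rounding to integers, the fixed seed, and the "Rare Town, CA" entry are all assumptions made for the example, and the separate 4-panelist rule for cities is not modeled.

```python
import random

def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_counts(counts, scale, min_count, rng):
    """Add Laplace noise to each count, round, and drop groups below min_count."""
    reported = {}
    for key, count in counts.items():
        noisy = round(count + laplace_noise(scale, rng))
        if noisy >= min_count:
            reported[key] = noisy
    return reported

rng = random.Random(0)  # fixed seed so the sketch is repeatable
true_counts = {
    "Palo Alto, CA": 22,
    "Redwood City, CA": 308,
    "Mountain View, CA": 152,
    "Rare Town, CA": 1,  # hypothetical small group that may be suppressed
}
# scale=1.0 is an arbitrary illustrative choice, not a documented parameter.
reported = noisy_counts(true_counts, scale=1.0, min_count=2, rng=rng)
```

Because the threshold is applied after noise is added, a group can be dropped or kept depending on the noise draw, so reported counts should be read as approximate.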

