Digital marketing analysts around the globe reap the benefits of the more efficient and user-friendly Google Analytics 4 (GA4). But with all its benefits, it also poses some challenges to marketers and analysts.
Every once in a while, an orange exclamation mark appears and data goes missing from their reports, which confuses them.
Hovering the cursor over the sign reveals that data thresholding has been applied. Shown below the sign is a rather crude form of the data instead of the exact figures.
Let us help you navigate GA4 thresholding and share some tricks to, as far as possible, circumvent the issue.
Defining Data Thresholding
As the name suggests, data thresholding means that data is withheld from your reports and explorations when the number of visitors to your website falls below a minimum threshold.
Google does not publish the exact threshold. A count of fewer than 50 users likely triggers the limit. In such scenarios, Google withholds information that could help identify individuals, such as location, demographics, and interests.
Instead of the exact numbers of visitors and their associated data, you get a range in which your data set falls.
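To make the idea concrete, here is a minimal Python sketch of row-level thresholding. It is an illustration only, not Google's actual algorithm: the cutoff of 50 (`ASSUMED_MIN_USERS`), the function name `apply_threshold`, and the sample rows are all assumptions made for this example.

```python
# Hypothetical cutoff: Google does not publish the real value.
ASSUMED_MIN_USERS = 50


def apply_threshold(rows, min_users=ASSUMED_MIN_USERS):
    """Split report rows into visible rows and a count of withheld rows."""
    visible = [row for row in rows if row["users"] >= min_users]
    withheld = len(rows) - len(visible)
    return visible, withheld


# Made-up report rows for illustration.
report = [
    {"city": "London", "users": 320},
    {"city": "Leeds", "users": 12},
    {"city": "Bristol", "users": 75},
]

visible, withheld = apply_threshold(report)
print([row["city"] for row in visible])  # rows that stay in the report
print(withheld)                          # number of rows held back
```

In this toy report, the Leeds row has too few users, so it drops out while the other two rows remain.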
Why Does GA4 Apply Data Thresholding?
Why would Google do that, you might ask? It all comes down to data privacy. Google Analytics doubled down on data privacy in its latest iteration, GA4.
Websites are already required to seek user consent before using their data. Pseudonymous identifiers, data isolation, and data minimization are a few examples of GA4’s privacy measures.
Google believes that when user counts are low, information like gender, age, interests, and preferences could enable websites to deduce the identity of the person behind the data.
So, as an additional measure, it holds back specific information, including the user demographics, if the user volume is low.
When the data falls short of the minimum standard, rows in the acquisition report or exploration appear blank. Instead, analysts see an aggregate number, and an orange exclamation mark within a triangle appears in the top right corner of the page.
Hover the cursor over the symbol, and it informs you that a data threshold has been applied to your report.
If you want to know more about how Google Analytics 4 improves data privacy, read it here.
Data Sampling vs. Data Threshold
Before we delve into the “why” and “how” of data thresholding, let us address one of the critical reasons thresholding is so frustrating.
Universal Analytics (UA) applied sampling once you added secondary dimensions to a standard report, provided the data volume was large enough.
UA gave you sampled data only when your property exceeded the limit of 500,000 sessions for a given date range. The cause was a data processing limitation.
The standard reports in GA4, by contrast, use 100% of the data regardless of the dimensions and filters applied.
Google Analytics 4 samples data only in advanced reports.
Data thresholding, unlike sampling, kicks in when the user count is smaller than the set standard.
It only means that data deemed capable of compromising users' identities through inference is held back. It is still there in the database.
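The difference between the two can be sketched in a few lines of Python. This is a simplified illustration, not how UA or GA4 process data internally: the function names, the every-nth-session sampling scheme, and the cutoff passed as `min_users` are assumptions for this example.

```python
def sampled_total(session_values, step):
    """Estimate a total from every `step`-th session, scaled back up.

    The result is an approximation of the true total.
    """
    sample = session_values[::step]
    return sum(sample) * step


def thresholded_total(session_values, user_count, min_users):
    """Return the exact total, or None when the row is withheld.

    The underlying values are untouched; they are simply not shown.
    """
    if user_count < min_users:
        return None
    return sum(session_values)


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(sum(values))                                              # exact total
print(sampled_total(values, step=2))                            # sampled estimate
print(thresholded_total(values, user_count=10, min_users=50))   # withheld
print(thresholded_total(values, user_count=80, min_users=50))   # shown in full
```

Sampling returns an estimate because there is too much data, whereas thresholding returns either the exact figure or nothing because there are too few users.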
When Does GA4 Apply Data Thresholds?
You get all the incredible, actionable demographic data thanks to Google Signals. It allows you to track users across devices and platforms, provided they have enabled it in their Google Accounts.
If you have enabled Google Signals in your GA4 property and the user count falls below the threshold for a given date range, GA4 will withhold the data.
Even after you disable Signals, GA4 may keep restricting your access to the data for some time.
GA4 relies on reporting identities to count your website's users. You have three identity options:
- Device-based - basic; counts users by their device ID or app instance ID (for mobile apps)
- Observed - slightly advanced; combines User ID, cookie data, and Google Signals
- Blended - most advanced; uses data from the former two plus modeled data to fill the gaps
Device ID data is what your company owns. It is your property, and thresholding may not apply to this identity. The remaining two, however, draw on Google’s data, collected from users who allow it.
For the observed identity, Google combines your data with its own to give you a better demographic picture. Deduplication allows it to identify the same user across various devices, counting users more accurately.
The blended identity relies on the other identities plus data modeling to predict the behavior or demographics of users who do not enable Signals.
If the user count is smaller than the threshold under the second or third reporting identity, data thresholding will apply.
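The conditions described in this section can be summed up as a small decision function. The sketch below encodes only the rules as stated in this article, not Google's documented logic: the `ReportingIdentity` enum, the function `may_be_thresholded`, and the cutoff of 50 are all hypothetical names and values chosen for illustration.

```python
from enum import Enum


class ReportingIdentity(Enum):
    DEVICE_BASED = "device_based"
    OBSERVED = "observed"
    BLENDED = "blended"


# Hypothetical cutoff: Google does not publish the real value.
ASSUMED_MIN_USERS = 50


def may_be_thresholded(identity, signals_enabled, users,
                       min_users=ASSUMED_MIN_USERS):
    """Return True if, per the rules in this article, data may be withheld."""
    if users >= min_users:
        return False  # enough users: nothing to withhold
    if identity is ReportingIdentity.DEVICE_BASED:
        return False  # first-party data only, per the article
    return signals_enabled  # observed or blended with Signals switched on


print(may_be_thresholded(ReportingIdentity.BLENDED, True, users=20))
print(may_be_thresholded(ReportingIdentity.DEVICE_BASED, True, users=20))
```

The first call describes a low-traffic property on the blended identity with Signals on, which is the case this section warns about. The second shows the same traffic under the device-based identity.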
Search Query Information
For a search query report or exploration, if the number of users falls short of the threshold, the figures for that data row will not show.
Small Date Range
Since this is all about minimizing the risk of exposing user identity, if you pull up a report with a small date range, chances are high that the user count will be low. Consequently, you will get crude data.
Implications for Small Websites
Data thresholding helps protect user data privacy.
However, for smaller websites with limited traffic, it could pose a problem. Unless they view reports with the device-based identity, most of their events might not even appear in their reports.
Working Around Data Thresholding
Data thresholding is inevitable in Google Analytics 4. Still, there are a few ways to get around it, albeit by compromising on some features.
Disable Google Signals
Many of your thresholding woes stem from Signals. Disabling it can help resolve them after some time (GA4 continues to apply thresholding for a while after Signals is disabled).
On the downside, you lose the deduplication that recognizes a single user across devices, leaving you with a less accurate user count.
Switch Default Reporting Identity
Instead of turning off Signals, choose the device-based reporting identity. This way, GA4 relies on first-party cookies or app data for the analysis.
Doing so means Google isn’t providing you with its own data, and therefore thresholding generally does not apply.
Another great thing about this option is that you can easily switch back and forth between the reporting identities without compromising your output.
Adjust the Date Range
Another easy fix is to choose a bigger date range for your report or exploration. As the time span increases, so does the data. Chances are that this bigger set has a higher user count and clears the threshold.
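As a rough illustration of why a wider range helps, the Python sketch below counts distinct users over a growing window of recent days until an assumed cutoff is reached. The function `days_needed`, the per-day sets of user IDs, and the cutoff values are hypothetical; GA4 does not expose data in this form.

```python
def days_needed(daily_user_ids, min_users):
    """Return how many of the most recent days are needed to reach
    `min_users` distinct users, or None if the data never gets there.

    `daily_user_ids` is a list of sets, oldest day first.
    """
    seen = set()
    for days, ids in enumerate(reversed(daily_user_ids), start=1):
        seen.update(ids)  # the same user on two days is counted once
        if len(seen) >= min_users:
            return days
    return None


# Made-up user IDs per day, oldest day first.
daily = [{"a", "b"}, {"b", "c"}, {"c", "d", "e"}, {"a", "f"}]

print(days_needed(daily, min_users=5))
print(days_needed(daily, min_users=7))
```

Note that returning users are counted only once, so doubling the date range does not necessarily double the user count.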
Thresholding is built into GA4. There is nothing you can do to disable it outright and gain access to the fully processed data.
The suggested methods to circumvent thresholding come at a price.
Using a larger date range, an alternate reporting identity, or leaving Signals off also means the data is relatively raw, and the number of users thus revealed may not be as accurate.
The crux is that Google restricts access to refined user data if the number of users within a given date range is below its minimum requirement, the data threshold.
For thresholding to apply, you must be trying to access specific information, such as demographics for an event or search query data, with low visitor numbers.
It poses problems for analysts, as they cannot gain a deeper understanding of their events.
In such cases, you may choose to leave Google Signals off or disable it, opt for a device-based reporting identity, or analyze a bigger date range.