What are Data Thresholds in GA4 and How to Remove Them?

Data thresholds in GA4 reports protect user privacy by withholding data when the number of users falls below a minimum threshold.

By: Mussarat Nosheen | 5 mins read
Published: Sep 4, 2023 9:06:32 AM | Updated: Jul 24, 2024 11:22:46 AM

 

Digital marketing analysts around the globe reap the benefits of the more efficient and user-friendly Google Analytics 4. But with all its benefits, it also poses some challenges to marketers and analysts.

Every once in a while, though, an orange exclamation mark appears and data goes missing from their reports, leaving them confused.

Hover the cursor over the sign, and you find that data thresholding has been applied. The report shows a cruder version of the data instead of the exact figures.

Frustrating, right?

Let us help you navigate GA4 thresholding and share some tricks to, as far as possible, circumvent the issue.

What is Data Thresholding in GA4?

As the name suggests, data thresholding is the withholding of data from your reports and explorations when the number of users falls below a minimum threshold.

Google does not publish the exact threshold. Fewer than about 50 users likely triggers it. In such scenarios, Google withholds information that could help identify individual users, such as location, demographics, and interests.

Instead of the exact numbers of visitors and their associated data, you get a range in which your data set falls.

Why Does Data Thresholding Apply in GA4?

Why would Google do that, you might ask? It all comes down to data privacy. Google Analytics doubled down on data privacy in its latest iteration, GA4.

Websites are already required to seek user consent before using their data. Measures like pseudonymous identifiers, data isolation, and minimization are a few examples of GA4’s privacy measures.

Google's reasoning is that when user counts are low, information like gender, age, interests, and preferences could let a website deduce the identity of the person behind the data.

So, as an additional measure, it holds back specific information, including the user demographics, if the user volume is low.

When the data falls short of the minimum threshold, rows in the acquisition report or exploration appear blank. Instead, analysts see an aggregate number, and an orange exclamation mark inside a triangle appears in the top right corner of the page.

Take the cursor over to the symbol and it informs you that a data threshold has been applied to your report.
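To make the mechanism concrete, here is a minimal Python sketch of row-level thresholding. It is not Google's implementation: the cutoff of 50 users is only the estimate mentioned earlier, and `ASSUMED_MIN_USERS`, `apply_threshold`, and the sample rows are hypothetical names and figures made up for illustration.

```python
# Assumption: Google does not publish the real cutoff; 50 is only an estimate.
ASSUMED_MIN_USERS = 50


def apply_threshold(rows, min_users=ASSUMED_MIN_USERS):
    """Split report rows into visible rows and a count of withheld rows."""
    visible = [row for row in rows if row["users"] >= min_users]
    withheld = len(rows) - len(visible)
    return visible, withheld


# Hypothetical demographic report rows.
rows = [
    {"age_group": "18-24", "users": 120},
    {"age_group": "25-34", "users": 310},
    {"age_group": "65+", "users": 7},  # too few users, so this row is withheld
]

visible, withheld = apply_threshold(rows)
print([row["age_group"] for row in visible], "withheld rows:", withheld)
```

In this toy model the "65+" row disappears from the report even though the underlying data still exists, which is the behavior the orange warning icon signals.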

If you want to know more about how Google Analytics 4 improves data privacy, read it here.

Data Sampling vs. Data Threshold

Before we delve into the “why” and “how” of data thresholding, let us address one of the critical reasons thresholding is so frustrating.

Universal Analytics (UA) applied something called sampling once you added secondary dimensions to a standard report, provided the data set was large enough.

UA only gave you sampled data when the query covered more than 500,000 sessions for a given date range. The cause here was a limitation with data processing.

The standard reports in GA4, by contrast, use 100% of the data regardless of the parameters and filters applied. GA4 only gives sampled data for advanced reports.

Data thresholding is the opposite problem: it kicks in when the user count is smaller than the set minimum.

Remember

Data thresholding only means that data deemed capable of revealing a user's identity through inference is being held back. It is still there in the database.

When Does GA4 Apply Data Thresholds?

Google Signals

You get all that incredible, actionable demographic data thanks to Google Signals. It allows you to track users across devices and platforms, provided they have enabled it in their Google Accounts.

If you have enabled Google Signals on your GA4 property and the user count falls below the threshold for a given date range, GA4 will withhold the data.

Even after you disable Signals, thresholding can keep blocking your access to the data for some time.

* Google Signals was removed as a reporting identity on Feb 12, 2024, but it remains available for creating audiences in GA4. To learn more about the implications of Google Signals' removal as a reporting ID, read this blog.

Demographic Information

GA4 relies on reporting identities to count your website's users. You have three identity options:

  1. Device Based - basic; counts users by their Device ID / App Instance ID (for mobile apps)
  2. Observed - slightly advanced; combines User ID, cookie data & Google Signals
  3. Blended - most advanced; data from the former two plus modeled data to fill the gaps

Device ID data is what your company owns. It is your property, and thresholding may not apply to this identity. The remaining two, however, draw on Google's data, collected from users who have allowed Google to do so.

For the observed ID, Google combines your data with its own to give you a better demographic picture. Deduplication allows it to identify the same user using various devices, counting users more accurately.

The blended ID relies on aspects from the other IDs and data modeling to predict the behavior or demographics of the users who do not enable Signals.

If the user count is below the threshold under the second or third reporting identity, data thresholding will apply.

Search Query Information

For a search query report or exploration, if the number of users falls short of the threshold, the figures for that row will not show.

Small Date Range

Since this is all about minimizing the risk of exposing user identity, if you pull up a report with a small date range, chances are high that the user count will be low. Consequently, you will get crude data.

Implications for Small Websites

Data thresholding helps protect user privacy.

However, for smaller websites with a limited number of users, it could pose a problem. Unless they view reports under the device-based reporting identity, most of their events might not even appear in their reports.

How to Resolve Data Thresholding in GA4?

Data thresholding is inevitable in Google Analytics 4.

Still, there are a few ways to get around it, albeit by compromising on some features.

Disable Google Signals

A lot of your thresholding woes stem from Signals. Disabling it can help resolve it after some time (since GA4 continues to implement thresholding for a while after disabling Signals).

On the downside, you no longer get the deduplication of data that recognizes a single user across devices, giving you a less-than-accurate number of users.

Switch Default Reporting Identity

Instead of turning off Signals, choose the device-based reporting identity. This way, GA4 relies on first-party cookies or app data for the analysis.

Doing so means Google isn’t providing you with its own data, and therefore thresholding does not apply.

Another great thing about this option is that you can easily switch back and forth between the reporting identities without compromising your output.

Adjust the Date Range

Another easy fix is to choose a bigger date range for your report or exploration. As the time span increases, so does the data. Chances are this bigger set will have a higher user count and clear the threshold.
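The effect of the date range can be sketched in a few lines of Python. This is an illustrative model only, assuming a cutoff of 50 users (Google does not publish the real figure) and a made-up site that gets 10 new users per day; `distinct_users`, `is_thresholded`, and `visits` are hypothetical names.

```python
from datetime import date

# Assumption: the real cutoff is not published; 50 is only an estimate.
ASSUMED_MIN_USERS = 50


def distinct_users(visits, start, end):
    """Count unique users seen between start and end (inclusive)."""
    return len({user for day, user in visits if start <= day <= end})


def is_thresholded(visits, start, end, min_users=ASSUMED_MIN_USERS):
    """Return True if the user count for the range is below the cutoff."""
    return distinct_users(visits, start, end) < min_users


# Hypothetical traffic: 10 different users per day for one week.
visits = [
    (date(2024, 1, d), f"user-{d}-{i}")
    for d in range(1, 8)
    for i in range(10)
]

one_day = is_thresholded(visits, date(2024, 1, 1), date(2024, 1, 1))
one_week = is_thresholded(visits, date(2024, 1, 1), date(2024, 1, 7))
print("one day thresholded:", one_day)
print("one week thresholded:", one_week)
```

A single day has only 10 users and falls under the assumed cutoff, while the full week reaches 70 users and clears it.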

Remember

Thresholding is built into GA4. There is nothing you can do to disable it and gain access to the completely analyzed data.

The suggested methods to circumvent the thresholding come with a price.

Using a larger date range, alternate IDs, or not enabling Signals also means the data is relatively raw and the number of users thus revealed may not be as accurate. 

Conclusion

The crux is that Google withholds detailed user data if the number of users within a given date range is below its minimum requirement, the data threshold.

Thresholding applies when you try to access specific information, such as demographics for an event or search query data, and the visitor count is low.

This poses problems for analysts, as they cannot dig deeper into their events.

In such cases, you may choose to leave Google Signals off or disable it, opt for the device-based reporting identity, or analyze a bigger date range.

If you found this helpful then check out more interesting posts from our blog.