The Google Analytics 4 API is a little different from the Google Analytics 3 (Universal Analytics) API. I’ve previously walked us through how to pull a basic report from the Universal Analytics API and, although the steps are similar, GA4 uses slightly different request bodies, a different way to make the request, and a different way to parse the data.
This article is designed not only for beginners but also for data scientists and data analysts who are making the switch from Universal Analytics to Google Analytics 4. The GA4 API documentation has a Python guide, but this article will not follow it. Instead, it is more closely aligned with the OAuth2 flow you are likely accustomed to from Universal Analytics.
Requirements
A Google Analytics 4 API Connection. I’ll be calling the ga_auth() function we wrote there to build the service before running the report.
pip install pandas
Google Analytics 4 API batchRunReports Method
The GA4 v1beta API (and likely the eventual v1 API) has a handful of reports available to run. This, already, is a little different from the GA3 v4 API, where there is really only one reporting method, batchGet, which is what I covered in the UA API article. In the GA4 API, batchRunReports is the closest method to what you’ll be used to from GA3 v4. There is also a singular runReport method. I wouldn’t be surprised to see these consolidated at some point in the future. GA4 can also run Pivot Reports and Realtime Reports; those are beyond the scope of this article.
In the Universal Analytics API report article I wrote, I gave some tips at the end to make the request setup and report parsing easier and more scalable. I’m going to skip straight to using those tips here. You can brush up on those tips in the final section of the Universal Analytics API report article.
Additional Imports
In addition to the modules we imported in the GA4 API Connection article, we’re going to import two more to help us parse the data.
```python
import pandas as pd
from collections import defaultdict
```
Find Your GA4 Property ID
Unlike Universal Analytics, GA4 does not have views. If you followed along with my GA3 API Report article, you saw that we needed a viewId for our request body. What we need instead is the GA4 Property ID. You can find it by navigating to the Admin section of your Google Analytics account and opening your Property Settings. Your Property ID will be in the top right.
We won’t, however, be putting it in the request body. It goes somewhere else, and we’ll come back to it at the end. You’re welcome to save the property ID as a variable. One quirk of the way we supply the property ID to the API is that it needs to be prefixed with properties/. We’ll be very original and save it as property_id.
```python
property_id = 'properties/306503726'
```
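Because the properties/ prefix is easy to forget, a small helper can normalize whatever ID you paste in. This is a minimal sketch: to_property_resource() is a hypothetical helper, not part of any Google library, and it assumes GA4 property IDs are purely numeric, like the one above.

```python
def to_property_resource(property_id):
    """Return a GA4 property ID in the 'properties/NNN' form the Data API expects.

    Accepts a bare numeric ID (int or str) or a value that already has the prefix.
    """
    pid = str(property_id).strip()
    if pid.startswith('properties/'):
        pid = pid[len('properties/'):]
    # Assumption: GA4 property IDs are all digits (UA IDs like 'UA-12345-1' are rejected)
    if not pid.isdigit():
        raise ValueError(f'Expected a numeric GA4 property ID, got {property_id!r}')
    return f'properties/{pid}'


property_id = to_property_resource(306503726)  # 'properties/306503726'
```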
Add Dimensions & Metrics to Lists
If you used the GA3 API for reporting, forget all the metrics and dimensions you know; they go by new names in GA4. You can find all of the built-in dimension and metric names in the API schema, and you can also find a UA-to-GA4 mapping in Google’s migration docs. As a third option, you can use the GA4 Dimensions & Metrics Explorer. Unlike the UA version, the explorer requires authentication to show you the available dimensions and metrics; I believe that’s because you can (and are likely expected to) create custom dimensions and metrics more often in GA4 than you did in UA.
In Universal Analytics, there was a limit of 9 dimensions and 10 metrics per request. I don’t see any documentation yet regarding limits like this in GA4.
In our example, we’re going to pull source / medium (sessionSourceMedium) as our dimension and sessions and page views (screenPageViews) as our metrics.
```python
dimensions = ['sessionSourceMedium']
metrics = ['sessions', 'screenPageViews']
```
Building the Request Body
The bulk of the work when making API requests like this is making sure the request body is intact and semantically accurate. The request body is where we’ll set the date ranges, dimensions, and metrics we’ll be pulling.
There are a handful of other items we can add to our request body as well. I’ll touch on a few, but they’re generally outside the scope of this article. First, there are two types of filters for our data, dimensionFilter and metricFilter, which I’ll likely cover in a future article on advanced use of this API. You might be used to those from the UA API, since they’re a great way to get just the data you want more efficiently. There are also limit and offset, which help you pull more data at once and reach later pages of paginated data for large requests. As I’ve stated before, I like setting my limit to the highest possible value since I’m often working on gigantic sites; the API returns at most 250,000 rows per request, no matter how many you ask for.
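To make the filter keys less abstract, here is a rough sketch of a dimensionFilter that keeps only United Kingdom and United States traffic from the web platform, which is also what a couple of readers asked about in the comments. The helper names (in_list_filter, exact_filter, and_group) are hypothetical; the dictionary shapes follow the FilterExpression structure in the RunReportRequest reference (filter, inListFilter, stringFilter, andGroup), but double-check the exact country and platform values your property reports before relying on them.

```python
def in_list_filter(field_name, values):
    """FilterExpression matching rows whose dimension value is one of `values`."""
    return {"filter": {"fieldName": field_name,
                       "inListFilter": {"values": list(values)}}}


def exact_filter(field_name, value):
    """FilterExpression matching rows whose dimension value equals `value` exactly."""
    return {"filter": {"fieldName": field_name,
                       "stringFilter": {"matchType": "EXACT", "value": value}}}


def and_group(*expressions):
    """Combine several FilterExpressions so that all of them must match."""
    return {"andGroup": {"expressions": list(expressions)}}


# Assumed values: 'United Kingdom', 'United States', and 'web' as GA4 reports them;
# confirm in the Dimensions & Metrics Explorer for your property.
dimension_filter = and_group(
    in_list_filter('country', ['United Kingdom', 'United States']),
    exact_filter('platform', 'web'),
)
```

The resulting dictionary would go into a request as "dimensionFilter": dimension_filter, alongside dateRanges, dimensions, and metrics.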
I recommend building the request body on the Google Analytics 4 batchRunReports method documentation page. It will help make sure your syntax is correct so you’re building the request properly. You’ll primarily be referencing the RunReportRequest request body and, like all of Google’s API documentation, the page offers a “Try this method” pop-over where you can build the body and try it before putting it in your Python script. It will warn you about problems it detects and what the error could be. You can also run the report to see what data you get back or whether you get an error. Just keep in mind that any report you run from there counts toward your daily API quota.
Because we are running batchRunReports, our first key is requests, which will be a list of all the reports we’re requesting. In our case, it’s a list of one item, and that item is a dictionary.
Within that dictionary, I like to start with the date ranges. The dateRanges key also takes a list, because we could pull multiple date ranges at once. Again, to keep things simple, our list will only contain one dictionary with key:value pairs for startDate and endDate. Both are strings, not date or datetime objects, and they use ISO 8601 date formatting; in simple terms, YYYY-MM-DD. The API also accepts the relative values today, yesterday, and NdaysAgo (for example, 30daysAgo).
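If you’d rather not hard-code dates, Python’s datetime module can produce the YYYY-MM-DD strings for you. This is a minimal sketch, assuming you want a rolling window of complete days that ends yesterday; last_n_days() is a hypothetical helper, not an API function.

```python
from datetime import date, timedelta


def last_n_days(n, end=None):
    """Return one dateRanges entry covering the n days that end on `end` (inclusive).

    `end` defaults to yesterday so the range only includes complete days.
    """
    if end is None:
        end = date.today() - timedelta(days=1)
    start = end - timedelta(days=n - 1)
    # isoformat() produces the YYYY-MM-DD strings the API expects
    return {"startDate": start.isoformat(), "endDate": end.isoformat()}


# Reproduces the March 2022 range used in this article
date_ranges = [last_n_days(31, end=date(2022, 3, 31))]
# -> [{'startDate': '2022-03-01', 'endDate': '2022-03-31'}]
```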
The next key we are adding is dimensions. We’ll use a list comprehension to build it, since the API expects a list of dictionaries, each with a name key. We’ll do exactly the same thing for metrics.
Finally, as I mentioned, I like to set the row limit to the maximum using the limit key.
And that’s our request body. You’ll notice, again, that there is no viewId or propertyId field in the body; GA4 doesn’t use it there. We’ll get to that next.
```python
request = {
    "requests": [
        {
            "dateRanges": [
                {
                    "startDate": "2022-03-01",
                    "endDate": "2022-03-31"
                }
            ],
            "dimensions": [{'name': name} for name in dimensions],
            "metrics": [{'name': name} for name in metrics],
            "limit": 250000  # the API returns at most 250,000 rows per request
        }
    ]
}
```
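Since requests is a list, the same pattern extends to several reports in one call. This is a sketch, assuming every report shares one date range; build_batch_request() is a hypothetical helper, and the deviceCategory/totalUsers pair in the example is only an illustration. Reports come back in the response’s reports list in the same order as the requests, and the batchRunReports reference limits how many requests a single batch may contain (I believe five), so keep the list short.

```python
def build_batch_request(start_date, end_date, report_specs, limit=250_000):
    """Build a batchRunReports body with one RunReportRequest per (dimensions, metrics) pair."""
    return {
        "requests": [
            {
                "dateRanges": [{"startDate": start_date, "endDate": end_date}],
                "dimensions": [{"name": name} for name in dims],
                "metrics": [{"name": name} for name in mets],
                "limit": limit,
            }
            # One request (and therefore one report in the response) per spec, in order
            for dims, mets in report_specs
        ]
    }


# Example: the report from this article plus a second, illustrative one
request = build_batch_request(
    "2022-03-01",
    "2022-03-31",
    [
        (["sessionSourceMedium"], ["sessions", "screenPageViews"]),
        (["deviceCategory"], ["totalUsers"]),
    ],
)
```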
Making the API Request
With the request body built, we’re ready to make the request and save the response to a variable. Notice that, unlike with the Universal Analytics API, we aren’t just passing a request body to the method. There is an additional property parameter, and this is where we add the property_id we discussed earlier.
```python
# ga_auth() and scopes come from the GA4 API Connection article
analytics = ga_auth(scopes)
response = analytics.properties().batchRunReports(property=property_id, body=request).execute()
```
Extracting Google Analytics Data From Response
I feel like I got a bit too in the weeds in the parsing section of the Universal Analytics API article, so I’ll spare you the greater details here. If you want to understand how to explore this data, take a peek at that section of the article. GA4 data will be a similar exploration experience but there will also be some differences. In fact, I think GA4 is actually a little easier to parse than the UA API data.
First, I’m a big fan of defaultdict(list), a dictionary whose missing keys start out as empty lists. Let’s initialize one and save it to a variable where we’ll be appending all of our data.
```python
report_data = defaultdict(list)
```
Next, we’ll write a basic set of for loops to extract all of our data. Since we’re running batchRunReports, the first key we need to get is reports. If you’re using the runReport method instead, you’ll start one level below this.
Then we get all of the rows of data and iterate over them, pulling out the dimension(s) and metric(s) with the enumerate() trick from the UA API article tips, which populates our report_data dictionary.
```python
for report in response.get('reports', []):
    rows = report.get('rows', [])
    for row in rows:
        for i, key in enumerate(dimensions):
            report_data[key].append(row.get('dimensionValues', [])[i]['value'])  # Get dimensions
        for i, key in enumerate(metrics):
            report_data[key].append(row.get('metricValues', [])[i]['value'])  # Get metrics
```
Once we have it in a dictionary, we can easily convert it into a Pandas DataFrame where you can do whatever your heart desires.
```python
df = pd.DataFrame(report_data)
```
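One thing to watch: the API returns metric values as strings (for example, '42'), so the sessions column won’t sum or sort numerically until you convert it. Below is a sketch that folds the loop and the conversion into one function. response_to_df() is a hypothetical name; it swaps the enumerate() indexing for zip() (a stylistic variation, not the method above), and it also accepts a runReport response by treating it as a single report, since runReport puts rows one level higher.

```python
from collections import defaultdict

import pandas as pd


def response_to_df(response, dimensions, metrics):
    """Flatten a GA4 Data API response into a DataFrame with numeric metric columns.

    Handles batchRunReports responses (rows under response['reports']) and
    runReport responses (rows at the top level). Rows from every report in a
    batch are stacked, so use it on reports that share dimensions and metrics.
    """
    reports = response.get('reports', [response])
    data = defaultdict(list)
    for report in reports:
        for row in report.get('rows', []):
            for key, cell in zip(dimensions, row.get('dimensionValues', [])):
                data[key].append(cell['value'])
            for key, cell in zip(metrics, row.get('metricValues', [])):
                data[key].append(cell['value'])
    df = pd.DataFrame(data, columns=dimensions + metrics)
    # Metric values arrive as strings; convert them so they can be summed and sorted
    df[metrics] = df[metrics].apply(pd.to_numeric)
    return df
```

With the variables from this article, usage would be df = response_to_df(response, dimensions, metrics).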
Google Analytics 4 (GA4) batchRunReports Full Code Example
```python
# Import Modules
import pandas as pd
from collections import defaultdict

# Authenticate & Build Service
# ga_auth() and scopes come from the GA4 API Connection article
analytics = ga_auth(scopes)

# Set Request Parameters
property_id = 'properties/306503726'
dimensions = ['sessionSourceMedium']
metrics = ['sessions', 'screenPageViews']

# Build Request Body
request = {
    "requests": [
        {
            "dateRanges": [
                {
                    "startDate": "2022-03-01",
                    "endDate": "2022-03-31"
                }
            ],
            "dimensions": [{'name': name} for name in dimensions],
            "metrics": [{'name': name} for name in metrics],
            "limit": 250000  # the API returns at most 250,000 rows per request
        }
    ]
}

# Make Request
response = analytics.properties().batchRunReports(property=property_id, body=request).execute()

# Parse Response
report_data = defaultdict(list)
for report in response.get('reports', []):
    rows = report.get('rows', [])
    for row in rows:
        for i, key in enumerate(dimensions):
            report_data[key].append(row.get('dimensionValues', [])[i]['value'])  # Get dimensions
        for i, key in enumerate(metrics):
            report_data[key].append(row.get('metricValues', [])[i]['value'])  # Get metrics

df = pd.DataFrame(report_data)
```
Eric is a Python SEO with a passion for data. He uses Python to automate tasks and analyze large data sets. He is often the go-to for spreadsheet challenges, too. He sees every challenge as an opportunity to learn.
Friend, I have a problem:
AttributeError: ‘Resource’ object has no attribute ‘properties’ on the line response = analytics.properties().batchRunReports(property=property_id, body=request).execute()
Go back to the GA4 API Connection post and double-check your parameters on the service build. analyticsdata and v1beta are the parameters you want to make sure you have correct. I just verified my code still runs and pulls data, so I’m leaning toward your service being built incorrectly. Feel free to share your code in a comment.
Hi, thanks for this. It’s great!
How do you add a filter to see just the UK or USA, for example?
Also, how do you see just App, or both App and Website?
Thanks!
For any filter, you need to use either the dimensionFilter or metricFilter key with the appropriate values in the request body. I recommend building the request body with the “Try this method” builder on the right of the documentation page.
Hi Eric,
Thanks for posting your code. I have an authentication problem that I can’t get fixed. When I run your code, I get the following error message:
“User does not have sufficient permissions for this property. To learn more about Property ID, see https://developers.google.com/analytics/devguides/reporting/data/v1/property-id.”
I did some googling and I found that you have to add the service account as a user in the GA4 property. I used the service account that I see under APIs & Services ==> Credentials. It looks something like: ga4-b@.iam.gserviceaccount.com.
I added this email address in as a viewer (and I also tried as an editor) in the GA4 Property Access Management. When I run the code however, I get the same error message.
Have you run into something similar?
kind regards
Edgar
If you’re using OAuth, which my code is using, you don’t need a service account. This uses your Google login: whichever account you’re logged into needs to have at least view access to the GA4 property.
Hi Eric,
This is stupid: I somehow got mixed up among the many properties I use and was pointing to the wrong property….
Thanks for your swift response and your time,
KR
Edgar
Hi,
I have run the example and I don’t get any results, nor do I get errors. Will this print anything in the terminal with just the example you have given? After df = pd.DataFrame(report_data), do I need to print() anything?
Thanks
Martin
Hey – silly me print(df) 🙂
Ha, glad you got it figured out. Sorry for the confusion. This was definitely intended to be more about how to pull the data than what to do with it afterward. I assumed most people using this had some experience with Pandas and some of the ways to return an output. You can also use df.to_csv() or df.to_excel() to create a CSV or XLSX file. I also like to use the gspread_pandas module to write the DataFrame directly into Google Sheets.
Hi,
How can we add segments in the GA4 Data API?
You should add a dimensionFilter or metricFilter key with the appropriate values to the request dictionary. You can see the full batchRunReports documentation, also linked in the article.
I want to access the account list of GA4 using a Python script, but I was unable to get it. Would you be able to provide a script to get the list of all accounts?
As far as I can tell, that is not possible using the Analytics Data API. Instead, you need to authorize the Analytics Admin API, set your build and scopes appropriately, and then run service.accounts().list().execute() to return the list. I don’t have plans to write that article or script in the near future, but the process appears to be nearly identical to this guide to building the service. The only additional steps will be enabling the different API in your Google Cloud Console, building the right service using service = build('analyticsadmin', 'v1beta', credentials=creds), then running service.accounts().list().execute(). Good luck!
Hello Eric, I have a web app that uses GA3; users’ analytics data is retrieved after they log in to the app. I am trying to migrate to GA4, but I can’t find a way to get a user’s property_id in order to make the request. Can you give me advice on how to find it? Thanks!
I believe you’d need to implement the Analytics Admin API. Build the service the same way as I have here, adjusting the build so it reads service = build('analyticsadmin', 'v1beta', credentials=creds). Then you can call service.accounts().list().execute(). Just know that this is a completely different API you need to authorize, even though it uses the same Analytics scope.
Ok, thanks Eric, I will implement it and let you know the results.
Hello again, you were right. I implemented the Analytics Admin API and then called service.accountSummaries().list().execute(), which returns the property ID inside its response. Then I followed the steps you describe in this post. Thanks for your collaboration!
My pleasure! Glad to hear you got it working!
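For anyone landing here with the same question, the accountSummaries approach can be sketched like this. It assumes you built an Admin API service with build('analyticsadmin', 'v1beta', credentials=creds) using the same OAuth flow as this article; list_ga4_properties() is a hypothetical helper, and list_next() is the pagination helper google-api-python-client generates for list methods that return a nextPageToken.

```python
def list_ga4_properties(admin_service):
    """Return (property resource name, display name) pairs for every GA4 property
    the authenticated user can see, using the Admin API's accountSummaries.list.

    `admin_service` is assumed to be built like:
        build('analyticsadmin', 'v1beta', credentials=creds)
    """
    properties = []
    summaries = admin_service.accountSummaries()
    request = summaries.list()
    while request is not None:
        response = request.execute()
        for account in response.get('accountSummaries', []):
            for prop in account.get('propertySummaries', []):
                # prop['property'] is already in the 'properties/NNN' form batchRunReports expects
                properties.append((prop['property'], prop.get('displayName', '')))
        # list_next() returns the request for the next page, or None after the last page
        request = summaries.list_next(request, response)
    return properties
```

Each returned property name can be passed straight to batchRunReports(property=..., body=request).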
Hi Eric, I’m trying to follow your script above to test out my GA4 data. Can I just use OAuth instead of the credentials JSON file? If so, how can I authenticate using OAuth?
Thank you!
I’m not sure I’m understanding your question. The OAuth flow requires the credentials.json file; that is how you authenticate with OAuth. This does not use the API key method. This is the full OAuth flow. I hope this helps!
Hi Eric,
I hope you’re doing well. I have a request regarding obtaining user-level data. I’m experiencing discrepancies when trying to fetch data from the API compared to the user data displayed in the GA UI.
Could you please guide me on the steps to get accurate user-level data from the API? Your assistance in this matter would be greatly appreciated.
Thank you!
Where or how is it different? What report are you running and what are you looking at in the UI?
One important note is that in the UI, the Explore reports will have different data than what you get from the API but the reports that you build in the Reports section should be the same.
What if there are more than 100,000 rows in the report? Since we are sending a batch, each request can return multiple reports.
And if one of them has more rows than the limit, even using offset causes missing data in the other reports. What do you think about that? What is the cleanest solution?
And also, is there a more efficient implementation for getting huge amounts of data, or do we have to iterate through all the rows?
So, one of the keys returned in every report is rowCount, whose value is the total number of rows available in the entire report. In this article, I mention that the request body accepts limit and offset parameters. In this case, add offset to your request body and set it equal to 0. I actually made a mistake stating that the row limit from the API is 100,000 rows; it’s actually 250,000. Nonetheless, if you have reports with more rows than that, I’d recommend running a while True loop. After the data from each response is parsed into the dictionary, check whether the number of values in a given key’s list is greater than or equal to the rowCount from the response. If it is not, set the request body’s offset equal to itself plus the request body’s limit; if it is, set the offset back to 0 (I do this because I often loop over multiple properties) and then break from the loop. This should not cause any missing data beyond what GA4 puts into the (Other) row, which is a challenge of GA4 itself rather than the API call. I really need to write another article on getting all the rows of data out of a report through a loop with the code written out.
This method, obviously, does not use the batch part of the batch request, but it’s what works most simply for me. You could run a preliminary report with a limit of 1 to get the total rowCount, then build a series of requests using that figure. That just seems overcomplicated to me.
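The loop described above might look roughly like this. It’s a sketch rather than a drop-in script: fetch_all_rows() and its run_report argument are hypothetical names, and it pages a single report through runReport rather than batching, per the reasoning above. It stops once the rows collected reach the rowCount the API reports, or when a page comes back empty.

```python
def fetch_all_rows(run_report, body, page_size=250_000):
    """Collect every row of one GA4 report by paging with limit/offset.

    `run_report` is a hypothetical callable that sends one RunReportRequest
    body and returns the response dict, for example:
        lambda b: analytics.properties().runReport(property=property_id, body=b).execute()
    """
    body = dict(body, limit=page_size, offset=0)  # shallow copy; the caller's dict is untouched
    rows = []
    while True:
        response = run_report(body)
        page = response.get('rows', [])
        rows.extend(page)
        # rowCount is the total number of rows in the whole report, not just this page
        if not page or len(rows) >= response.get('rowCount', 0):
            return rows
        body['offset'] += page_size
```

Because the copy starts at offset 0 on every call, you can reuse the same base request body across multiple properties without resetting anything by hand.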
Are there more efficient implementations than iterating over all the rows? Maybe. This is what works for me. If you have other ideas, I’d love if you’d share them.
Hi Eric,
Thank you for the great article. In Universal Analytics, we were pulling historical data for all users who completed at least one goal, using the client ID. We used analytics.reports().batchGet() for that.
But in GA4 I couldn’t find the report method that I can use to get user level details for specific users who completed at least one event (lead generation form).
I would think that analytics.properties().batchRunReports() would be able to achieve the same thing. That’s the GA4 version of the UA API call to get batched reports. I’m not sure how to run the report you’re asking for. If you have the request body of the UA report you were running, I might be able to help narrow it down for you. The request body is pretty close to the same, except the metric and dimension names are different and the limit and offset are different. Most importantly, you might need to make sure your GA4 account is set up properly with any audiences you need (i.e., a user audience of users who completed at least one goal); then you could create a dimensionFilter to reflect that. Please refer to the GA4 Dimensions & Metrics Explorer to see all the dimensions and metrics for the property you’re trying to run this for.
And just to be clear, I don’t typically run user-scoped reports, so my solution is just a guess and might not be right. Hopefully it helps you get on the right track, though. Good luck!
Hi Eric,
Is there any relationship between the dimensions, and if so, where can we see it?
We are trying to retrieve all the data relevant to an event, such as event date, event name, city, country, device, etc., which belong to different dimensions.
I’m not sure, to be honest. When I have to go deep enough to get into custom event parameters, I go make SQL queries in my BigQuery export of GA4.
I think you need to set up custom dimensions based on the event parameters to pull that data out in the GA4 interface. If you do that, it should be a matter of putting those custom dimensions in the dimensions list. You can also add a dimension filter where the event name is the event name you’re looking for. Or maybe I’m misunderstanding your question.
Thanks for your response. We are actually trying to use the Data API to pull the GA4 data rather than BigQuery. To pull the data, we specify the dimensions and metrics for an event, but there’s a limit of 9 dimensions in a single call, so how do we connect the other dimensions of the same event? That’s what we are looking for.
As far as I know, there’s not a way to get a joinable key from two sets (for example, something like a session_id) to pair the data together from the Data API. I think you’ll have to go to BigQuery.