As long as you have created your Google API credentials with the Google Analytics Data API (service name: analyticsdata.googleapis.com) enabled, you are ready to begin programmatically pulling data from your GA4 properties in Google Analytics. In this article, we’re going to walk through authorizing your script to prepare for pulling reports.
Requirements
pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib
Before We Start Coding
Before we get into the script, let’s do a little housekeeping. At the end of the article where we created the Google credentials, I mentioned you should download the OAuth JSON file. In our script, we are going to open this file to perform the authorization. I recommend renaming it to credentials.json, since that’s what we’re going to be calling in our script. I also recommend moving it to the same folder that you are writing this script in. Your folder structure might look something like this:
- ga_api
- credentials.json
- ga4_api.py
Authorizing Google Analytics 4 API in Python Script
Google documents yet another way to connect to this API in the Google Analytics 4 Python quickstart guide. I personally don’t like that method, and I’m concerned about how its module imports will change once the API leaves beta. On the plus side, the OAuth flow I typically use from the Google Sheets API documentation still works; there are just some changes in the service build at the very end compared to the Google Analytics (UA) authorization.
We’ll start by importing the modules we’ll be using.
# Import Modules
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
Define Authorization Function
If you’ve followed my guides before, this function will look very familiar. For the newcomers, I’m still going to walk through the function so you have a better understanding of what’s happening just under the hood.
def ga_auth(scopes):
    creds = None
    # The file token.json stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', scopes)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', scopes)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.json', 'w') as token:
            token.write(creds.to_json())
    service = build('analyticsdata', 'v1beta', credentials=creds)
    return service
First we set the variable creds to None. This is a common pattern in Python functions, especially when you’ll be using a variable in conditionals.
Next we check whether your folder has a token.json file in it. If this is your first time authorizing, it won’t. If it does, we set the creds variable to the credentials we’ve previously obtained.
I’m going to jump around for a second, because in the next line we check whether the credentials don’t exist or aren’t valid. If they exist and are valid, we skip this block entirely.
If the credentials do exist but are expired and have a refresh token, we request a refresh.
If they don’t exist and don’t have a refresh token (which is the path our code will take the first time we run it), we open credentials.json and ask for authorization. When you run this, the script will open a link in your browser and take you to the OAuth consent screen that you previously set up. Once you authorize the app (script) to connect using your credentials.json file, we set creds to the authorization token we get back. For future use, we save the creds into token.json.
One important note about token.json: you can use this same token for all the other Google API scripts we’ll be writing, if and only if the scopes don’t change. If the scopes are different, you will need to delete token.json and re-run the function. This is very important, especially if you followed my Google Search Console Authorization guide, because there we only set our scopes to the Google Search Console scopes.
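If you’d rather handle that cleanup in code, here’s a minimal sketch (my own illustration, not part of the auth function) that removes a stale token.json so the next run triggers the full consent flow again:

import os

# After changing scopes, remove the old token so ga_auth() re-runs the
# full OAuth consent flow instead of reusing a token with the wrong scopes.
if os.path.exists('token.json'):
    os.remove('token.json')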
Finally, in our function, we build the service we’ll be using for our API calls. In this case, it will be analyticsdata and we’re using v1beta. I expect this will be coming out of beta soon and v1 will be operational; at the time of writing this, v1 does not work in the service build despite the Google Cloud Console showing v1 as a version filter.
Set Scopes and Obtain Authorization
Now that the function is written, we can put it to use.
The first thing we need to do is set our scopes. Since we are only covering Google Analytics in this article, we’re only going to use the GA scopes. The good news is the GA4 scopes are the same as the GA3/UA scopes!
I almost always run with at least two scopes in my list because I’m often using the same token.json file in a single script to pull from two different sources (like Google Analytics and Google Search Console). Just remember that if you change your scopes, you’ll need to delete token.json or you’ll get an error.
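For illustration, a combined scope list for that kind of dual-source script might look like the sketch below (the Search Console scope is only an example; include whichever scopes your script actually needs):

# Example only: request read-only access to both GA and GSC with one token.json
scopes = [
    'https://www.googleapis.com/auth/analytics.readonly',
    'https://www.googleapis.com/auth/webmasters.readonly',
]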
You can find the available scopes in each API’s documentation. Here are the Google Analytics 4 scopes.
scopes = ['https://www.googleapis.com/auth/analytics.readonly']
Simple. You’ll notice I’m using the .readonly scope here. I personally don’t like to manipulate anything in Google Analytics using the API, and even less so when I’m working with a beta API for GA4. That might change some day if they come out with a way to manage event creation/modification via the API. For now, I use it solely for pulling down reports and prefer to use the UI for management. You can choose whatever you’d like.
With the scopes set, we are ready to call the function to build our service. Set the service to a variable because we’ll need that service object for future report requests (beats having to authorize every time).
service = ga_auth(scopes)
Now you can run the code!
Please visit this URL to authorize this application: https://accounts.google.com/o/oauth2/auth?THESEPARAMETERSAREGIVENTOYOU
Your browser should automatically open the link that the script creates. However, if it doesn’t, you can copy and paste the output URL. This will take you to the OAuth consent screen you set up. First, select the account you want to use (which should be the one you set the credentials up for), then you’ll see this screen.
Select Allow and you’re off to the races!
Now that you have authenticated your script, you are ready to pull your first report from the Google Analytics 4 API!
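To preview where this is headed, here’s a minimal sketch of what a report request looks like with the authorized service object (the property ID is a placeholder; the full walkthrough comes in the report-pulling guide):

# Placeholder property ID; replace with your own GA4 property ID.
response = service.properties().runReport(
    property='properties/123456789',
    body={
        'dateRanges': [{'startDate': '7daysAgo', 'endDate': 'yesterday'}],
        'dimensions': [{'name': 'date'}],
        'metrics': [{'name': 'sessions'}],
    }
).execute()
print(response.get('rowCount', 0))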
Eric is a Python SEO with a passion for data. He uses Python to automate tasks and analyze large data sets. He is often the go-to for spreadsheet challenges, too. He sees every challenge as an opportunity to learn.
Eric, your samples are awesome! I do have a question on GA4 auth… I like the way you request dims/metrics as you have listed in “Using Python to pull GA4 reports”, but I am unclear on how I can use a local creds.json file without being prompted to log on, as I am using an ETL tool to operationalize the process. I can use the standard examples as Google has them, but the dimension listing is less than desired. Any ideas (or something I am missing)? Also, if you need to paginate, is that possible using the script that you created? (Again, I have accomplished this via the Google-documented way, but am looking to streamline the process if possible.)
In my experience, if you authenticate locally with creds.json and locally create a token.json, you should be able to copy those two files into wherever you’re hosting the script, like a web server. Alternately, if you have shell access for your server, you can run the script from the server shell and it should create token.json in the correct directory.
As far as how you want to list dimensions, I guess I’m not understanding the challenge you’re facing. There should be no issues putting the dimensions in a list then using a list comprehension to build out the list of dictionaries.
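As a rough sketch of what I mean (the dimension names here are just examples):

# Turn a plain list of dimension names into the list of dictionaries
# the GA4 Data API expects (names below are just examples).
dimension_names = ['date', 'sessionDefaultChannelGroup', 'country']
dimensions = [{'name': name} for name in dimension_names]
# -> [{'name': 'date'}, {'name': 'sessionDefaultChannelGroup'}, {'name': 'country'}]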
I’m unsure of what you mean by needing to paginate. Do you mean if the results are paginated (i.e., more than 100,000 rows)? If so, then yes, you can create a while True loop to pull paginated GA data, updating the nextPageToken. I need to write up a demo for that. I thought I had it, but I apparently only have one for GSC.

I cannot get the authentication code, can you guide me?
Are you getting an error code?
Hello Eric,
I’ve got the error “You can’t sign in because this app sent an invalid request. You can try again later, or contact the developer about this issue. Learn more about this error
If you are a developer of this app, see error details.
Error 400: redirect_uri_mismatch”
Although I added the redirect link to the Client IDs, it did not work.
If you’re following the steps I’ve laid out in the posts, you don’t need to specify the redirect_uri. If you downloaded your OAuth credentials and saved them as credentials.json, that’s it. There is a key:value pair within that JSON that specifies the redirect_uris. Double check to make sure you’re not using a service account. If you’re failing to authorize, you’re not at the point where you even need to worry about Client IDs.

Is there a reason why I am redirected to localhost after granting permissions to my Google Account? When allowing, I am getting a crash at localhost 🙁
Your credentials.json should have a redirect_uris key with an http://localhost value. This is a typical OAuth flow. Make sure you’re using updated credentials, since Google has made changes and deprecated out-of-band flows. If you are using updated credentials, make sure you’re not trying to use a service account. If you aren’t, let me ask what application you’re running this in? I haven’t tested this in Jupyter or other web-based interfaces. It’s possible that could be part of the issue?
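A quick way to sanity-check that (assuming your file uses the usual ‘installed’ client-secrets layout for a desktop app):

import json

# Print the redirect URIs from the downloaded OAuth client file; you should
# see 'http://localhost' if the credentials were created for a desktop app.
with open('credentials.json') as f:
    client_config = json.load(f)

print(client_config.get('installed', {}).get('redirect_uris'))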
Just wanted to drop a quick note to say a huge thank you for your awesome post! It’s been a game-changer for me. There’s hardly any documentation out there about connecting to the GA4 API with OAuth credentials instead of service accounts, so your insights have been incredibly valuable.
I really appreciate the effort you put into sharing this information. You’ve made my life a lot easier, and I can’t thank you enough.
You’re welcome! Thank you for stopping by. It’s wild to me that, even still, the only guides I see out there for the GA4 API require a major refactoring from the UA API.
If you’d like additional ways to show your appreciation, I’d really appreciate shout outs to my site or guides on social media, especially Twitter and LinkedIn or you could throw a few dollars my way on Buy Me a Coffee to help fund the site (I intentionally don’t run ads because I find that business model intrusive).
Thanks for this blog post Eric. I am trying to run this from Azure Databricks and I’m able to get to the last screenshot in your post and I’m able to grant access to my google account. It then takes me to a page with the following error message.
Hmmm… can’t reach this page. localhost refused to connect.
Try:
Search the web for localhost
Checking the connection
Checking the proxy and the firewall
ERR_CONNECTION_REFUSED
I’m sure this is because your Python code is running locally vs. the Databricks code running in a cloud Spark cluster. Would the redirect URL need to point back to Databricks? I’ve done a fair amount of searching online and I’m coming up empty. Thanks again for this blog, it was quite helpful to get to where I’m at currently.
I believe you’re correct: you’re getting the error because you’re running it entirely server-side and I am running it locally. To be honest, I don’t do much on the server side. I recommend reading up on OAuth 2.0 redirect URIs to get a better idea of where you can or should point this. The redirect URI is a key in credentials.json that you could edit. Alternatively, what I have done for the little server-side scripts I’ve implemented is run the script locally, then ensure I upload credentials.json and token.json to the server where the script would be running. From then on, in my experience, the token.json refreshes automatically as expected without issue.

Hi Eric, thanks for sharing this information. It‘s really helpful! The script works fine. The OAuth consent screen shows up and I can select the account. But after this, the process runs into an error due to our firewall. I have to use our proxy server. Do you know how to change the script in a way that uses a proxy server? Thanks in advance.
Are you trying to run the script on a server or on a local machine? I haven’t needed to solve for a firewall so I’m not familiar with using proxy servers with my Python scripts, sorry.
Hi
Thank you for your post
I am trying to access GA4 data with Python and an Azure Function. If I do that on my local computer, everything works fine and I can download the data to my computer.
But as soon as I upload the code to the cloud, the Azure Function gives a 500 error.
The error happens on the line client = BetaAnalyticsDataClient().
I have gone through all the documentation on the web and Google, and I don’t know how to move on.
The credential is saved as a secret key on Azure.
The same structure works for UA and pulls the data, but for that we don’t use BetaAnalyticsDataClient().
Hi Yasaman. Getting a 500 error is odd because that’s a server error, not any sort of HTTP denial. That said, I can also say that you aren’t using the code I wrote in this article, because I don’t use the BetaAnalyticsDataClient() class that is described in the official GA4 documentation. Mine is written based on other Google OAuth flows and is, frankly, better. I’d suggest reading through and following the guide I’ve written, and then let me know if you have issues.