Google Search Console API with Python

Using Python to Make Google Search Console API Requests

Now that you have authentication set up for the Google Search Console API, we can dig into some of the report requests you can run. We'll be covering the four supported reports from the Google Search Console documentation and how to handle the data you get back.

Requirements

A Google Search Console API connection. I’ll be calling the function we wrote in the linked post to start each report.

pip install pandas

All of the code below assumes Pandas has been imported as pd (import pandas as pd).

GSC Report 1: Sites

The first report type we are pulling is incredibly basic: the sites report. This report returns a list of all of the sites in your Google Search Console account, whether or not you have verified access to them. The code for this is quite simple and requires no additional parameters for the request.

service = gsc_auth(scopes)
gsc_sites = service.sites().list().execute()

This will return a dictionary with a single key, siteEntry, where the value is a list of dictionaries that include the siteUrl and permissionLevel for each site. This type of response is fairly typical when dealing with APIs and is known as a JSON response. JSON and Python dictionaries get along well together.

To pull just the sites we have access to, we’ll iterate over this list of dictionaries, checking for the correct permissions.

verified_sites = [site['siteUrl'] for site in gsc_sites['siteEntry']
                  if site['permissionLevel'] != 'siteUnverifiedUser']

Now verified_sites will be a list of all the URLs in your account that you have access to. It's important to note, though, that not all of these will be URLs exactly. If you have any domain properties in the account for which you are a site owner or verified user, those will also be in this list, prefixed with sc-domain:. In my personal opinion, if your properties consist only of the https-www, https-non-www, http-www, and http-non-www versions, you'll probably want to use the sc-domain: version for all subsequent requests.

>>> print('\n'.join(verified_sites))

"https://www.shortautomaton.com/"
"http://www.shortautomaton.com/"
"https://shortautomaton.com/"
"http://shortautomaton.com/"
"sc-domain:shortautomaton.com"

And that’s it. You’ve executed your first Google Search Console API call in Python!

You can do more with the sites() function than list the sites you have access to. You can get information about a specific site with .get(siteUrl={site}), and you can add or delete sites from your account with .add(siteUrl={site}) and .delete(siteUrl={site}), respectively. I won't go over those in depth here, but you can find the documentation for each of them in the GSC API Docs.
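As a quick sketch, reusing the service object from above, getting the details for a single property looks something like this:

site_info = service.sites().get(siteUrl='sc-domain:shortautomaton.com').execute()
# the response is a small dictionary with the siteUrl and permissionLevel
print(site_info['permissionLevel'])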

GSC Report 2: Sitemaps

The next report we’ll be pulling is the sitemaps report. Again, we’re going to be pulling a list of the sitemaps with this, though there are other ways to interact with the API for sitemaps.

service = gsc_auth(scopes)
sals_sitemaps = service.sitemaps().list(siteUrl='sc-domain:shortautomaton.com').execute()

Just like the sites report, this will return a dictionary with a single key, sitemap, where the value is a list of the sitemaps for the property. Each entry contains a variety of details, including the last time the sitemap was submitted, whether it is pending, whether it's a sitemap index, the last time it was downloaded, any warnings or errors associated with it, and details regarding its contents.

I only have one sitemap added to my domain property, but you’d see these details for all of the sitemaps you have.

>>> print(sals_sitemaps)

{'sitemap': [{'path': 'https://www.shortautomaton.com/sitemap.xml',
   'lastSubmitted': '2022-03-13T03:46:52.485Z',
   'isPending': False,
   'isSitemapsIndex': True,
   'lastDownloaded': '2022-03-14T10:37:55.545Z',
   'warnings': '0',
   'errors': '0',
   'contents': [{'type': 'web', 'submitted': '10', 'indexed': '0'},
    {'type': 'image', 'submitted': '16', 'indexed': '0'}]}]}

Pulling submitted sitemaps programmatically is a great way to identify problematic sitemaps at scale and fix them. I’ve worked on a site with hundreds of submitted sitemaps and many of them had errors. When you identify the errors, you can work to submit corrected sitemaps and/or remove problematic sitemaps using this same API. Other methods you might call with this instead of .list() are .get(siteUrl={site}, feedpath={sitemap.xml}), .submit(siteUrl={site}, feedpath={sitemap.xml}), and .delete(siteUrl={site}, feedpath={sitemap.xml}). More details regarding each of these methods can be found in the GSC API Docs.
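To make that concrete, here's a minimal sketch of resubmitting a corrected sitemap and removing a problematic one, reusing the service object from above; the feedpath values are placeholders for your own sitemap URLs.

# resubmit a corrected sitemap (the feedpath URL is a placeholder)
service.sitemaps().submit(siteUrl='sc-domain:shortautomaton.com',
                          feedpath='https://www.shortautomaton.com/sitemap.xml').execute()

# remove a problematic sitemap (the feedpath URL is a placeholder)
service.sitemaps().delete(siteUrl='sc-domain:shortautomaton.com',
                          feedpath='https://www.shortautomaton.com/old-sitemap.xml').execute()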

GSC Report 3: Search Analytics

The third report, which is likely where you'll spend the majority of your time with the Google Search Console API, is the Search Analytics report. This is where you'll be able to pull metrics just like you see in the Performance dashboard in GSC, except you'll be able to pull everything much more quickly, with more detail, and without the 1,000-row limit of the UI.

This request is different from the others we've covered so far because it doesn't just take a simple siteUrl input. There is a request body where you tell the API everything you want, from date ranges to dimensions to types of reports (discover, web, image, etc.), as well as any filtering options. I find it helpful to build this request separately and assign it to a variable, as opposed to trying to cram it all into the function call.

I also find it incredibly helpful to build the request on the search analytics method documentation page first so you can ensure you're building it correctly. It has a request body section showing all of the possibilities, plus a Try this method pop-over with input guides. It will warn you if your request has errors and try to tell you what the errors might be. You can also run the request straight from the pop-over to make sure you're getting data the way you want. Note that this counts toward your daily limit.

[Image: Google Search Console Search Analytics guide]

We're going to build a pretty basic request for the purposes of this guide. Take note from the image above that nearly all of the values are strings, with the exception of rowLimit and startRow, which are integers. This is pretty obvious for most of the keys, but I wanted to point it out because it's not so obvious for startDate and endDate. Those are also strings, not date or datetime objects, and they use ISO 8601 date formatting; in simple terms, YYYY-MM-DD format.
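If you're building those dates in code rather than typing them out, the standard library's date.isoformat() produces exactly this format; a small sketch:

from datetime import date

start_date = date(2022, 3, 1).isoformat()  # '2022-03-01'
end_date = date(2022, 3, 15).isoformat()   # '2022-03-15'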

Alright, let's build our request. We're going to pull from the first of the month to the 15th of the month, look only at queries, and set the rowLimit to the maximum of 25,000. The nice part about building the request on their page is that you can copy and paste it without worrying about formatting it correctly.

service = gsc_auth(scopes)

request = {
  "startDate": "2022-03-01",
  "endDate": "2022-03-15",
  "dimensions": [
    "QUERY"
  ],
  "rowLimit": 25000
}

Once we have the request body assigned to a variable, we can make our request.

gsc_search_analytics = service.searchanalytics().query(siteUrl='sc-domain:shortautomaton.com', body=request).execute()

Just like the others, this will return a dictionary. This dictionary has two keys: rows, which is what we'll primarily be focusing on, and responseAggregationType, which tells you how the data was aggregated.

>>> print(gsc_search_analytics)

{'responseAggregationType': 'byProperty',
 'rows': [{'clicks': 220,
           'ctr': 0.3283582089552239,
           'impressions': 670,
           'keys': ['best search console api guide'],
           'position': 1.2537313432835822},
          {'clicks': 70,
           'ctr': 0.05982905982905983,
           'impressions': 1170,
           'keys': ['python seo website'],
           'position': 6.666666666666667},
          {'clicks': 60,
           'ctr': 0.0759493670886076,
           'impressions': 790,
           'keys': ['beginner python projects'],
           'position': 1.6708860759493671}...

If you were to print it like this, you'd likely see it all on one line. I'm showing a pretty-printed version so you can better understand how this data is structured.

My favorite part of the way this data is returned is that you don't have to do anything fancy to turn it into a Pandas DataFrame. Hand the value of the rows key to Pandas and it will do its thing very efficiently.

gsc_sa_df = pd.DataFrame(gsc_search_analytics['rows'])
>>> gsc_sa_df.head(3)

                              keys  clicks  impressions       ctr  position
0  [best search console api guide]     220          670  0.328358  1.253731
1             [python seo website]      70         1170  0.059829  6.666667
2       [beginner python projects]      60          790  0.075949  1.670886

Again, this is designed to be a pretty basic tutorial. What we went through is no different from going to Google Search Console and looking at the Queries tab of the Performance report. You can do a lot more with this, like setting your request body to pull both QUERY and PAGE dimensions, which you can't easily get from the GSC UI. The response will look the same and you can do all the same things, but your keys column will be a list with two items instead of one. I recommend splitting that column into individual columns representing the dimensions you requested, as shown below. You might end up with duplicate queries if multiple pages rank for the same query, and duplicate pages if some pages rank for multiple queries.
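Here's a minimal sketch of that split, assuming the request asked for the dimensions ["QUERY", "PAGE"] in that order:

# expand each keys list into one column per requested dimension
gsc_sa_df[['query', 'page']] = pd.DataFrame(gsc_sa_df['keys'].tolist(),
                                            index=gsc_sa_df.index)
gsc_sa_df = gsc_sa_df.drop(columns='keys')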

Now that you know how to get started, the best way to learn this is to play around with the API to see what data you can pull!

More advanced users can learn how to create a loop of requests to get all rows from the Search Analytics API; a rough sketch of the idea follows.
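One possible approach, sketched below under the assumption that the request dictionary from above is still in scope, keeps incrementing the startRow field until the API returns no more rows:

all_rows = []
start_row = 0
while True:
    request['startRow'] = start_row
    response = service.searchanalytics().query(siteUrl='sc-domain:shortautomaton.com', body=request).execute()
    rows = response.get('rows', [])
    if not rows:
        break  # an empty page means we've collected everything
    all_rows.extend(rows)
    start_row += request['rowLimit']

gsc_sa_df = pd.DataFrame(all_rows)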

GSC Search Analytics Full Code

import pandas as pd

service = gsc_auth(scopes)

request = {
  "startDate": "2022-03-01",
  "endDate": "2022-03-15",
  "dimensions": [
    "QUERY"
  ],
  "rowLimit": 25000
}

gsc_search_analytics = service.searchanalytics().query(siteUrl='sc-domain:shortautomaton.com', body=request).execute()

gsc_sa_df = pd.DataFrame(gsc_search_analytics['rows'])

GSC Report 4: URL Inspection

The final report we'll cover is relatively new at the time of writing: the URL Inspection tool, which came to the API on January 31, 2022. The URL Inspection API call acts exactly like Inspect URL within Google Search Console, except you can do it programmatically. Again, you can build the body using the Try this method pop-over in the URL Inspection docs.

service = gsc_auth(scopes)

request = {
  "siteUrl": "sc-domain:shortautomaton.com",
  "inspectionUrl": "https://www.shortautomaton.com/"
}

gsc_inspect = service.urlInspection().index().inspect(body=request).execute()

The request body can take three parameters; I'm only showing two: siteUrl and inspectionUrl, which are your GSC property and the URL you want to inspect, respectively. The third, languageCode, is optional and lets you receive the response in another language. The default is en-US, so if English is your language of choice, you don't need it.
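If you did want the response in another language, the body might look like this; es-ES is just an illustrative BCP-47 code:

request = {
  "siteUrl": "sc-domain:shortautomaton.com",
  "inspectionUrl": "https://www.shortautomaton.com/",
  "languageCode": "es-ES"
}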

This call takes a few seconds to run in my experience, and you'll get back a dictionary with inspectionResult as the only key. Let's dig into what this result shows.

>>> print(gsc_inspect['inspectionResult'])

{'indexStatusResult': {'coverageState': 'Submitted and indexed',
                       'crawledAs': 'MOBILE',
                       'googleCanonical': 'https://www.shortautomaton.com/',
                       'indexingState': 'INDEXING_ALLOWED',
                       'lastCrawlTime': '2022-03-09T09:47:12Z',
                       'pageFetchState': 'SUCCESSFUL',
                       'robotsTxtState': 'ALLOWED',
                       'sitemap': ['https://www.shortautomaton.com/sitemap.xml'],
                       'userCanonical': 'https://www.shortautomaton.com/',
                       'verdict': 'PASS'},
 'inspectionResultLink': 'https://search.google.com/search-console/inspect?resource_id=sc-domain:shortautomaton.com&id=ttpz8_1ZcDPS0CNwCKVooA&utm_medium=link&utm_source=api',
 'mobileUsabilityResult': {'verdict': 'PASS'},
 'richResultsResult': {'detectedItems': [{'items': [{'name': 'Unnamed item'}],
                                          'richResultType': 'Sitelinks '
                                                            'searchbox'}],
                       'verdict': 'PASS'}}

Just like before, I pretty-printed this for ease of understanding. There are four keys within inspectionResult: indexStatusResult, inspectionResultLink, mobileUsabilityResult, and richResultsResult. If you use AMP, there is a fifth that might appear, called ampResult.

The inspectionResultLink is pretty self-explanatory, and even more so when you visit the URL. It brings you to the Google Search Console page for the URL you inspected in your code.

The mobileUsabilityResult basically tells you whether you’re mobile-friendly or not. Hooray! I passed!

Frankly, the two more important reports you'll get from this are richResultsResult, which will tell you about any valid (or invalid) structured data markup on the page, and indexStatusResult, which tells you everything about a page at the base level.

Diving into the indexStatusResult, it tells us whether the page is indexed, whether it is allowed to be indexed or crawled, the user-declared and Google-selected canonical, whether it was crawled as mobile or desktop, the last time it was crawled, and any linking sitemaps. As I stated before, this is a new ability in the API and I haven't dug into it too much, but this is what I would gravitate toward when trying to execute inspections at scale. It's important to note that Google limits URL Inspection API calls to 2,000 per day per property. This means if you have a www. subdomain and an es. subdomain, each one can have up to 2,000 URLs inspected via the API each day. If I'm not mistaken, this also means that an https-www property and a domain property each get 2,000 requests per day.

I'm a big fan of Pandas DataFrames. It's probably because I did so much work in spreadsheets in the past, and I find them incredibly easy to filter, sort, and search for what I'm looking for. Just to show how easy it is to convert into a DataFrame, we're going to put the indexStatusResult into one.

inspect_df = pd.DataFrame(gsc_inspect['inspectionResult']['indexStatusResult'])
>>> inspect_df.head()

  verdict  ... crawledAs
0    PASS  ...    MOBILE

It produces a 10-column DataFrame with 1 row. On the surface, putting it into a DataFrame seems useless, and arguably, inspecting a single URL programmatically seems pretty useless to me too. The value in this API call, and in putting it into a DataFrame, comes from iterating over a list of URLs. Maybe you have a spreadsheet of blog articles you post in a kind of content calendar, for example. You could pull that spreadsheet and iterate over each published URL to see if or when it gets indexed. Or if you have a list of pages you are updating, you can quickly see whether Google has recrawled them since the update.

With each URL you inspect, you could append the indexStatusResult to the end of the DataFrame, creating a full-blown inspection report; a minimal sketch of that is below. And now you're inspecting your site at scale!
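In this sketch, urls_to_check is a hypothetical list of the URLs you want inspected, and everything else reuses the objects from above:

results = []
for url in urls_to_check:
    request = {
        "siteUrl": "sc-domain:shortautomaton.com",
        "inspectionUrl": url
    }
    response = service.urlInspection().index().inspect(body=request).execute()
    result = response['inspectionResult']['indexStatusResult']
    result['inspectedUrl'] = url  # track which URL each row describes
    results.append(result)

inspection_report_df = pd.DataFrame(results)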

GSC URL Inspection Full Code

import pandas as pd

service = gsc_auth(scopes)

request = {
  "siteUrl": "sc-domain:shortautomaton.com",
  "inspectionUrl": "https://www.shortautomaton.com/"
}

gsc_inspect = service.urlInspection().index().inspect(body=request).execute()

inspect_df = pd.DataFrame(gsc_inspect['inspectionResult']['indexStatusResult'])
