Question Query Regex from Google Search Console

Using Custom Regex in Google Search Console to Get All Question Queries

Discovering content opportunities for your website is sometimes as easy as looking at your own search query data. I like to look at the queries where we are potentially answering the question a user is asking Google. Hopefully you are finding that your content sufficiently answers the questions being asked. I’ve found that, while that is true most of the time, I’m often missing content that answers some of them. Maybe we rank well for a question we intended to answer, but those pages might also rank for questions we aren’t fully answering. This is where the content opportunity presents itself: either revise your page content to answer those questions or create new content that does.

Google Search Console has made it easy for us to filter search queries for our site so we are just looking at search terms that have, what I call, question words in them. I’m going to walk you through some of my favorite question words to filter my search queries, how to write a regex to get them all at once, how to apply the filter in the web UI, and, naturally, how to pull them via the Google Search Console API.

Identify Question Words

First we need to identify the question words we’re interested in pulling. We can start with the basics:

  • who – “who are the best python seos?”
  • what – “what is the best site to learn advanced use cases in google search console?”
  • when – “when was the last google core algorithm update?”
  • where – “where can i learn regex?”
  • why – “why use the google search console api?”
  • how – “how to filter google search console queries?”

There are probably other questions you are answering and we want to identify those queries as well.

  • am – “am i keyword stuffing?”
  • can – “can google use indexnow?”
  • could – “could i get a manual action for spammy links?”
  • would – “would you use ahrefs or semrush for keyword data?”
  • should – “should you provide examples for all of these?”
  • do – “do you like my examples?”
  • did – “did i cover everything?”
  • is – “is there an end in sight?”
  • was – “was this list inclusive enough?”
  • are – “are you getting sick of these?”
  • were – “were you expecting examples of these terms?”
  • will – “will this list ever end?”
  • whom – “to whom may i complain about these?”
  • whose – “whose bright idea was this anyway?”
  • which – “which question words did i miss?”

There may be others that are applicable to your business, but I’ve had my fun. Feel free to add any question words that are relevant to your website.

Write Regex For Question Words

Regex, a portmanteau of regular expression, can be tricky to figure out at first. It’s an incredibly powerful tool for finding and filtering data. Mastery of regex comes with years of using it. Although I know it well, I’m not a regex master, so I use Regex101 to double-check my work.

Before I get into writing the regex for our question words, I want to point out a few important notes. Python has a standard regex library, re. It is slightly different from what Google uses, and we don’t need it to build our expression. Google uses RE2 syntax, which lacks some features; we aren’t going to get into those at all in this article. If you’re using Regex101, for best results regarding RE2, select Golang on the left side.

Regex Elements To Use

As I do on this site, I prefer to over-explain the reasoning behind what we’re doing. Rather than just put our words into a regular expression and hand them over, I want to talk about the different elements we’ll be using (and a couple of others you might want to use as well).

| in regex means or. We’ll be putting a pipe | between each of our question words as a way to say we want to match queries that contain word-1 or word-2 or word-3, etc.

\b in regex is a word boundary. We are using this at the beginning and end of each word so we don’t accidentally match words that contain our question words within other words. For example, dentist will match on is if we don’t set the word boundaries.

() in regex are capturing groups. We are using these to simplify our expression. Like I said, since we are using \b at the beginning and end of each word, we can actually just use \b at the beginning and end of our entire statement and add all of our words within a capturing group. You could use it like: the (girl|boy)'s name is Alex. As you can see, it keeps you from having to write two full sentences to match both options.
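To see the word boundary and the group in action, here is a quick sketch using Python’s re module. This is only a local sanity check and not what Google runs: re is not RE2, though for simple ASCII patterns like these, |, \b, and () should behave the same way. The sample queries are made up for illustration.

```python
import re

# Made-up sample queries for illustration.
queries = [
    "is regex hard",
    "find a dentist near me",
    "this is a test",
]

# Without word boundaries, "is" also matches inside "dentist".
loose_pattern = re.compile(r"is")
# With word boundaries, "is" only matches as a whole word.
strict_pattern = re.compile(r"\bis\b")

loose_matches = [q for q in queries if loose_pattern.search(q)]
strict_matches = [q for q in queries if strict_pattern.search(q)]

# A group with a pipe lets one pattern cover both sentences.
name_pattern = re.compile(r"the (girl|boy)'s name is Alex")
girl_match = name_pattern.search("the girl's name is Alex")
boy_match = name_pattern.search("the boy's name is Alex")

print(loose_matches)
print(strict_matches)
print(girl_match.group(1), boy_match.group(1))
```

The dentist query shows up in the loose list but drops out of the strict one, which is exactly why we wrap our word list in \b.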

^ in regex is the beginning of the search term. We aren’t going to be using it in our example, but you can if you want to only look for queries that start with a question word. As an example, ^what will match on the query what does the caret symbol mean in regex, but won’t match on the query the caret symbol means what in regex.

$ in regex is the end of the search term. Again, we aren’t going to be using this in our example. I doubt you’ll be using this in your implementation either unless you get a lot of question searches by Yoda. As an example here, can$ will match on the query use regex, you can, but won’t match on the query can you use regex.
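If you want to try the two anchors yourself, here is a small sketch that checks them against the example queries above. Again, it uses Python’s re module as a stand-in for RE2, which I’m assuming is close enough for patterns this simple.

```python
import re

# ^ anchors the match to the start of the query.
starts_with_what = re.compile(r"^what")
# $ anchors the match to the end of the query.
ends_with_can = re.compile(r"can$")

start_hit = starts_with_what.search("what does the caret symbol mean in regex")
start_miss = starts_with_what.search("the caret symbol means what in regex")

end_hit = ends_with_can.search("use regex, you can")
end_miss = ends_with_can.search("can you use regex")

print(bool(start_hit), bool(start_miss))
print(bool(end_hit), bool(end_miss))
```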

Question Words Regex in RE2 Syntax

No need to mince words here, I’ll cut to the chase. We have our word list from above and the elements we’ll be using. Here’s our resulting regex.

\b(who|what|when|where|why|how|am|can|could|would|should|do|did|is|was|are|were|will|whom|whose|which)\b
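If you plan to add your own question words, you may prefer to build the expression from a list rather than edit the string by hand. Below is a minimal sketch of that idea. The build_question_regex helper and the sample queries are my own illustration, and the local check uses Python’s re module rather than RE2.

```python
import re

QUESTION_WORDS = [
    "who", "what", "when", "where", "why", "how",
    "am", "can", "could", "would", "should", "do", "did",
    "is", "was", "are", "were", "will", "whom", "whose", "which",
]


def build_question_regex(words):
    """Join the words with pipes inside one group wrapped in word boundaries."""
    # re.escape leaves plain letters untouched; it only matters if a word
    # ever contains a regex metacharacter.
    alternatives = "|".join(re.escape(word) for word in words)
    return r"\b(" + alternatives + r")\b"


question_regex = build_question_regex(QUESTION_WORDS)

# Made-up sample queries for a quick local check.
sample_queries = [
    "how to filter google search console queries",
    "best dentist in town",
    "where can i learn regex",
    "python seo tutorial",
]

pattern = re.compile(question_regex)
question_queries = [q for q in sample_queries if pattern.search(q)]

print(question_regex)
print(question_queries)
```

Adding a word to the list is then a one-line change, and the pipes and boundaries take care of themselves.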

Filtering in Google Search Console

To filter your query results in Google Search Console, first we need to go to the performance tab. By default, there are two filters applied: Search type: Web and Date: Last 3 months. Right next to that is a + New button. Click on that and select Query…

Google Search Console Performance Filter

It will pop open a modal where we can set a filter or comparison. The default option is Queries containing. Open that dropdown and select Custom (regex). This will produce another option that we’re going to leave as Matches regex. Then we just paste the regex we created earlier and apply the filter.

Google Search Console Query Filter Regex

And now we see all of the queries that have at least one of our question words in them.

Pulling Question Queries via Google Search Console API with Python

We can get a filtered query report using regex from the Google Search Console API as well, which makes us happy because we can pull a lot more data at scale this way. If you’re not familiar with how to do that, we have a step-by-step guide to the Google Search Console API with Python that will take you from creating credentials to running your first reports.

In this article, we’re going to be reusing a lot of code we wrote in the Connecting to API article and Search Analytics report section. In fact, the only thing we’re really going to be changing is the request body.

# Import Modules
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials

# Define function to get authorization
def gsc_auth(scopes):
    creds = None
    # The file token.json stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', scopes)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', scopes)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.json', 'w') as token:
            token.write(creds.to_json())

    service = build('searchconsole', 'v1', credentials=creds)

    return service

scopes = ['https://www.googleapis.com/auth/webmasters']

service = gsc_auth(scopes)

For ease of use, I like to save the regular expression string to a variable. It will make the request body a little cleaner.

question_regex = r"\b(who|what|when|where|why|how|am|can|could|would|should|do|did|is|was|are|were|will|whom|whose|which)\b"

Notice the r just before the opening quote. This is an indicator that we are using a raw string. In a regular string in Python, the backslash is a way to escape characters or do special things. For example, \n stands for a new line. In this case, we need \b to be a part of the string, so we need to either make it a raw string like we did or we could escape the backslash in a regular string like \\b. That gets messy fast, so be careful.
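Here is a short sketch of the difference. In a regular Python string, \b is the backspace character, so leaving off the r silently changes the pattern instead of raising an error. The variable names are just for illustration.

```python
# Raw string: the backslash and the b are kept as two separate characters.
raw_pattern = r"\bwho\b"

# Regular string with escaped backslashes: same result as the raw string.
escaped_pattern = "\\bwho\\b"

# Regular string without escaping: each \b becomes one backspace character.
broken_pattern = "\bwho\b"

print(raw_pattern == escaped_pattern)
print(len(raw_pattern), len(broken_pattern))
print(repr(broken_pattern))
```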

Next, we’ll be building the request body. This will look very familiar if you followed my Search Analytics guide.

sa_request = {
    "startDate": "2022-01-01",
    "endDate": "2022-03-15",
    "dimensions": [
        "QUERY",
        "PAGE"
    ],
    "dimensionFilterGroups": [
        {
            "filters": [
                {
                    "dimension": "QUERY",
                    "operator": "INCLUDING_REGEX",
                    "expression": question_regex
                }
            ]
        }
    ],
    "rowLimit": 25000
}

As you can see, there is a new key in here that I haven’t covered before: dimensionFilterGroups. It accepts a list to which we can add multiple dictionaries. For this article, we’re adding just one dictionary with a filters key.

For each filter, there are three keys required: dimension, operator, and expression. These mirror the inputs in the web UI. To filter queries with the regex we built earlier in this article, we set those keys to "QUERY", "INCLUDING_REGEX", and question_regex respectively. The first two are strings and the last is the question_regex variable we assigned, though you could bypass the assignment and put your regex string directly into the dictionary.
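If you expect to reuse this request with different dates or a different expression, one option is to wrap it in a small function. This is a minimal sketch, and build_question_request is a hypothetical helper of mine, not part of the API client. It only uses the keys and values already shown in the request body above.

```python
def build_question_request(start_date, end_date, regex, row_limit=25000):
    """Return a Search Analytics request body filtered by a query regex."""
    return {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": ["QUERY", "PAGE"],
        "dimensionFilterGroups": [
            {
                "filters": [
                    {
                        "dimension": "QUERY",
                        "operator": "INCLUDING_REGEX",
                        "expression": regex,
                    }
                ]
            }
        ],
        "rowLimit": row_limit,
    }


# Shortened regex used here only to keep the example readable.
example_request = build_question_request(
    "2022-01-01", "2022-03-15", r"\b(who|what)\b"
)
print(example_request["dimensionFilterGroups"][0]["filters"][0])
```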

Then we just run the search analytics API call.

gsc_search_analytics = service.searchanalytics().query(siteUrl='sc-domain:shortautomaton.com',
                                                       body=sa_request).execute()

We now have a list of all of our queries (and the pages that rank for each query) in a dictionary response from the API. We can parse it as we see fit!
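As one way to parse it, here is a sketch that flattens the response into a list of flat dictionaries. The rows_to_records helper is my own, and the sample response is made-up data shaped like what the API returns: a rows list where each row has keys (one value per requested dimension) plus clicks, impressions, ctr, and position.

```python
def rows_to_records(response, dimensions):
    """Flatten a Search Analytics response into one dict per row."""
    records = []
    # "rows" is missing from the response when nothing matches the filter.
    for row in response.get("rows", []):
        # Pair each dimension name with its value from the row's keys list.
        record = dict(zip(dimensions, row["keys"]))
        for metric in ("clicks", "impressions", "ctr", "position"):
            record[metric] = row.get(metric)
        records.append(record)
    return records


# Made-up response for illustration only; real data comes from the API call.
sample_response = {
    "rows": [
        {
            "keys": ["where can i learn regex", "https://example.com/regex/"],
            "clicks": 3,
            "impressions": 40,
            "ctr": 0.075,
            "position": 4.2,
        }
    ]
}

records = rows_to_records(sample_response, ["query", "page"])
print(records)
```

With the real response, you would pass gsc_search_analytics in place of sample_response. From there, the list of dictionaries is easy to write to a CSV or load into a dataframe.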

Full Code

# Import Modules
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials


# Define function to get authorization
def gsc_auth(scopes):
    creds = None
    # The file token.json stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', scopes)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', scopes)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.json', 'w') as token:
            token.write(creds.to_json())

    service = build('searchconsole', 'v1', credentials=creds)

    return service


# Authenticate API
scopes = ['https://www.googleapis.com/auth/webmasters']
service = gsc_auth(scopes)

# Build Question Query Regex
question_regex = r"\b(who|what|when|where|why|how|am|can|could|would|should|do|did|is|was|are|were|will|whom|whose|which)\b"

# Build Request Body
sa_request = {
    "startDate": "2022-01-01",
    "endDate": "2022-03-15",
    "dimensions": [
        "QUERY",
        "PAGE"
    ],
    "dimensionFilterGroups": [
        {
            "filters": [
                {
                    "dimension": "QUERY",
                    "operator": "INCLUDING_REGEX",
                    "expression": question_regex
                }
            ]
        }
    ],
    "rowLimit": 25000
}

# Make Request to GSC
gsc_search_analytics = service.searchanalytics().query(siteUrl='sc-domain:shortautomaton.com',
                                                       body=sa_request).execute()
