GTrendsR Package to Explore Google Trends for Field-Dependent Terms

My friend, Steve Simpson, introduced me to Philippe Massicotte and Dirk Eddelbuettel’s GTrendsR GitHub package this week. It’s a pretty nifty wrapper to the Google Trends API that lets you track how search interest in a phrase changes over time. The trend values are relative indices, scaled so that the point of peak interest in the period equals 100; they are explained in more detail here: https://support.google.com/trends/answer/4355164?hl=en

Ever have a toy you know is super cool but don’t know what to use it for yet? That’s GTrendsR for me. So I made up an activity related to my own interests (click HERE to download just the R code for this post). I decided to choose the first 10 phrases I could think of related to my field, literacy, and then used GTrendsR to view how Google search trending has changed for these terms. Here are the 10 biased terms I chose:

  1. reading assessment
  2. common core
  3. reading standards
  4. phonics
  5. whole language
  6. lexile score
  7. balanced approach
  8. literacy research association
  9. international reading association
  10. multimodal

The last term did not receive enough hits to trend, which is telling: the field talks a great deal about multimodality, yet searches for the term are not frequent enough to register with Google Trends.


Getting Started

The GTrendsR package provides great tools for grabbing the information from Google; however, for my own task I wanted simpler tools to grab certain chunks of information easily and format them in a tidy way. So I built a small wrapper package, mostly for myself, that will likely remain a GitHub-only package: https://github.com/trinker/gtrend

You can install it for yourself (we’ll use it in this post) and load all the necessary packages via:

devtools::install_github("dvanclev/GTrendsR")
devtools::install_github("trinker/gtrend")
library(gtrend); library(dplyr); library(ggplot2); library(scales)

The Initial Search

When you perform the search with gtrend_scraper, you will need to enter your Google username and password.

I did an initial search and plotted the trends for the 9 terms. It was a big, colorful, clustery mess.

terms <- c("reading assessment", "common core", "reading standards",
    "phonics", "whole language", "lexile score", "balanced approach",
    "literacy research association", "international reading association"
)

out <- gtrend_scraper("your@gmail.com", "password", terms)

out %>%
    trend2long() %>%
    plot() 

[plot of chunk trend_mess: all nine trend lines in a single plot]

So I faceted the terms out to look at the trends individually.

out %>%
    trend2long() %>%
    ggplot(aes(x=start, y=trend, color=term)) +
        geom_line() +
        facet_wrap(~term) +
        guides(color=FALSE)

[plot of chunk trend_facet: trends faceted by term]

Some interesting patterns began to emerge. I noticed a repeated annual pattern in almost all of the educational terms, so we’ll explore that first. The basic shape wasn’t yet discernible, so I took a small subset of one term, reading+assessment, to explore the trend line by year:

names(out)[1]
## [1] "reading+assessment"
dat <- out[[1]][["trend"]]
colnames(dat)[3] <- "trend"

dat2 <- dat[dat[["start"]] > as.Date("2011-01-01"), ]

rects <- dat2  %>%
    mutate(year=format(as.Date(start), "%y")) %>%
    group_by(year) %>%
    summarize(xstart = as.Date(min(start)), xend = as.Date(max(end)))

ggplot() +
    geom_rect(data = rects, aes(xmin = xstart, xmax = xend, ymin = -Inf, 
        ymax = Inf, fill = factor(year)), alpha = 0.4) +
    geom_line(data=dat2, aes(x=start, y=trend), size=.9) + 
    scale_x_date(labels = date_format("%m/%y"), 
        breaks = date_breaks("month"),
        expand = c(0,0), 
        limits = c(as.Date("2011-01-02"), as.Date("2014-12-31"))) +
    theme(axis.text.x = element_text(angle = -45, hjust = 0)) 

[plot of chunk trend_iso: reading+assessment trend, 2011–2014, shaded by year]

What I noticed was that each year showed a general double-hump pattern that looked something like this:
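
Here is a minimal, made-up sketch of that shape (simulated values, not real trend data; the peak and dip locations are hand-placed to mirror the school calendar):

library(ggplot2)

## simulated double hump: two semester peaks separated by a summer dip,
## with lows at the winter breaks on either end (made-up values)
months <- seq(1, 12, length.out = 200)
shape <- dnorm(months, mean = 4, sd = 1.6) + dnorm(months, mean = 10, sd = 1.6)

qplot(months, shape, geom = "line") +
    scale_x_continuous(breaks = 1:12) +
    labs(x = "Month", y = "Stylized search interest")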

This pattern holds consistently across the educational terms. I added some context to a smaller subset to help with the narrative:

dat3 <- dat[dat[["start"]] > as.Date("2010-12-21") & 
    dat[["start"]] < as.Date("2012-01-01"), ]

ggplot() +
    geom_line(data=dat3, aes(x=start, y=trend), size=1.2) + 
    scale_x_date(labels = date_format("%b %y"), 
        breaks = date_breaks("month"),
        expand = c(0,0)) +
    theme_bw() + 
    theme(axis.text.x = element_text(angle = -45, hjust = 0),
        panel.grid.major.y=element_blank(),
        panel.grid.minor.y=element_blank()) + 
    ggplot2::annotate("text", x = as.Date("2011-01-15"), y = 50, 
        label = "Winter\nBreak Ends") +
    ggplot2::annotate("text", x = as.Date("2011-05-08"), y = 70, 
        label = "Summer\nBreak\nAcademia") +
    ggplot2::annotate("text", x = as.Date("2011-06-15"), y = 76, 
        label = "Summer\nBreak\nTeachers") +
    ggplot2::annotate("text", x = as.Date("2011-08-18"), y = 63, 
        label = "Academia\nReturns") +
    ggplot2::annotate("text", x = as.Date("2011-08-17"), y = 78, 
        label = "Teachers\nReturn")+
    ggplot2::annotate("text", x = as.Date("2011-11-17"), y = 61, 
        label = "Thanksgiving")

[plot of chunk narrative: 2011 trend annotated with school-calendar events]

Of course, this is all me trying to line up dates with educational search terms in a way that makes sense; a hypothesis rather than a firm conclusion. If this visual model is correct, though, that these events impact Google searches around educational terms, and if a Google search is an indication of work to advance understanding of a concept, it’s clear that folks aren’t too interested in advancing educational knowledge around Thanksgiving and Christmas. These are, of course, big assumptions. But if they hold, the implications extend further: perhaps the most fertile time to engage educators, education students, and educational researchers is the first month after summer break.


Second Noticing

I also noticed that the two major literacy organizations are in a downward trend.

out %>%
    trend2long() %>%
    filter(term %in% c("literacy+research+association", 
        "international+reading+association")) %>%
    as.trend2long() %>%
    plot() + 
    guides(color=FALSE) +
    ggplot2::annotate("text", x = as.Date("2011-08-17"), y = 60, 
        label = "International\nReading\nAsociation", color="#F8766D")+
    ggplot2::annotate("text", x = as.Date("2006-01-17"), y = 38, 
        label = "Literacy\nResearch\nAssociation", color="#00BFC4") +
    theme_bw() +
    stat_smooth()

[plot of chunk downward_trend: the two associations with smoothed fits]

I wonder what might be causing the downward trend? I also notice that the gap between the two associations is growing, with the International Reading Association being affected less. Can this downward trend be reversed?
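
One rough way to move beyond eyeballing the decline is to fit a simple linear model per association and look at the slopes. This is only a sketch; it assumes trend2long() returns the term, start, and trend columns used in the plots above:

out %>%
    trend2long() %>%
    filter(term %in% c("literacy+research+association", 
        "international+reading+association")) %>%
    group_by(term) %>%
    do(data.frame(
        ## slope of the fitted line, converted from per-day to per-year
        slope_per_year = coef(lm(trend ~ as.numeric(start), data = .))[2] * 365.25
    ))

A negative slope_per_year for both terms would put a number on the downward drift seen in the plot.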


Associated Terms

Lastly, I wanted to look at how some terms have been used across time and see whether they correspond with what I know to be historical events around literacy in education.

out %>%
    trend2long() %>%
    filter(term %in% names(out)[1:7]) %>%
    as.trend2long() %>%
    plot() + scale_colour_brewer(palette="Set1") +
    facet_wrap(~term, ncol=2) +
    guides(color=FALSE)

[plot of chunk terms: the first seven terms faceted]

This made me want to group the following four terms together, as there’s near-perfect overlap in the trends (I check the overlap numerically after the plot). I don’t have a plausible historical explanation for this. Hopefully, a more knowledgeable other can fill in the blanks.

out %>%
    trend2long() %>%
    filter(term %in% names(out)[c(1, 3, 5, 7)]) %>%
    as.trend2long() %>%
    plot() 

[plot of chunk overlap: the four near-identical terms]
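
The visual overlap can also be checked numerically with a correlation matrix. Another sketch, again assuming trend2long() yields term, start, and trend columns (tidyr’s spread() lays the four series side by side):

library(tidyr)

out %>%
    trend2long() %>%
    filter(term %in% names(out)[c(1, 3, 5, 7)]) %>%
    select(term, start, trend) %>%
    spread(term, trend) %>%
    select(-start) %>%
    cor(use = "pairwise.complete.obs")

Correlations near 1 across the board would confirm that the four series move together.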

I explored the three remaining terms in the graph below. As expected, ‘common core’ and ‘lexile’ (scores associated with quantitative measures of text complexity) are on an upward trend. ‘Phonics’, on the other hand, is on a downward trend.

out %>%
    trend2long() %>%
    filter(term %in% names(out)[c(2, 4, 6)]) %>%
    as.trend2long() %>%
    plot() 

[plot of chunk overlap2: common core, phonics, and lexile score]
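
To make those directions explicit, one could overlay a straight-line fit on each panel (a sketch under the same column-name assumptions as above):

out %>%
    trend2long() %>%
    filter(term %in% names(out)[c(2, 4, 6)]) %>%
    ggplot(aes(x = start, y = trend)) +
        geom_line() +
        geom_smooth(method = "lm", se = FALSE) +
        facet_wrap(~term, ncol = 1)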

This was a fun exploratory use of the GTrendsR package. Thanks to Steve Simpson for the introduction to GTrendsR, and to Philippe Massicotte and Dirk Eddelbuettel for sharing their work.


*Created using the reports package


23 Responses to GTrendsR Package to Explore Google Trends for Field-Dependent Terms

  1. Matthew says:

    Is it extremely easy to get an account rate-limited doing this?
    I tried a search for three terms and it worked like a charm, then tried another for six terms and only got ‘[1] “No Trends data for data+mining – substituting NA series…”‘ returns from there on.

    • tylerrinker says:

      @Matthew, there definitely is a daily limit, but it’s a bit larger than what you describe. In my experience, no account rate limit was imposed for this activity. Is it possible the terms you searched for did not reach the threshold of views required for the API to return results? You could check this by rerunning the code I included, which we know works.

      • Matthew says:

        I think they actually banned my IP for “suspicious activity” because I used a new Google account to test it; I got a warning to change my passwords later on. The ban does not seem to have been lifted after ~16 hours, so beware.

  2. mattsigal says:

    Have you read about any methods for comparing these trend lines, other than visually? I would be very interested in hearing!


  3. Arie says:

    Very nice. Is it possible to change the geography here? For example, to focus only on the US?

  4. james says:

    Great article! I keep getting this error, though, when using gtrend_scraper. Any ideas?

    Error in as.character(x) :
    cannot coerce type ‘closure’ to vector of type ‘character’

  5. Asper says:

    Very nice article. I have a slightly different task: comparing trends for different terms, like here: http://www.google.com/trends/explore#q=roma%2C%20milano&cmpt=q&tz=
    Do you know how I can do that?

  6. Joe says:

    What if your Gmail account requires a verification code in addition to a password? How would you enter that? Or should I instead create a new Gmail account that doesn’t require a verification code?

  7. Robert says:

    Hi Tyler,

    In this post, when you downloaded all of the files, you pulled each of these terms separately from Google Trends. This is different from querying them all together, where you’re able to see the relative popularity of the terms compared to each other, which is what I think you wanted to do. Have you been able to find a package that does this? In other words, one where I type in the same keywords you did and get back one CSV file with the relative popularity of each term?

    Here is an example of what graph I’d like to see when running multiple queries: https://www.google.com/trends/explore#q=Windows%20XP%2C%20Windows%20Vista%2C%20Windows%207%2C%20Windows%208%2C%20Windows%2010&cmpt=q&tz=Etc%2FGMT%2B7

    Also, if you haven’t found a reason for the downward trend in the “IRA” and “LRA” searches, let me offer a suggestion. In 2004, computer users skewed much more academic (due to access to computers at universities, socioeconomic status, “techy-er” users, etc.), so as more of the populace gained access to computers and the internet, people in 2010 were searching for more (and different) things than in 2004.


  8. Farid says:

    Thanks for the post. It seems really helpful. One aspect I wasn’t able to find is how to get three months of daily data, or data from different time slots.


  9. Mike G says:

    Does anyone know how to make the gtrends query specific to the last 7 days, which gives results by day?

  10. Satoshi Watanabe says:

    Hi,
    I executed the command out <- gtrend_scraper("myaddress@gmail.com", "mypassword", "winter").
    Of course I got the trend data, but where is the regions data? I could not get it with "out$regions".
    Can anyone tell me how to do this?

    Thank you.
