My friend, Steve Simpson, introduced me to Philippe Massicotte and Dirk Eddelbuettel's GTrendsR GitHub package this week. It's a pretty nifty wrapper to the Google Trends API that lets you track how search interest in a phrase changes over time. The trend indices it returns are explained in more detail here: https://support.google.com/trends/answer/4355164?hl=en
Ever have a toy you know is super cool but don't know what to use it for yet? That's GTrendsR for me. So I made up an activity related to my own interests (click HERE to download just the R code for this post). I decided to choose the first 10 phrases I could think of related to my field, literacy, and then used GTrendsR to view how Google search trending has changed for these terms. Here are the 10 biased terms I chose:
- reading assessment
- common core
- reading standards
- phonics
- whole language
- lexile score
- balanced approach
- literacy research association
- international reading association
- multimodal
The last term did not receive enough hits to trend, which is telling: the field is talking about multimodality, but search volume doesn't seem to be affected to the point of registering with Google Trends.
Getting Started
The GTrendsR package provides great tools for grabbing the information from Google; however, for my own task I wanted simpler tools to grab certain chunks of information easily and format them in a tidy way. So I built a small wrapper package, mostly for myself, that will likely remain a GitHub-only package: https://github.com/trinker/gtrend
You can install it yourself (we'll use it in this post) and load all the necessary packages via:
devtools::install_github("dvanclev/GTrendsR")
devtools::install_github("trinker/gtrend")
library(gtrend); library(dplyr); library(ggplot2); library(scales)
The Initial Search
When you perform the search with gtrend_scraper, you will need to enter your Google user name and password.
I did an initial search and plotted the trends for the 9 terms. It was a big, colorful, clustery mess.
terms <- c("reading assessment", "common core", "reading standards",
    "phonics", "whole language", "lexile score", "balanced approach",
    "literacy research association", "international reading association")

out <- gtrend_scraper("your@gmail.com", "password", terms)

out %>%
    trend2long() %>%
    plot()
So I faceted each of the terms out to look at the trends.
out %>%
    trend2long() %>%
    ggplot(aes(x = start, y = trend, color = term)) +
        geom_line() +
        facet_wrap(~term) +
        guides(color = FALSE)
Some interesting patterns began to emerge. I noticed a repeated pattern in almost all of the educational terms, which I thought interesting; we'll explore that first. The basic shape wasn't yet discernible, so I took a small subset of one term, reading+assessment, to explore the trend line by year:
names(out)[1]
## [1] "reading+assessment"
dat <- out[[1]][["trend"]]
colnames(dat)[3] <- "trend"

dat2 <- dat[dat[["start"]] > as.Date("2011-01-01"), ]

rects <- dat2 %>%
    mutate(year = format(as.Date(start), "%y")) %>%
    group_by(year) %>%
    summarize(xstart = as.Date(min(start)), xend = as.Date(max(end)))

ggplot() +
    geom_rect(data = rects, aes(xmin = xstart, xmax = xend,
        ymin = -Inf, ymax = Inf, fill = factor(year)), alpha = 0.4) +
    geom_line(data = dat2, aes(x = start, y = trend), size = .9) +
    scale_x_date(labels = date_format("%m/%y"),
        breaks = date_breaks("month"),
        expand = c(0, 0),
        limits = c(as.Date("2011-01-02"), as.Date("2014-12-31"))) +
    theme(axis.text.x = element_text(angle = -45, hjust = 0))
What I noticed was that for each year there was a general double hump pattern that looked something like this:
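The original figure for this shape is not included here, but the general form can be sketched with made-up values: two humps within a school year, with dips around winter break and summer break. The numbers below are purely illustrative, not real trend data.

```r
library(ggplot2)
library(scales)

# Hypothetical values sketching the "double hump" shape (not real data):
# a dip over summer, a peak when school resumes, and a dip at winter break
months <- seq(as.Date("2011-01-01"), as.Date("2011-12-01"), by = "month")
trend  <- c(55, 60, 62, 58, 45, 35, 40, 65, 70, 68, 50, 30)

ggplot(data.frame(months, trend), aes(x = months, y = trend)) +
    geom_line(size = 1) +
    scale_x_date(labels = date_format("%b")) +
    labs(x = NULL, y = "trend")
```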
This pattern holds consistently across educational terms. I added some context to a smaller subset to help with the narrative:
dat3 <- dat[dat[["start"]] > as.Date("2010-12-21") &
    dat[["start"]] < as.Date("2012-01-01"), ]

ggplot() +
    geom_line(data = dat3, aes(x = start, y = trend), size = 1.2) +
    scale_x_date(labels = date_format("%b %y"),
        breaks = date_breaks("month"),
        expand = c(0, 0)) +
    theme(axis.text.x = element_text(angle = -45, hjust = 0)) +
    theme_bw() +
    theme(panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank()) +
    ggplot2::annotate("text", x = as.Date("2011-01-15"), y = 50,
        label = "Winter\nBreak Ends") +
    ggplot2::annotate("text", x = as.Date("2011-05-08"), y = 70,
        label = "Summer\nBreak\nAcademia") +
    ggplot2::annotate("text", x = as.Date("2011-06-15"), y = 76,
        label = "Summer\nBreak\nTeachers") +
    ggplot2::annotate("text", x = as.Date("2011-08-18"), y = 63,
        label = "Academia\nReturns") +
    ggplot2::annotate("text", x = as.Date("2011-08-17"), y = 78,
        label = "Teachers\nReturn") +
    ggplot2::annotate("text", x = as.Date("2011-11-17"), y = 61,
        label = "Thanksgiving")
Of course, this is all me trying to line up dates with educational search terms in a way that makes sense; it's a hypothesis rather than a firm conclusion. If this visual model is correct, though, that these events affect Google searches around educational terms, and if a Google search is an indication of work to advance understanding of a concept, it's clear that folks aren't too interested in doing much advancing of educational knowledge at Thanksgiving and Christmas time. These are, of course, big assumptions. But if they're true, the implications extend further: perhaps the most fertile time to engage educators, education students, and educational researchers is the first month after summer break.
Second Noticing
I also noticed that the two major literacy organizations are both in a downward trend.
out %>%
    trend2long() %>%
    filter(term %in% c("literacy+research+association",
        "international+reading+association")) %>%
    as.trend2long() %>%
    plot() +
        guides(color = FALSE) +
        ggplot2::annotate("text", x = as.Date("2011-08-17"), y = 60,
            label = "International\nReading\nAssociation", color = "#F8766D") +
        ggplot2::annotate("text", x = as.Date("2006-01-17"), y = 38,
            label = "Literacy\nResearch\nAssociation", color = "#00BFC4") +
        theme_bw() +
        stat_smooth()
I wonder what might be causing the downward trend. I also notice the trends for the two associations are growing apart, with the International Reading Association being affected less. Can this downward trend be reversed?
Associated Terms
Lastly, I want to look at some term uses across time and see if they correspond with what I know to be historical events around literacy in education.
out %>%
    trend2long() %>%
    filter(term %in% names(out)[1:7]) %>%
    as.trend2long() %>%
    plot() +
        scale_colour_brewer(palette = "Set1") +
        facet_wrap(~term, ncol = 2) +
        guides(color = FALSE)
This made me want to group the following 4 terms together, as there's near-perfect overlap in their trends. I don't have a plausible historical explanation for this. Hopefully, a more knowledgeable other can fill in the blanks.
out %>%
    trend2long() %>%
    filter(term %in% names(out)[c(1, 3, 5, 7)]) %>%
    as.trend2long() %>%
    plot()
I explored the three remaining terms in the graph below. As expected, ‘common core’ and ‘lexile’ (scores associated with quantitative measures of text complexity) are on an upward trend. Phonics on the other hand is on a downward trend.
out %>%
    trend2long() %>%
    filter(term %in% names(out)[c(2, 4, 6)]) %>%
    as.trend2long() %>%
    plot()
This was a fun exploratory use of the GTrendsR package. Thanks to Steve Simpson for the introduction to GTrendsR, and to Philippe Massicotte and Dirk Eddelbuettel for sharing their work.
*Created using the reports package
Is it extremely easy to get an account rate-limited doing this?
I tried a search for three terms and it worked like a charm; I tried another for six terms and only got [1] "No Trends data for data+mining - substituting NA series..." returns from there on.
@Matthew, there definitely is a daily limit, but it's a bit larger than what you describe. As far as I have experienced, no account rate limit was imposed for this activity. Is it possible the terms you searched for didn't reach the view threshold required for the API to return results? You could check this by rerunning the code included above, which we know works.
I think they actually banned my IP for “suspicious activity” because I used a new google account to test it, I got a warning to change my passwords later on…. The ban does not seem to be lifted yet after ~16 hours, so beware.
Have you read about any methods for comparing these trend lines, other than visually? I would be very interested in hearing!
No but this is far outside of my realm of expertise. I believe the field of economics has ways to compare trends. But I too would be interested in knowing more about this. If other folks know of anything please share.
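As a non-rigorous starting point (outside the rigor an economist would bring), one could at least quantify co-movement and direction with base R: correlate the two series and compare fitted linear slopes. The series below are synthetic, purely for illustration.

```r
# Synthetic example: two declining trend series (not real Google Trends data)
set.seed(42)
t <- 1:100
series_a <- 50 - 0.2 * t + rnorm(100, sd = 3)  # steeper decline
series_b <- 45 - 0.1 * t + rnorm(100, sd = 3)  # shallower decline

# How strongly the two series move together
cor(series_a, series_b)

# Estimated slope (trend direction/steepness) of each series
coef(lm(series_a ~ t))["t"]
coef(lm(series_b ~ t))["t"]
```

More formal approaches (e.g., time-series decomposition or tests for structural breaks) exist, but this gives a quick numeric summary beyond eyeballing.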
Great article! I keep getting an error saying that %>% is not a valid function. Is there any remedy for this?
If you load the dplyr package, this is available. It's a chaining (pipe) function imported from the magrittr package.
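To illustrate: the pipe passes the left-hand side as the first argument of the function on the right, so these two calls are equivalent.

```r
library(dplyr)  # makes %>% available (re-exported from magrittr)

# Traditional nested call
head(mtcars, 2)

# Equivalent piped call: mtcars is passed as head()'s first argument
mtcars %>% head(2)
```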
Excellent! Thank you!
Very nice. Is it possible to change the geography here? For example, to focus only on the US.
Actually let me rephrase that question. Is it possible to use geography other than country codes? Thanks!
Great article! I keep getting this error though when using gtrend_scraper, any ideas?
Error in as.character(x) :
  cannot coerce type 'closure' to vector of type 'character'
See: https://github.com/trinker/gtrend/issues/3
Very nice article. I have a slightly different task: comparing trends for different terms, like here: http://www.google.com/trends/explore#q=roma%2C%20milano&cmpt=q&tz=
Do you know how I can do that?
What if your gmail account requires a verification code in addition to a password. How would you enter that? Or should I instead create a new gmail account that doesn’t require a verification code?
Hi Tyler,
In this post, you pulled each of these terms separately from Google Trends. This is different from querying them all together, where you're able to see the relative popularity of the terms compared to each other, which is what I think you wanted to do. Have you been able to find a package that does this? In other words, I'd type in the same keywords you did and get back one CSV file with the relative popularity of each term.
Here is an example of what graph I’d like to see when running multiple queries: https://www.google.com/trends/explore#q=Windows%20XP%2C%20Windows%20Vista%2C%20Windows%207%2C%20Windows%208%2C%20Windows%2010&cmpt=q&tz=Etc%2FGMT%2B7
Also, if you haven't found a reason for the negative trend in the "IRA" and "LRA" searches, let me offer a suggestion. In 2004, computer users had a much more academic bent (due to access to computers in universities, socioeconomic status, "techy-er" users, etc.), so with more of the populace getting access to the internet and computers, people in 2010 were searching for more (and different) things than in 2004.
Thanks for the post. Seems really helpful. One thing I wasn't able to find is how to get 3 months of daily data, or data from different time slots.
Does anyone know how to have the gtrends query be specific for the last 7 days, which will give you results by day?
Seems like a stackoverflow.com question. Try asking there.
Hi,
I executed the command:

out <- gtrend_scraper("myaddress@gmail.com", "mypassword", "winter")

I got the trend data, but where is the regions data? I could not get it with out$regions. Can anyone tell me how to do this?
Thank you.
There have been major developments in the gtrendsR package that really make my own wrapper obsolete: http://dirk.eddelbuettel.com/blog/2015/11/29/ I'd suggest you invest your time in gtrendsR: https://github.com/PMassicotte/gtrendsR