My friend, Steve Simpson, introduced me to Philippe Massicotte and Dirk Eddelbuettel’s GTrendsR GitHub package this week. It’s a pretty nifty wrapper to the Google Trends API that enables one to search phrase trends over time. The trend indices that are given are explained in more detail here: https://support.google.com/trends/answer/4355164?hl=en
Ever have a toy you know is super cool but don’t know what to use it for yet? That’s GTrendsR for me. So I made up an activity to use it for that’s related to my own interests (click HERE to download just the R code for this post). I decided to choose the first 10 phrases I could think of related to my field, literacy. I then used GTrendsR to view how Google search trending has changed for these terms. Here are the 10 biased terms I chose:
- reading assessment
- common core
- reading standards
- phonics
- whole language
- lexile score
- balanced approach
- literacy research association
- international reading association
The last term did not receive enough hits to trend, which is telling: the field is talking about multimodality, but search trends don’t seem to be affected to the point of registering with Google Trends.
The GTrendsR package provides great tools for grabbing the information from Google. However, for my own task I wanted simpler tools to grab certain chunks of information easily and format them in a tidy way. So I built a small wrapper package, mostly for myself, that will likely remain a GitHub-only package: https://github.com/trinker/gtrend
You can install it for yourself (we’ll use it in this post) and load all necessary packages via:
devtools::install_github("dvanclev/GTrendsR")
devtools::install_github("trinker/gtrend")
library(gtrend); library(dplyr); library(ggplot2); library(scales)
The Initial Search
When you perform the search with gtrend_scraper, you will need to enter your Google user name and password.
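If you would rather not type a real password into a script, one option is to read the credentials from environment variables. This is only a sketch: get_credential is a helper I made up for illustration, the variable names GOOGLE_USER and GOOGLE_PASSWORD are likewise invented, and gtrend_scraper itself knows nothing about them.

```r
# Hypothetical helper (not part of gtrend or GTrendsR): fetch a credential
# from an environment variable and stop with a clear message if it is unset.
get_credential <- function(var) {
  value <- Sys.getenv(var, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable ", var, " is not set", call. = FALSE)
  }
  value
}

# The variables would be set in ~/.Renviron or the shell, not in the script.
# The names below are assumptions chosen for this example:
# user <- get_credential("GOOGLE_USER")
# pass <- get_credential("GOOGLE_PASSWORD")
# out  <- gtrend_scraper(user, pass, terms)
```

The last three lines are commented out because they need real credentials and a live connection.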
I did an initial search and plotted the trends for the 9 terms. It was a big, colorful, clustery mess.
terms <- c("reading assessment", "common core", "reading standards",
    "phonics", "whole language", "lexile score", "balanced approach",
    "literacy research association", "international reading association"
)

out <- gtrend_scraper("firstname.lastname@example.org", "password", terms)

out %>%
    trend2long() %>%
    plot()
So I faceted each of the terms out to look at the trends.
out %>%
    trend2long() %>%
    ggplot(aes(x=start, y=trend, color=term)) +
        geom_line() +
        facet_wrap(~term) +
        guides(color=FALSE)
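Both plots lean on trend2long() to put the data in long format, that is, one row per term per time point, which is the shape ggplot2 wants for color=term and facet_wrap(~term). The toy example below shows that idea with made-up numbers. It is not the internals of trend2long(), and the list-of-data-frames layout is an assumption for illustration only.

```r
# Two made-up terms, each a small data frame of weekly index values.
# The column names (start, trend) mirror the ones used in this post;
# the list layout is an assumption, not gtrend's documented structure.
weeks <- seq(as.Date("2014-01-05"), by = "week", length.out = 3)
toy <- list(
  "phonics"     = data.frame(start = weeks, trend = c(60, 55, 58)),
  "common core" = data.frame(start = weeks, trend = c(20, 35, 50))
)

# Stack the pieces and record which term each row came from.
long <- do.call(rbind, lapply(names(toy), function(nm) {
  data.frame(term = nm, toy[[nm]], stringsAsFactors = FALSE)
}))
rownames(long) <- NULL
long
```

Each row of long now carries its own term label, so a single ggplot() call can color or facet by it.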
Some interesting patterns began to emerge. I noticed a repeated pattern in almost all of the educational terms which I thought interesting. First we’ll explore that. The basic shape wasn’t yet discernible and so I took a small subset of one term, reading+assessment, to explore the trend line by year:
names(out)[1]
## [1] "reading+assessment"

dat <- out[[1]][["trend"]]
# the trend values are assumed to sit in the third column
colnames(dat)[3] <- "trend"

dat2 <- dat[dat[["start"]] > as.Date("2011-01-01"), ]

# one shaded band per year, spanning that year's first and last dates
rects <- dat2 %>%
    mutate(year=format(as.Date(start), "%y")) %>%
    group_by(year) %>%
    summarize(xstart = as.Date(min(start)), xend = as.Date(max(end)))

ggplot() +
    geom_rect(data = rects, aes(xmin = xstart, xmax = xend, ymin = -Inf,
        ymax = Inf, fill = factor(year)), alpha = 0.4) +
    geom_line(data=dat2, aes(x=start, y=trend), size=.9) +
    scale_x_date(labels = date_format("%m/%y"),
        breaks = date_breaks("month"),
        expand = c(0,0),
        limits = c(as.Date("2011-01-02"), as.Date("2014-12-31"))) +
    theme(axis.text.x = element_text(angle = -45, hjust = 0))
What I noticed was that for each year there was a general double hump pattern that looked something like this:
This pattern holds consistent across educational terms. I added some context to a smaller subset to help with the narrative:
dat3 <- dat[dat[["start"]] > as.Date("2010-12-21") &
    dat[["start"]] < as.Date("2012-01-01"), ]

ggplot() +
    geom_line(data=dat3, aes(x=start, y=trend), size=1.2) +
    scale_x_date(labels = date_format("%b %y"),
        breaks = date_breaks("month"), expand = c(0,0)) +
    # theme_bw() goes first; applied later it would reset the rotated axis text
    theme_bw() +
    theme(axis.text.x = element_text(angle = -45, hjust = 0)) +
    theme(panel.grid.major.y=element_blank(),
        panel.grid.minor.y=element_blank()) +
    ggplot2::annotate("text", x = as.Date("2011-01-15"), y = 50,
        label = "Winter\nBreak Ends") +
    ggplot2::annotate("text", x = as.Date("2011-05-08"), y = 70,
        label = "Summer\nBreak\nAcademia") +
    ggplot2::annotate("text", x = as.Date("2011-06-15"), y = 76,
        label = "Summer\nBreak\nTeachers") +
    ggplot2::annotate("text", x = as.Date("2011-08-18"), y = 63,
        label = "Academia\nReturns") +
    ggplot2::annotate("text", x = as.Date("2011-08-17"), y = 78,
        label = "Teachers\nReturn") +
    ggplot2::annotate("text", x = as.Date("2011-11-17"), y = 61,
        label = "Thanksgiving")
Of course this is all me trying to line up dates with educational search terms in a logical sense; a hypothesis rather than a firm conclusion. If this visual model is correct though, that these events impact Google searches around educational terms, and if a Google search is an indication of work to advance understanding of a concept, it’s clear that folks aren’t too interested in doing much advancing of educational knowledge at Thanksgiving and Christmas time. These are of course big assumptions. But if true, the implications extend further. Perhaps the most fertile time to engage educators, education students, and educational researchers is the first month after summer break.
I also noticed that the two major literacy organizations are in a downward trend.
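One rough way to probe a seasonal hypothesis like this, beyond eyeballing the plot, would be to average the index by calendar month across all years and see which months come out lowest. The sketch below does that on synthetic data that I built to dip in July and December, so it demonstrates the mechanics only and says nothing about the real Google numbers. The data frame toy and its start and trend columns are stand-ins for a real term's data.

```r
# Synthetic weekly series standing in for one term's trend data.
# The dips in July and December are built in on purpose.
set.seed(1)
start <- seq(as.Date("2011-01-02"), as.Date("2013-12-29"), by = "week")
month_num <- as.integer(format(start, "%m"))
trend <- 70 - 20 * (month_num %in% c(7, 12)) + rnorm(length(start), sd = 3)
toy <- data.frame(start = start, trend = trend)

# Label each week with its calendar month, keeping Jan..Dec order.
toy$month <- factor(month.abb[month_num], levels = month.abb)

# Average the index by calendar month, pooling all years.
monthly <- aggregate(trend ~ month, data = toy, FUN = mean)

# The two quietest months in the synthetic data.
monthly[order(monthly$trend), ][1:2, ]
```

With real data the same aggregate() call would apply to a term's data frame, provided it has a date column and a trend column like the ones used in this post.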
out %>%
    trend2long() %>%
    filter(term %in% c("literacy+research+association",
        "international+reading+association")) %>%
    as.trend2long() %>%
    plot() +
        guides(color=FALSE) +
        ggplot2::annotate("text", x = as.Date("2011-08-17"), y = 60,
            label = "International\nReading\nAssociation", color="#F8766D") +
        ggplot2::annotate("text", x = as.Date("2006-01-17"), y = 38,
            label = "Literacy\nResearch\nAssociation", color="#00BFC4") +
        theme_bw() +
        stat_smooth()
I wonder what might be causing the downward trend? Also, I notice the trend is growing apart for the two associations, with the International Reading Association being affected less. Can this downward trend be reversed?
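To put a number on "affected less", one could fit a straight line to each term and compare the slopes. Note that this is a swapped-in technique: the plot above uses stat_smooth(), not a linear model, and the data below are made up (org_a and org_b are hypothetical labels, with org_a constructed to fall faster).

```r
# Made-up monthly index values for two hypothetical organizations.
start <- seq(as.Date("2004-01-01"), by = "month", length.out = 120)
years <- as.numeric(start - start[1]) / 365.25  # time since first date, in years
toy <- rbind(
  data.frame(term = "org_a", years = years, trend = 80 - 6 * years),
  data.frame(term = "org_b", years = years, trend = 60 - 2 * years)
)

# Fit trend ~ time separately for each term and keep the slope,
# expressed in index points per year.
slopes <- sapply(split(toy, toy$term), function(d) {
  coef(lm(trend ~ years, data = d))[["years"]]
})
slopes
```

A more negative slope means a steeper decline; whether a straight line is a fair summary of the real series is a separate question.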
Lastly, I want to look at some term uses across time and see if they correspond with what I know to be historical events around literacy in education.
out %>%
    trend2long() %>%
    filter(term %in% names(out)[1:7]) %>%
    as.trend2long() %>%
    plot() +
        scale_colour_brewer(palette="Set1") +
        facet_wrap(~term, ncol=2) +
        guides(color=FALSE)
This made me want to group the following 4 terms together as there’s near perfect overlap in the trends. I don’t have a plausible historical explanation for this. Hopefully, a more knowledgeable other can fill in the blanks.
out %>%
    trend2long() %>%
    filter(term %in% names(out)[c(1, 3, 5, 7)]) %>%
    as.trend2long() %>%
    plot()
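"Near perfect overlap" is a visual judgment; a correlation matrix between the terms would be one way to check it. Here is a minimal sketch on synthetic long-format data. The terms term_a, term_b and term_c are invented, with the first two built to share a seasonal shape and the third built to simply climb.

```r
# Synthetic long-format data: three made-up terms observed weekly.
start <- seq(as.Date("2012-01-01"), by = "week", length.out = 104)
season <- sin(2 * pi * seq_along(start) / 52)
toy <- rbind(
  data.frame(term = "term_a", start = start, trend = 50 + 10 * season),
  data.frame(term = "term_b", start = start, trend = 40 + 8 * season),
  data.frame(term = "term_c", start = start,
             trend = seq(10, 90, length.out = 104))
)

# Spread to one column per term (trend.term_a, trend.term_b, ...),
# then correlate the columns, dropping the date column first.
wide <- reshape(toy, idvar = "start", timevar = "term", direction = "wide")
cors <- cor(wide[, -1])
round(cors, 2)
```

Pairs with correlations close to 1 move together over time, which is what overlapping lines in the plot suggest.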
I explored the three remaining terms in the graph below. As expected, “common core” and “lexile” (scores associated with quantitative measures of text complexity) are on an upward trend. Phonics on the other hand is on a downward trend.
out %>%
    trend2long() %>%
    filter(term %in% names(out)[c(2, 4, 6)]) %>%
    as.trend2long() %>%
    plot()
This was a fun exploratory use of the GTrendsR package. Thanks to Steve Simpson for the introduction to GTrendsR and Philippe Massicotte and Dirk Eddelbuettel for sharing their work.
*Created using the reports package