qdap 0.2.2 released

I’m very pleased to announce the release of qdap 0.2.2


This is the third installment of the qdap package on CRAN. The qdap package automates many of the tasks associated with quantitative discourse analysis of transcripts containing discourse, including frequency counts of sentence types, words, sentences, turns of talk, syllable counts and other assorted analysis tasks. The package provides parsing tools for preparing transcript data but may be useful for many other language processing tasks. Many functions enable the user to aggregate data by any number of grouping variables, providing analysis and seamless integration with other R packages that undertake higher-level analysis and visualization of text.

The biggest change is that qdap is now compiled for Mac users!!! No need to install from source. Just use:

install.packages("qdap")

Some of the changes in version 0.2.2 include:


NEW FEATURES

  • tot_plot - a visualization function that uses a bar graph to show patterns in sentence length and grouping variables by turn of talk.
  • beg2char and char2end - functions to grab text from the beginning of a string up to a character, or from a character to the end of a string.
  • ngrams - a function to calculate ngrams by grouping variable.
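To make the idea behind beg2char and char2end concrete, here is a rough base-R sketch of the same behavior. It is not qdap's code: the names beg2char_sketch and char2end_sketch are hypothetical, the sketch only splits at the first occurrence of a single literal character, and it returns strings that lack that character unchanged (qdap's own handling of those cases may differ).

```r
# Hypothetical base-R stand-ins for qdap's beg2char/char2end (not qdap's code).
# Both split at the FIRST occurrence of a single literal character `char`.

# Text from the start of each string up to (not including) `char`.
beg2char_sketch <- function(text, char) {
    pos <- as.integer(regexpr(char, text, fixed = TRUE))  # -1 when absent
    ifelse(pos > 0, substr(text, 1, pos - 1), text)
}

# Text after `char` through the end of each string.
char2end_sketch <- function(text, char) {
    pos <- as.integer(regexpr(char, text, fixed = TRUE))
    ifelse(pos > 0, substr(text, pos + 1, nchar(text)), text)
}

x <- c("name_date_id", "no-separator")
beg2char_sketch(x, "_")  # returns c("name", "no-separator")
char2end_sketch(x, "_")  # returns c("date_id", "no-separator")
```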

BUG FIXES

  • genXtract labels returned the word “right” rather than the right edge string. See here for an example of the old behavior. This behavior has been fixed.
  • gradient_cloud's min.freq was locked at 1. This has been fixed. (Manuel Fdez-Moya)
  • termco would produce an error if single-length named vectors were passed to match.list and no multi-length vectors were supplied. An error was also thrown if an unnamed multi-length vector was passed to match.list. This behavior has been fixed.

For a complete list of changes see qdap’s NEWS

Development Version
github

Posted in discourse analysis, qdap, text

Sharing my R notes

I started working with R 2 1/2 years ago. I remember opening R, closing it, and thinking it was the dumbest thing ever (a command line is not inviting to a non-programmer). Now it’s my constant friend. From the beginning I took notes to remind myself of all the things I learned and relearned. They’ve been invaluable to me in learning. They are not particularly well arranged, nor do they credit sources properly. There are likely bad or outdated practices in there, but I figured they may be helpful to others learning the language, so I’m sharing them.

Note that:

1) they are poorly arranged
2) they may have mistakes
3) they don’t credit others’ work properly or at all

They were written for me, but I think others may find them useful, so here they are:

*Note that the file is large: ~7000 KB and 274 pages.

Posted in Uncategorized

Animations Understood

When I first saw a graphic made from Yihui’s animation package (Xie, 2013) I was amazed at the magic and thought “I could never do that”.

Passage of time…

One night I found myself bored and as usual avoiding work. I decided to try learning how to make an animation and an epiphany hit me in the head.


Figure 1: Animated gif of guy getting hit in the head.



Basically I realized the animation package works just like those flipbooks you made as a kid. You know the ones teachers would yell at you for and would lament the waste of tablet paper. Go on and enjoy this flipbook and allow yourself to be taken back to 3rd grade. Ahh.

Video: A clever use of a tablet.




Now, where were we? Ah yes, animation: an electronic flipbook for dorky people. Basically, here’s how it works:

  1. Create a scene (static components in a plot)
  2. Create element(s) that will change
  3. Wrap it all into a function
  4. Use the animation package to output an MP4 video, HTML file or an animated GIF.
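The four steps can be seen without the animation package at all. The minimal sketch below (hypothetical function names, base R graphics only) draws a static scene, moves one point a little lower in each frame, and writes every frame to its own file, which is the flipbook idea in its plainest form. The animation package automates that last step and stitches the frames into a GIF, video or HTML page.

```r
# A flipbook in base R: one plot per frame, one file per frame.
# Hypothetical helper names; the animation package is not used here.

draw_scene <- function() {                 # step 1: the static scene
    plot.new()
    segments(0.5, 0, 0.5, 0.5, lwd = 4)    # a simple post that never moves
}

draw_frame <- function(y) {                # step 3: scene + moving part in one function
    draw_scene()
    points(0.5, y, pch = 19, cex = 3, col = "firebrick3")  # step 2: the changing element
}

out_dir <- file.path(tempdir(), "flipbook")
dir.create(out_dir, showWarnings = FALSE)
heights <- seq(0.95, 0.55, length.out = 9)  # one y value per frame

# step 4: with onefile = FALSE, pdf() writes each page to its own numbered file
pdf(file.path(out_dir, "frame%03d.pdf"), onefile = FALSE)
for (h in heights) draw_frame(h)
invisible(dev.off())

frames <- list.files(out_dir, pattern = "^frame[0-9]{3}\\.pdf$")
frames  # frame001.pdf ... frame009.pdf
```

Flipping through the nine files in order shows the point falling; saveGIF() and its siblings do the same frame-by-frame drawing and then assemble the frames for you.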

Let’s do this…

Set It Up

First source this circle drawing function I stole from John Fox and load the animation package.

 source("http://dl.dropboxusercontent.com/u/61803503/wordpress/circle_fun.txt")
library(animation)

Create the function that draws the scene:

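FUN <- function(y = 0.8) {
    opar <- par()$mar
    on.exit(par(mar = opar))        # restore the margins when done
    par(mar = rep(0, 4))            # no margins: use the whole device
    plot.new()
    circle(0.5, 0.6, 1, "cm", , 4)  # head (circle() comes from the sourced file)
    segments(0.5, 0.2, 0.5, 0.54, lwd = 4)  # torso
    segments(0.4, 0, 0.5, 0.2, lwd = 4)     # left leg
    segments(0.6, 0, 0.5, 0.2, lwd = 4)     # right leg
    segments(0.5, 0.4, 0.3, 0.5, lwd = 4)   # left arm
    segments(0.5, 0.4, 0.7, 0.5, lwd = 4)   # right arm
    # the phone: pch = -9742 draws Unicode U+260E at height y
    points(0.5, y, pch = -9742L, cex = 4, col = "firebrick3")
}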

FUN()
Figure 2: Static plot of guy with a static phone.



This part runs FUN repeatedly and lets the phone “drop”.

This is where you supply multiple values to the portions of the graphic that will change. Build a function that calls FUN once per value (here with lapply), outputting one graphic per frame.

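oopt <- animation::ani.options(interval = 0.1)  # 0.1 seconds between frames

FUN2 <- function() {
    # drop the phone from y = 1.01 to y = 0.69 in steps of 0.02 (17 frames)
    lapply(seq(1.01, 0.69, by = -0.02), function(i) {
        FUN(i)                  # draw one frame
        animation::ani.pause()  # pause between frames when previewing interactively
    })
}

# FUN2()  # uncomment to preview the frames in the graphics device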

Now save it in any of the following formats:

# animated GIF (needs an external converter such as ImageMagick)
saveGIF(FUN2(), interval = 0.1, outdir = "images/animate")

# MP4 video (needs FFmpeg; adjust this path to your own FFmpeg install)
saveVideo(FUN2(), interval = 0.1, outdir = "images/animate", 
    ffmpeg = "C:/Program Files (x86)/ffmpeg-latest-win32-static/ffmpeg-20130306-git-28adecf-win32-static/bin/ffmpeg.exe")

# PDF animation via LaTeX (needs a LaTeX installation)
saveLatex(FUN2(), autoplay = TRUE, loop = FALSE, latex.filename = "tester.tex",
    caption = "animated dialogue", outdir = "images/animate", ani.type = "pdf",
    ani.dev = "pdf", ani.width = 5, ani.height = 5.5, interval = 0.1)

# HTML page with playback controls
saveHTML(FUN2(), autoplay = FALSE, loop = FALSE, verbose = FALSE, outdir = "images/animate/new",
    single.opts = "'controls': ['first', 'previous', 'play', 'next', 'last', 'loop', 'speed'], 'delayMin': 0")

ani.options(oopt)  # restore the animation options saved earlier

Oh yeah, here’s the HTML version:


Created using the reports (Rinker, 2013) package
Get the .Rmd file here
Just the R code


References

Rinker TW (2013). reports: Package to assist in report writing. University at Buffalo/SUNY, Buffalo, New York. version 0.1.3, http://github.com/trinker/reports.

Xie Y (2013). animation: A gallery of animations in statistics and utilities to create animations. R package version 2.2, http://CRAN.R-project.org/package=animation.

Posted in animation, reports, Uncategorized, visualization

knitr2wordpress and gradient_cloud Revisited

This post serves three functions:

  1. It allows me to revisit an old blog post
  2. It lets me test out the new-ish knitr function knit2wp and RWordPress
  3. It enables me to avoid the massive amount of reading I need to do and still feel like I'm doing “work”

The following packages are needed to run the code:

install.packages(c("knitr", "qdap"))
install.packages("RWordPress", repos = "http://www.omegahat.org/R", type = "source")
library(qdap)
library(knitr)
library(RWordPress)

*Mac users see this link and this link

In this blog post I explored the use of gradient word clouds. It took 31 lines of code to plot the figure. I'm lazy (though I tell others I'm efficient), and 31 lines is enough to keep me from exploring with the gradient word cloud. In a recent update to qdap I included a function that reduces the code from that post to 6 lines, making gradient clouds more accessible.

Grab the Presidential Debate Transcript

# download transcript of the debate to working directory
url_dl(pres.deb1.docx)

Read in the Data

# read the transcript into a data frame with person and dialogue columns
dat1 <- read.transcript("pres.deb1.docx", c("person", "dialogue"))

# qprep for quick cleaning
dat1$dialogue <- qprep(dat1$dialogue)

# view a truncated version of the data (see also htruncdf)
left.just(htruncdf(dat1, 10, 45))
##    person dialogue                                     
## 1  LEHRER We'll talk about specifically about health ca
## 2  ROMNEY What I support is no change for current retir
## 3  LEHRER And what about the vouchers?                 
## 4  ROMNEY So that's that's number one. Number two is fo
## 5  OBAMA  Jim, if I if I can just respond very quickly,
## 6  LEHRER Talk about that in a minute.                 
## 7  OBAMA  but but but overall.                         
## 8  LEHRER OK.                                          
## 9  OBAMA  And so...                                    
## 10 ROMNEY That's that's a big topic. Can we can we stay

Remove Lehrer (gradient_cloud needs a binary grouping variable) and plot

dat2 <- rm_row(dat1, 1, "LEHRER")  #make a bivariate column (remove LEHRER)

gradient_cloud(dat2$dialogue, dat2$person, title = "Debate", X = "blue", Y = "red", 
    stopwords = BuckleySaltonSWL, max.word.size = 2.2, min.word.size = 0.55)


Notice that we have control over the min/max word size, the two colors and the stopwords. Easy, huh?

Try a Few More with Different Parameters

gradient_cloud(dat2$dialogue, dat2$person, title = "fun", X = "green", Y = "orange")

gradient_cloud(dat2$dialogue, dat2$person, title = "fun", rev.binary = TRUE)

gradient_cloud(dat2$dialogue, dat2$person, title = "fun", max.word.size = 5, 
    min.word.size = 0.025)

Now a Discussion of knitr to WordPress

Here is the Rmd (text) of the file used to make this post.

Here's the format I used to send the file to WordPress.com

options(WordPressLogin = c(USERNAME = "PASSWORD"), WordPressURL = "http://trinkerrstuff.wordpress.com/xmlrpc.php")
library(knitr)

# the call I used for this post
knit2wp(file.path("C:/Users/trinker/Desktop/gradient_clouds_revisited/PRESENTATION", 
    "gradient_clouds_revisited.Rmd"), title = "knitr2wordpress and gradient_cloud Revisited", 
    shortcode = TRUE)

# the general form for your own file
knit2wp("yourfile.Rmd", title = "knitr2wordpress and gradient_cloud Revisited")

Where USERNAME and PASSWORD are your WordPress username and password.


Please note that I was initially confused about where the base.url and
base.dir settings go. For more on this problem see this thread.

Posted in knitr, qdap, text, Uncategorized, visualization, word cloud

qdap 0.2.1 Released

I’m very pleased to announce the release of qdap 0.2.1

This is the second installment of the qdap package on CRAN. The qdap package automates many of the tasks associated with quantitative discourse analysis of transcripts containing discourse, including frequency counts of sentence types, words, sentences, turns of talk, syllable counts and other assorted analysis tasks. The package provides parsing tools for preparing transcript data. Many functions enable the user to aggregate data by any number of grouping variables, providing analysis and seamless integration with other R packages that undertake higher-level analysis and visualization of text.


Note: qdap is not compiled for Mac users. For installation instructions for Mac users, or for users of other operating systems having difficulty installing qdap, please click here.


Some of the changes in version 0.2.1 include:

NEW FEATURES

* `gradient_cloud`: Binary gradient Word Cloud – A new plotting function
that plots and colors words for a binary variable based on which group of
the binary variable uses the term more frequently.

* `new_project`: A project template generating function designed to increase
efficiency and standardize work flow. The project comes with a .Rproj file
for easy use with RStudio as well as a .Rprofile that automates loading and sourcing
of packages, data and project functions. This function uses the reports package
to generate an extensive reports folder.
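The idea behind gradient_cloud's binary coloring can be sketched in a few lines of base R. The word counts below are made up, and normalizing each group's counts by its total before comparing is an assumption of this sketch, not a description of qdap's internals; without it, a speaker who simply talks more would claim most of the words.

```r
# Hypothetical toy counts for two groups (not qdap output).
counts <- data.frame(
    word   = c("jobs", "taxes", "health", "energy"),
    groupA = c(10, 2, 5, 4),
    groupB = c(3, 8, 5, 1)
)

# Normalize by each group's total so a wordier group doesn't dominate.
propA <- counts$groupA / sum(counts$groupA)
propB <- counts$groupB / sum(counts$groupB)

# Color each word by the group that uses it relatively more; ties get grey.
counts$color <- ifelse(propA > propB, "blue",
                       ifelse(propB > propA, "red", "grey50"))
counts
```

Note that "health" has equal raw counts (5 and 5) but ends up red, because 5 is a larger share of groupB's 17 words than of groupA's 21.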

BUG FIXES

* `word_associate` colors the word cloud appropriately and deals with the error
caused by a grouping variable not containing any words from 1 or more of the
vectors of a list supplied to match.string.

* `trans.cloud` produced an error when expand.target was TRUE. This error has
been eliminated.

* `termco` would eliminate > 1 columns matching an identical search.term found
in a second vector of match.list. termco now counts repeated terms multiple
times.

* `cm_df.transcript` did not give the correct speaker labels (fixed).


For a complete list of changes see qdap’s NEWS

Development Version
github

Posted in qdap, text, Uncategorized, visualization, work flow

reports 0.1.2 Released

I’m very pleased to announce the release of reports: an R package to assist in the workflow of writing academic articles and other reports.

This is the first CRAN release of reports: http://cran.r-project.org/web/packages/reports/index.html

The reports package assists in writing reports and presentations by providing a framework that brings together existing R, LaTeX/.docx and Pandoc tools. The package is designed to be used with RStudio, MiKTeX/TeX Live/LibreOffice, knitr, knitcitations, Pandoc and pander (and installr for Windows users). The user will want to download these free programs/packages to maximize the effectiveness of the reports package. Functions with two-letter names are general text formatting functions for copying text from articles for inclusion as a citation.


Github development version: https://github.com/trinker/reports

As reports is further developed the following are planned: (a) a help video section and (b) a vignette detailing workflow and use of reports.

Check out this introductory video:

Quick start slides:

HTML5 Slides

For more on the potential use of reports see this blog post.

Posted in work flow, reports

Workflow w/ reports package

NOTE: THIS IS NOW A PACKAGE SEE THIS LINK FOR DETAILS

Let me start with a video for people who just want to see what I’m demo-ing first:

I’ve been interested in speeding up workflow lately and have spent a lot of time doing so. I’ve seen people try to tackle this in R in the past. This blog post covers many aspects of workflow and increasing productivity. John Myles White has tackled this problem and created the ProjectTemplate package. The idea is terrific, but the problem is that R users are so varied in their workflows that it’s difficult to make one workflow template for everyone. I’ve given up on that. Instead I propose:

1. The R community modularize workflow into field dependent pieces.

For instance, in qdap, an R package for quantitative discourse analysis, I’ve added a work flow template that people in my field would find suitable. However, I intentionally left the report writing part underdeveloped because I plan to add the reports package as a piece of the workflow. While my entire work flow is likely only useful for discourse analysis people, the reports section is much more generalizable. In this way we build work flow from modular pieces.

2. Make the pieces flexible (within reason).

For example, in the beta version of reports I have added the ability for users to submit templates via doc_temp (not sure how well this will work), which provides a template that alters the documents that the new_report template will generate. The doc_temp function is similar to package.skeleton. The functionality will be similar to the way CRAN or CTAN house packages, with the templates library housed within the package, provided it doesn’t get too large. The submissions still need to conform to a standard (the within reason part), though the user may choose to keep their template local.

3. Use existing tools (powerful, flexible and efficient).

R has had some great developments in tools which, combined with LaTeX, can really speed up workflow: RStudio, knitr, MiKTeX/TeX Live, BibTeX, knitcitations and of course R, to name a few. By utilizing all these tools we maximize productivity, since we’re not going to multiple places and reloading libraries and user-defined functions. As an example, R bloggers Daniel Lüdecke and Andrew Landgraf recently discussed custom functions that they use frequently. By placing these in the extra_functions.R script, the project’s .Rprofile will source and load them automatically just by opening the project in RStudio. Better still, if these are constantly used functions that don’t yet have a package home, the user can supply the path(s) to new_report and the code will be added automatically to the report project’s .Rprofile for sourcing.

The idea is to generate a template that is fast and flexible which keeps everything for a report housed in one place.  In this way the report framework of the reports package can be added as a piece to the rest of your workflow.

Trying the reports package

#INSTALLING
library(devtools)
install_github("trinker/reports")

#GETTING STARTED
library(reports)
# setwd("~/your/favorite/directory/here")
new_report("New")

#PLAY AROUND A BIT
templates()   #current internally housed templates

new_report("new proj2", templates(FALSE)[2]) #quantitative Rnw
new_report("new proj3", templates(FALSE)[3]) #qualitative docx

I encourage you to view the intro video, look at the help manual, check out the HTML5 introductory slides and just play with the reports package a bit. I want your feedback to make a tool others can use to help them in their work flow. If your comments are more substantial, please use GitHub’s issue tracker.

Posted in qdap, work flow

qdap 0.2.0 released

This is the first CRAN release of qdap (qdap 0.2.0) found here.  qdap (Quantitative Discourse Analysis Package) is an R package designed to assist in quantitative discourse analysis. The package stands as a bridge between qualitative transcripts of dialogue and statistical analysis and visualization.

The qdap package automates many of the tasks associated with quantitative discourse analysis of transcripts containing discourse, including frequency counts of sentence types, words, sentences, turns of talk, syllable counts and other assorted analysis tasks. The package provides parsing tools for preparing transcript data. Many functions enable the user to aggregate data by any number of grouping variables, providing analysis and seamless integration with other R packages that undertake higher-level analysis and visualization of text. This provides the user with a more efficient and targeted analysis.

qdap’s development version can be found here.

As qdap is further developed the following tasks are planned: (a) a github hosted website via staticdocs (b) a help video section and (c) a vignette detailing workflow and use of qdap.

If you spot bugs or would like to request features please use qdap’s github site.


Special thanks to Dason of talkstats.com for his patience in teaching and mentoring me through the package creation process.

Also thank you to Hadley Wickham for his great package development tools and documentation of the process.

Posted in discourse analysis, package creation, qdap

Tips for R Package Creation

I’m being tortured by the mistakes of my past self. I think I’ve made almost every mistake possible in creating a package, and I want to go back in time and tell year-ago me all I know now. But it seems require(timetravel) isn’t working on my machine. So instead I’ll share with other new package creators what I’ve learned along the way in a sort of tips list (Letterman style). To give context, I am documenting a package (qdap) after its functions are finished (bad idea) and am lamenting all the mistakes, as this was my first package attempt and it’s a major undertaking.

Here are the things (riddled with helpful links) I wish I had known then that I know now:

  1. Start Small – It’s easier to learn to drive in a car than a dump truck.  I suggest making a small package even if it’s for fun to learn the process first (a game, music player  or fun visualization may be perfect for this).  This way you can refer back to this package often for “How did I do that?”   
  2. Use git – GitHub, Bitbucket or some other git host works great for uploading a repository to the cloud (dropbox-style), where you can back up your repo as well as share and collaborate with others. (Here’s a clip about GitHub that’s slightly out of date but still good: LINK) The issues tab is awesome for documenting bugs and requests.
  3. Use RStudio – When I first started on qdap, as a Windows user, the package creation process was painful. RStudio makes your life so much better. Here’s a video example of how quick it is to create a package with RStudio LINK 1 and a slightly out-of-date video of the interface between git and RStudio LINK 2.
  4. Become familiar with “Writing R Extensions” manual - This is the rule book.  It’s like a club, if you don’t have the right look you aren’t getting in. Nuff said.
  5. Steal – github was designed to collaborate (aka stealing).  Find a trusted package developer and steal their format and design.  I personally steal from two places: Hadley Wickham’s github and Dason Kurkiewicz’s github.  All their files are there for easy sourcing.
  6. Document as you go – Trust me documenting over time is easier than documenting at the end.
  7. Document with roxygen2 – roxygen2 is a less painful way to write documentation (I recommend actually writing an .Rd file, aka a documentation file, by hand to feel the pain and appreciate roxygen2). Here’s where stealing other people’s format is extremely useful; look at this Hadley .R file. It’s nice, once you’ve used roxygen2, to run roxygenize(path/to/repo) and have the documentation created.
  8. Use devtools – There are some great developmental tools in devtools (though many if not most/all are incorporated into Rstudio).
  9. Use testthat – I didn’t get why this was useful until I started trying to make changes to my package at the end.  Ever pull a thread on a sweater and it makes a big hole, that’s what a change in a package can do and testthat can help to make sure the changes don’t make a big hole.
  10. Learn to debug – I had no clue how cool  browser() was or how to use it when I started.  Here’s a nice video on R’s debugging tools: LINK.  Debugging stinks, debugging without tools really stinks.
  11. Reduce, Recycle, Reuse – Try to think “will I use this code chunk later?” If the answer is yes, break it off as a function of its own and throw it in the package as an internal “helper” function. This saves time and makes the code more readable. Also try to make the code compact but as fast as possible. Benchmarking helps find the slow spots, and Rcpp can make them faster.
  12. Make friends/learning community – The folks at talkstats.com and stackoverflow.com have been a tremendous help in asking about the process and getting feedback.  I wouldn’t know about most of the above things if it were not for these two learning places.
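Tip 11 in code form: a minimal sketch, with hypothetical helper names, of breaking a chunk off into its own function, checking that a faster rewrite gives identical results, and only then timing the two versions with base R's system.time(). Packages such as microbenchmark give more precise timings; because timings vary by machine, none are quoted here.

```r
# Hypothetical helper, written two ways: an explicit loop and a vectorized call.
word_lengths_loop <- function(words) {
    out <- integer(length(words))
    for (i in seq_along(words)) out[i] <- nchar(words[i])
    out
}

word_lengths_vec <- function(words) nchar(words)

words <- rep(c("reduce", "recycle", "reuse"), 1e4)

# 1. make sure the rewrite gives identical results
stopifnot(identical(word_lengths_loop(words), word_lengths_vec(words)))

# 2. only then compare run times (elapsed seconds; varies by machine)
t_loop <- system.time(word_lengths_loop(words))[["elapsed"]]
t_vec  <- system.time(word_lengths_vec(words))[["elapsed"]]
c(loop = t_loop, vectorized = t_vec)
```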

Special thanks to Dason of talkstats.com for his patience in teaching and mentoring me through the package creation process.

Posted in package creation, Uncategorized

Gradient Word Clouds

I like word clouds because they are visually appealing and provide a ton of information in a small space. Ever since I saw Drew Conway’s post (LINK) I have been looking for ways to improve word clouds. One of the nice features of Drew’s post was that he colored the words according to a gradient. Unfortunately, Drew’s cloud lacks some of the aesthetic wow factor that Ian Fellows’ wordcloud package is known for.

This post is going to show you how to color words with a gradient based on degree of usage between two individuals. For me it’s going to help me learn the following things:

  1. How to use knitr + markdown to make a blog post (I’ve been using knitr for reproducible latex/beamer reports).
  2. How to use gradients in base (i.e. outside of ggplot2 that I’ve come to depend on).
  3. How to make a gradient color bar in base.

Installing and Loading qdap and wordcloud

First you’ll need some packages to get started. I’m using my own beta package qdap plus Fellows’ wordcloud package. If you install qdap, wordcloud is installed along with it. For the legend we’ll be using the plotrix package.

 library(qdap)
library(wordcloud)
library(plotrix)

Reading in data

Now we’ll need some data. I happen to have presidential debate data (debate # 1) left over that we can still mine.

# download transcript of the debate to working directory
url_dl(pres.deb1.docx)

# read the transcript into a data frame with person and dialogue columns
dat1 <- read.transcript("pres.deb1.docx", c("person", "dialogue"))

# qprep for quick cleaning
dat1$dialogue <- qprep(dat1$dialogue)

#view a truncated version of the data (see also htruncdf)
left.just(htruncdf(dat1, 10, 45))
##    person dialogue
## 1  LEHRER We'll talk about specifically about health ca
## 2  ROMNEY What I support is no change for current retir
## 3  LEHRER And what about the vouchers?
## 4  ROMNEY So that's that's number one. Number two is fo
## 5  OBAMA  Jim, if I if I can just respond very quickly,
## 6  LEHRER Talk about that in a minute.
## 7  OBAMA  but but but overall.
## 8  LEHRER OK.
## 9  OBAMA  And so...
## 10 ROMNEY That's that's a big topic. Can we can we stay

Setting Up the Data

  1. Make a word frequency matrix
  2. Remove Lehrer’s words
  3. Scale the word usage
  4. Create a binned fill variable
word.freq <- with(dat1, wfdf(dialogue, person))[, -2]
csums <- colSums(word.freq[, -1])
conv.fact <- csums[2]/csums[1]
word.freq$ROMNEY2 <- word.freq[, "ROMNEY"] * conv.fact
#colSums(word.freq[, -1])
word.freq[, "total"] <- rowSums(word.freq[, -1])
word.freq$continum <- with(word.freq, ROMNEY2-OBAMA)
word.freq <- word.freq[word.freq$total != 0,] #remove Lehrer-only words
MAX <- max(word.freq$continum[!is.infinite(word.freq$continum)])
word.freq$continum <- ifelse(is.infinite(word.freq$continum), MAX, word.freq$continum)
conv.fact2 <- abs(range(word.freq$continum ))
conv.fact2 <- max(conv.fact2)/min(conv.fact2)
word.freq$continum <- ifelse(word.freq$continum > 0, word.freq$continum * conv.fact2, word.freq$continum)
cuts <- c(-250, -25, -15, -10, -5, -2.5, -1.5, -1, -.5, -.25)
cuts <- sort(c(cuts, 0, abs(cuts)))
word.freq$fill.var <- cut(word.freq$continum, breaks=cuts )
head(word.freq, 10)
##         Words ROMNEY OBAMA ROMNEY2   total continum    fill.var
##  1          a     83    72  73.125 228.125   1.5470   (1.5,2.5]
##  2       aarp      0     1   0.000   1.000  -1.0000   (-1.5,-1]
##  3       able      6     7   5.286  18.286  -1.7138 (-2.5,-1.5]
##  4      about     11    11   9.691  31.691  -1.3087   (-1.5,-1]
##  5      above      1     0   0.881   1.881   1.2111     (1,1.5]
##  6    abraham      0     2   0.000   2.000  -2.0000 (-2.5,-1.5]
##  7 absolutely      2     2   1.762   5.762  -0.2379   (-0.25,0]
##  8    academy      0     1   0.000   1.000  -1.0000   (-1.5,-1]
##  9     accept      1     0   0.881   1.881   1.2111     (1,1.5]
## 10 accomplish      1     0   0.881   1.881   1.2111     (1,1.5]
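For a toy version of steps 3 and 4 (scale, then bin), here is a sketch on made-up counts. It keeps the core moves of the code above, rescaling ROMNEY so both speakers have the same total and cutting the difference at breaks that are symmetric around zero, but it omits the extra stretching of the positive side (conv.fact2) and uses fewer breaks.

```r
# Hypothetical toy counts for two speakers (not the debate data).
wf <- data.frame(
    Words  = c("jobs", "taxes", "health", "energy", "plan"),
    ROMNEY = c(12, 3, 6, 9, 0),
    OBAMA  = c(4, 6, 5, 5, 2)
)

# Rescale ROMNEY so both speakers have the same total word count.
conv.fact <- sum(wf$OBAMA) / sum(wf$ROMNEY)
wf$ROMNEY2 <- wf$ROMNEY * conv.fact

# Positive = relatively more Romney, negative = relatively more Obama.
wf$continum <- wf$ROMNEY2 - wf$OBAMA

# Breaks mirrored around 0, built the same way as in the post (fewer of them).
cuts <- c(-10, -5, -1, -0.5)
cuts <- sort(c(cuts, 0, abs(cuts)))
wf$fill.var <- cut(wf$continum, breaks = cuts)
wf
```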

Convert the Binned Variable to Colors

I was not sure how to produce gradients outside of ggplot2, so I asked on stackoverflow.com and received a terrific and simple answer from thelatemail (LINK). Now we’ll create a color column based on fill.var using qdap’s lookup, which uses an environment to recode.

colfunc <- colorRampPalette(c("red", "blue"))
word.freq$colors <- lookup(word.freq$fill.var, levels(word.freq$fill.var),
    rev(colfunc(length(levels(word.freq$fill.var)))))
head(word.freq, 10)
##         Words ROMNEY OBAMA ROMNEY2   total continum    fill.var  colors
##  1          a     83    72  73.125 228.125   1.5470   (1.5,2.5] #BB0043
##  2       aarp      0     1   0.000   1.000  -1.0000   (-1.5,-1] #5000AE
##  3       able      6     7   5.286  18.286  -1.7138 (-2.5,-1.5] #4300BB
##  4      about     11    11   9.691  31.691  -1.3087   (-1.5,-1] #5000AE
##  5      above      1     0   0.881   1.881   1.2111     (1,1.5] #AE0050
##  6    abraham      0     2   0.000   2.000  -2.0000 (-2.5,-1.5] #4300BB
##  7 absolutely      2     2   1.762   5.762  -0.2379   (-0.25,0] #780086
##  8    academy      0     1   0.000   1.000  -1.0000   (-1.5,-1] #5000AE
##  9     accept      1     0   0.881   1.881   1.2111     (1,1.5] #AE0050
## 10 accomplish      1     0   0.881   1.881   1.2111     (1,1.5] #AE0050
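The recoding step can also be done with base R alone. This sketch (toy levels, not the debate data) builds the same reversed red-to-blue palette and indexes it by each value's level with match(); it is an alternative to qdap's lookup(), not a description of how lookup() works internally.

```r
# Toy binned variable with ordered levels (hypothetical, not the debate data).
fill.var <- factor(c("(1,5]", "(-5,-1]", "(-1,-0.5]", "(1,5]"),
                   levels = c("(-5,-1]", "(-1,-0.5]", "(-0.5,0]",
                              "(0,0.5]", "(0.5,1]", "(1,5]"))

colfunc <- colorRampPalette(c("red", "blue"))
pal <- rev(colfunc(nlevels(fill.var)))  # most negative level -> blue, most positive -> red

# Look up each value's position among the levels, then take that palette color.
word.colors <- pal[match(as.character(fill.var), levels(fill.var))]
word.colors
```

Because a factor's integer codes are already its level positions, pal[as.integer(fill.var)] gives the same result in one step.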

Plot the Word Cloud and Gradient Legend

Now that we have color gradients, let’s use wordcloud to plot and plotrix’s color.legend to make a legend. I didn’t know how to create the gradient legend either and asked again on stackoverflow, where I received answers from Dason and mnel (LINK). Both were great answers, but I went with Dason’s.

par(mar=c(7,1,1,1))
wordcloud(word.freq$Words, word.freq$total, colors = word.freq$colors,
    min.freq = 1, ordered.colors = TRUE, random.order = FALSE, rot.per=0,
    scale = c(5, .7))
# Add legend
COLS <- colfunc(length(levels(word.freq$fill.var)))
color.legend(.025, .025, .25, .04, qcv(Romney,Obama), COLS)

gradient word cloud

Note: If you plot to the console graphics device you can’t get a large enough size to plot all the words comfortably. I achieved the above results plotting externally to png @ 1000 x 1000 (w x h)

Concluding Thoughts

Alright, this is my first knitr generated blog post. Very easy. I regret not having tried it earlier :(

I accomplished my goal of making a gradient word cloud and a gradient legend. The actual word cloud really isn’t that informative because there’re too many words and too little variation in word choice/colors. In some situations this approach may be useful but in this one I don’t like it. Secondly, I used the blue to red theme because it plays to the political parties but in this visualization better contrasting colors would be more appropriate. Overall I don’t feel I was successful in presenting information better than Drew Conway’s post.

What the Reader Can Take Away from the Post

  1. Using wordcloud’s user defined color feature
  2. Using qdap’s lookup to recode
  3. Creating gradients in base (easy)
  4. Creating the accompanying gradient legend

If the reader has improvements in scaling, visualizing parameters, etc., please share these and other comments below.

For a .txt version of this script -click here-

Addendum:
To make a knitr output upload to wordpress.com I found help from
http://www.carlboettiger.info

Posted in discourse analysis, text, visualization, word cloud