microbenchmarking with R

I love to benchmark. Maybe I’m a bit weird, but I love to benchmark everything in R. Recently I’ve had people question the accuracy of the typical system.time and rbenchmark package approaches to benchmarking. I saw Hadley Wickham promoting the microbenchmark package and decided to give it a whirl. The package claims to improve accuracy by using the high-resolution timing functions your OS provides. A box plot or a ggplot of the function’s output can also aid in understanding and comparing functions. Here’s a demo test:

library(microbenchmark); library(plyr) 
op <- microbenchmark(
    PLYR=ddply(mtcars, .(cyl, gear), summarise, 
        output = mean(hp)),
    AGGR=aggregate(hp ~ cyl + gear, mtcars, mean),
    TAPPLY = tapply(mtcars$hp, interaction(mtcars$cyl, 
        mtcars$gear), mean),
times=1000L)

print(op) # summary table of the timings
boxplot(op) # boxplot of the timings
library(ggplot2) # log-scale plot of the individual timings
qplot(y=time, data=op, colour=expr) + scale_y_log10()

The output to the console window using print(op) looks something like this:

Unit: milliseconds
    expr      min       lq   median       uq       max
1   AGGR 2.856758 2.972932 3.121999  3.48615 121.49828
2   PLYR 7.880229 8.497956 8.983880 10.71436 139.04940
3 TAPPLY 1.108085 1.159873 1.196731  1.30824  67.33326
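
To read a table like this in relative terms, one option is to divide each median by the fastest median. The following is only a minimal sketch, assuming the result object keeps one row per evaluation with an `expr` column and a `time` column in nanoseconds; the smaller `times = 50L` is an arbitrary choice to keep the example quick, and the names `med` and `rel` are just illustrative.

```r
library(microbenchmark)

# Smaller re-run of two of the expressions (times = 50L is an assumption for speed)
op <- microbenchmark(
    AGGR = aggregate(hp ~ cyl + gear, mtcars, mean),
    TAPPLY = tapply(mtcars$hp, interaction(mtcars$cyl,
        mtcars$gear), mean),
    times = 50L)

# Median raw timing (nanoseconds) for each expression
med <- tapply(op$time, op$expr, median)

# Each median as a multiple of the fastest median (fastest = 1)
rel <- med / min(med)
print(round(rel, 2))
```

The actual ratios will vary from machine to machine and run to run, so treat them as a rough guide rather than a fixed ranking.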

The ggplot log plot from the output:

ggplot2 Plot of the Output

The boxplot from the output:

Box Plot of the Output
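
Since qplot has been deprecated in recent ggplot2 releases, here is a rough sketch of how the same kind of log-scale plot could be built with an explicit ggplot() call. It assumes the result can be converted to a plain data frame with `expr` and `time` columns (time in nanoseconds); the `run` column and the axis labels are my own additions for illustration, and `times = 50L` just keeps the example fast.

```r
library(microbenchmark)
library(ggplot2)

# Smaller re-run of two of the expressions (times = 50L is an assumption for speed)
op <- microbenchmark(
    AGGR = aggregate(hp ~ cyl + gear, mtcars, mean),
    TAPPLY = tapply(mtcars$hp, interaction(mtcars$cyl,
        mtcars$gear), mean),
    times = 50L)

# Plain data frame with one row per recorded evaluation
timings <- as.data.frame(op)
timings$run <- seq_len(nrow(timings))  # row index, used as the x axis

# One point per evaluation, coloured by expression, on a log10 y axis
p <- ggplot(timings, aes(x = run, y = time, colour = expr)) +
    geom_point() +
    scale_y_log10() +
    labs(x = "row index", y = "time (nanoseconds, log scale)")
print(p)
```

The log scale keeps the occasional very slow evaluation from squashing the bulk of the points toward the bottom of the plot.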

About tylerrinker

Data Scientist, open-source developer, #rstats enthusiast, #dataviz geek, and #nlp buff

7 Responses to microbenchmarking with R

  1. ledzep says:

    The plots, in addition to the benchmark comparison, are really cool. Plyr seems to be lagging behind in speed. I really, really love plyr. I wish it were faster than it is now.

    • tylerrinker says:

      Yeah plyr was meant for ease. It certainly is easy to use and Hadley continues to boost the speed but it definitely isn’t the fastest. If speed isn’t a concern (i.e. your data set is small) go for plyr. I really liked the ggplot output for visualization. It gives a really good idea of what’s really happening. I think microbenchmark is my new benchmarking platform.
