I love to benchmark. Maybe I’m a bit weird, but I love to bench everything in R. Recently I’ve had people challenge the accuracy of the typical system.time and rbenchmark package approaches to benchmarking. I saw Hadley Wickham promoting the microbenchmark package and decided to give it a whirl. This approach claims to improve accuracy and adjusts to your OS. A nice box plot or a ggplot of the function’s output can also aid in understanding and comparing functions. Here’s a demo test:
library(microbenchmark); library(plyr)

op <- microbenchmark(
    PLYR = ddply(mtcars, .(cyl, gear), summarise, output = mean(hp)),
    AGGR = aggregate(hp ~ cyl + gear, mtcars, mean),
    TAPPLY = tapply(mtcars$hp, interaction(mtcars$cyl, mtcars$gear), mean),
    times = 1000L)

print(op)    # standard data frame of the output
boxplot(op)  # boxplot of output

library(ggplot2)  # nice log plot of the output
qplot(y = time, data = op, colour = expr) + scale_y_log10()
Printing to the console with print(op) yields something like this:
Unit: milliseconds
    expr      min       lq   median       uq       max
1   AGGR 2.856758 2.972932 3.121999  3.48615 121.49828
2   PLYR 7.880229 8.497956 8.983880 10.71436 139.04940
3 TAPPLY 1.108085 1.159873 1.196731  1.30824  67.33326
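Raw milliseconds can be hard to compare at a glance, so here is a minimal sketch that rescales the medians printed above relative to the fastest expression. It uses only base R, and the names `medians` and `relative` are my own, chosen for illustration. The numbers are copied by hand from the output above; on another machine you would substitute your own.

```r
# Median timings in milliseconds, copied from the printed output above
medians <- c(AGGR = 3.121999, PLYR = 8.983880, TAPPLY = 1.196731)

# Express each median as a multiple of the fastest one
relative <- medians / min(medians)

# Fastest first, rounded for readability
print(round(sort(relative), 2))
```

Read this way, each value says roughly how many times longer an expression took than the quickest one in this particular run.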
The ggplot log plot from the output:
The boxplot from output:
The plots, in addition to the benchmark comparison, are really cool. Plyr seems to be lagging behind in speed. I really, really love plyr. I wish it were faster than it is now.
Yeah, plyr was meant for ease of use. It certainly is easy to use, and Hadley continues to boost the speed, but it definitely isn’t the fastest. If speed isn’t a concern (i.e., your data set is small), go for plyr. I really liked the ggplot output for visualization. It gives a good idea of what’s really happening. I think microbenchmark is my new benchmarking platform.