The Need for paste2 (part III)

Final installment: Part III of a multi-part blog series on the paste2 function…

In my first post on the paste2 function I promised proof of a few practical uses. In Part II of this series we used paste2 as a convenient way to build a lookup table for ability scores from the ltm package. In this post we’ll look at two recent uses of paste2 as a convenient helper function.

Use 1:

Recently on stackoverflow.com a poster asked how to determine which values in one data set (data set A) are not also found in another (data set B). Not much info was given, and debate ensued about mixed-mode data sets, speed, safety, etc.

Here’s a data set similar to the one posters proposed as the challenge (give it a whack and see if you can solve it):

A <- data.frame(x = 1:8, y = as.character(1:8))
B <- data.frame(x = c(4:1, 6), y = as.character(c(4:1, 6)))

If you’ve got a way, please share; I’d be interested. Be warned: paste2 isn’t the fastest, but it’s the safest method that was proposed.

#############################################################
# FIRST LOAD THE paste2 FUNCTION AND CONSTRUCT THE DATA SET #
#############################################################
load(url("http://dl.dropbox.com/u/61803503/paste2demo2.RData"))

A <- data.frame(x = 1:8, y = as.character(1:8))
B <- data.frame(x = c(4:1, 6), y = as.character(c(4:1, 6)))

#####################################################
# LET'S LOOK FIRST AT SOME EXAMPLES THAT DON'T WORK #
#####################################################
##############################
# only works on numeric data #
##############################
A[! data.frame(t(A)) %in% data.frame(t(B)), ] 

##########################################################
# relies on rownames being the same, which they may not  #
# be, as in this case :(                                 #
##########################################################
A[- as.integer(rownames(B)),]

################################################
# safe paste2 way: handles character/numeric,  #
# no reliance on row names                     #
################################################
A[!paste2(A) %in% paste2(B), ]
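
The comparison above depends on paste2 being loaded from the .RData file. As a rough sketch of the idea, here is the same row-key trick with a hypothetical stand-in, paste2_sketch, which I'm assuming does little more than paste the columns of each row into one string. It is not the real paste2 (which may handle NAs and whitespace differently), just enough to show why matching on pasted keys works and why the row-name approach goes wrong on this data.

```r
# Hypothetical stand-in for paste2 (an assumption, not the original function):
# collapse every row of a data frame into a single string key.
paste2_sketch <- function(df, sep = ".") {
    # unname() so column names are never mistaken for paste() arguments
    do.call(paste, c(unname(as.list(df)), sep = sep))
}

A <- data.frame(x = 1:8, y = as.character(1:8))
B <- data.frame(x = c(4:1, 6), y = as.character(c(4:1, 6)))

# One key per row, e.g. "1.1", "2.2", ...
keyA <- paste2_sketch(A)
keyB <- paste2_sketch(B)

# Rows of A whose key never appears in B
only_in_A <- A[!keyA %in% keyB, ]
only_in_A$x    # 5 7 8

# The row-name approach drops rows 1 to 5 of A simply because B has
# five rows, so it keeps 6 (which is in B) and loses 5 (which is not)
by_rownames <- A[-as.integer(rownames(B)), ]
by_rownames$x  # 6 7 8
```

Because the keys are built from the values themselves, the order of the rows in B and the mix of numeric and character columns don't matter.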

Use 2:

Recently I’ve been working with error bars on count data, and I inquired on talkstats.com about using the paste2 function as a helper to find standard errors for count data. Here’s the function with paste2 in it. It’s probably not necessary, but it makes life a lot easier.

SEcount <- function(dat, confidence = .95, se.dig = 3, intv.digs = 2){
    # coerce lists/data frames to a flat contingency table
    if (is.list(dat)) dat <- ftable(cbind(dat))
    dat2 <- as.data.frame(dat)
    len <- ncol(dat2)
    # keep an existing "group" column by renaming it to "orig.group"
    if (any(names(dat2) %in% "group")) {
        f <- names(dat2) %in% "group"
        name <- paste0("orig.", names(dat2)[f])
        names(dat2)[f] <- name
    }
    #HERE'S THE paste2 FUNCTION USE
    dat2$group <- factor(paste2(dat2[, 1:(len-1)], sep=":"))
    dat2 <- dat2[order(dat2$group), c(len + 1, 1:len)]
    dat2 <- dat2[dat2$Freq != 0, ]
    # fit the Poisson model once and reuse its coefficient table
    coefs <- summary(glm(Freq ~ group - 1, data = dat2,
        family = poisson))[["coefficients"]]
    est <- coefs[, 1]  # estimates, on the log scale
    se <- coefs[, 2]   # standard errors, also on the log scale
    dat2$SE <- round(se, digits = se.dig)
    # note: Freq is a raw count while se is on the log scale
    dat2$minus_1_SE <- round(dat2$Freq - se, digits = intv.digs)
    dat2$plus_1_SE <- round(dat2$Freq + se, digits = intv.digs)
    if (!is.null(confidence)){
        # normal quantile for the requested confidence level
        n.SE <- qnorm(1 - (1 - confidence)/2)
        dat2$lower <- round(exp(est - n.SE*se), digits = intv.digs)
        dat2$upper <- round(exp(est + n.SE*se), digits = intv.digs)
    }
    rownames(dat2) <- 1:nrow(dat2)
    return(dat2)
}
###############
# TEST IT OUT #
###############
SEcount(mtcars[, c("cyl", "gear", "carb")])
SEcount(mtcars[, 8:11])
x <- ftable(mtcars[, 8:10])
SEcount(x) #takes ftables too
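
If you want to see what the glm step inside SEcount is doing, here is a minimal sketch on made-up counts (the counts and the names a, b, c are my own, purely for illustration). With one coefficient per group and no intercept, the Poisson fit should reproduce log(count) as the estimate and roughly 1/sqrt(count) as the standard error, both on the log scale, which is why the confidence limits are exponentiated back to counts.

```r
# Made-up counts, one per group (illustrative only)
counts <- c(a = 4, b = 9, c = 25)
dat <- data.frame(group = factor(names(counts)), Freq = as.numeric(counts))

# Same model form as in SEcount: one coefficient per group, no intercept
fit <- glm(Freq ~ group - 1, data = dat, family = poisson)
coefs <- summary(fit)[["coefficients"]]
est <- coefs[, "Estimate"]
se <- coefs[, "Std. Error"]

# Estimates line up with log(count); standard errors with 1/sqrt(count)
cbind(log_count = log(dat$Freq), est, inv_sqrt = 1 / sqrt(dat$Freq), se)

# 95% interval, built on the log scale and exponentiated back to counts
z <- qnorm(1 - (1 - 0.95) / 2)
ci <- data.frame(group = dat$group,
                 lower = exp(est - z * se),
                 upper = exp(est + z * se))
ci
```

The intervals come out asymmetric around the observed counts, which is what you'd expect from working on the log scale.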

I hope you learned something about paste2 and may even add it to your .Rprofile. It’s not always the fastest choice, but it can be very convenient and easy to use.

About tylerrinker

I am a Literacy PhD student with a bent for the quantitative and a passion for R.