## The Need for paste2 (part III)

Final installment: Part III of a multi-part blog on the paste2 function…

In my first post on the `paste2` function I promised proof of a few practical uses. In Part II of this series we looked at using `paste2` as a convenient way of making a lookup table for ability scores from the ltm package. In this post we’ll be looking at two recent uses of `paste2` as a convenient helper function.

Use 1:

Recently on stackoverflow.com a poster asked how to determine which rows in one data set (data set A) are not also found in another (data set B). Not much info was given, and debate ensued about mixed-mode data sets, speed, safety, etc.

Here’s a data set similar to the one posters proposed as the challenge (give it a whack and see if you can solve it):

```r
A <- data.frame(x = 1:8, y = as.character(1:8))
B <- data.frame(x = c(4:1, 6), y = as.character(c(4:1, 6)))
```

If you’ve got a way, please share; I’d be interested. Now be warned: paste2 isn’t the fastest, but it’s the safest of the methods that were proposed.

```r
#############################################################
# FIRST LOAD THE paste2 FUNCTION AND CONSTRUCT THE DATA SET #
#############################################################

A <- data.frame(x = 1:8, y = as.character(1:8))
B <- data.frame(x = c(4:1, 6), y = as.character(c(4:1, 6)))

#####################################################
# LET'S LOOK FIRST AT SOME EXAMPLES THAT DON'T WORK #
#####################################################
##############################
# only works on numeric data #
##############################
A[!data.frame(t(A)) %in% data.frame(t(B)), ]

##########################################################
# relies on rownames being the same, which they may not  #
# be, as in this case :(                                 #
##########################################################
A[-as.integer(rownames(B)), ]

################################################
# safe paste2 way (handles character/numeric & #
# no reliance on row names)                    #
################################################
A[!paste2(A) %in% paste2(B), ]
```
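If you don't have `paste2` loaded, the idea behind the last line can be tried with a rough stand-in. The sketch below uses a hypothetical helper, `paste2_sketch`, which only assumes that `paste2` pastes the columns of a data frame together row by row; the real function from Part I may treat NAs, whitespace and separators differently.

```r
# Hypothetical stand-in for paste2 (NOT the function from Part I;
# it only assumes row-wise pasting of columns with a separator).
paste2_sketch <- function(cols, sep = ".") {
    # unname() keeps column names from being read as arguments to paste()
    do.call(paste, c(unname(as.list(cols)), sep = sep))
}

A <- data.frame(x = 1:8, y = as.character(1:8))
B <- data.frame(x = c(4:1, 6), y = as.character(c(4:1, 6)))

key_A <- paste2_sketch(A)  # one key per row of A, e.g. "1.1"
key_B <- paste2_sketch(B)  # one key per row of B, e.g. "4.4"

# Rows of A whose key does not appear among the keys of B
only_in_A <- A[!key_A %in% key_B, ]
only_in_A
```

This should leave the rows of A where x is 5, 7 and 8. One caveat with any paste-based key: if the values themselves can contain the separator, two different rows can produce the same key (with `sep = "."`, "1.2" plus "3" and "1" plus "2.3" both give "1.2.3"), so it's worth picking a separator that doesn't occur in the data.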

Use 2:

Recently I’ve been working with error bars on count data and inquired on talkstats.com about how to find standard errors for counts. I used the paste2 function as a helper to accomplish the task. Here’s the function with paste2 in it. Using paste2 is probably not necessary, but it makes life a lot easier.

```r
SEcount <- function(dat, confidence = .95, se.dig = 3, intv.digs = 2) {
    # Data frames (which are lists) are tabulated first; ftables pass through
    if (is.list(dat)) dat <- ftable(cbind(dat))
    dat2 <- as.data.frame(dat)
    len <- ncol(dat2)

    # Rename an existing "group" column so it isn't overwritten below
    if (any(names(dat2) %in% "group")) {
        f <- names(dat2) %in% "group"
        names(dat2)[f] <- paste0("orig.", names(dat2)[f])
    }

    # HERE'S THE paste2 FUNCTION USE
    dat2$group <- factor(paste2(dat2[, 1:(len - 1)], sep = ":"))
    dat2 <- dat2[order(dat2$group), c(len + 1, 1:len)]
    dat2 <- dat2[dat2$Freq != 0, ]

    # Fit the Poisson model once and reuse its coefficient table
    coefs <- summary(glm(Freq ~ group - 1, data = dat2,
        family = poisson))[["coefficients"]]
    est <- coefs[, 1]  # estimates on the log (link) scale
    se <- coefs[, 2]   # standard errors, also on the log scale
    dat2$SE <- round(se, digits = se.dig)
    dat2$minus_1_SE <- round(dat2$Freq - se, digits = intv.digs)
    dat2$plus_1_SE <- round(dat2$Freq + se, digits = intv.digs)

    # Confidence limits, back-transformed to the count scale
    if (!is.null(confidence)) {
        n.SE <- qnorm(1 - (1 - confidence)/2)
        dat2$lower <- round(exp(est - n.SE * se), digits = intv.digs)
        dat2$upper <- round(exp(est + n.SE * se), digits = intv.digs)
    }
    rownames(dat2) <- seq_len(nrow(dat2))
    return(dat2)
}

###############
# TEST IT OUT #
###############
SEcount(mtcars[, c("cyl", "gear", "carb")])
SEcount(mtcars[, 8:11])
x <- ftable(mtcars[, 8:10])
SEcount(x)  # takes ftables too
```
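To see what the `glm` step contributes, here is a small sketch with made-up counts (the `counts` data frame and its values are hypothetical, not from the talkstats thread). With one row per group and no intercept, a Poisson GLM with the default log link should return the log of each count as the estimate and 1/sqrt(count) as its standard error, so the fit can be compared against those closed-form values.

```r
# Hypothetical counts for three groups (made up for illustration)
counts <- data.frame(group = factor(c("a", "b", "c")), Freq = c(4, 9, 25))

fit <- glm(Freq ~ group - 1, data = counts, family = poisson)
coefs <- summary(fit)[["coefficients"]]

est <- coefs[, 1]  # estimates on the log scale
se <- coefs[, 2]   # standard errors on the log scale

# Compare the fitted values with the closed-form ones side by side
cbind(est, log_count = log(counts$Freq), se, inv_sqrt = 1/sqrt(counts$Freq))

# 95% interval, built on the log scale and back-transformed to counts
z <- qnorm(1 - (1 - 0.95)/2)
ci <- data.frame(lower = exp(est - z * se), upper = exp(est + z * se))

# Delta-method standard error on the count scale: count * se = sqrt(count)
se_count <- counts$Freq * se
```

Because the interval is built on the log scale and then exponentiated, it is not symmetric around the count and never drops below zero. The `se` values themselves live on the log scale, which is worth keeping in mind when reading columns that add them directly to a raw count.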

I hope you learned something about paste2 and will consider adding it to your .Rprofile. It’s not always the fastest choice, but it can be very convenient and easy to use.