paste, paste0, and sprintf

I find myself pasting URLs and lots of little pieces together lately. Now paste is the standard go-to guy when you wanna glue some stuff together. But often I find myself pasting and getting stuff like this:

paste(LETTERS)
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q"
[18] "R" "S" "T" "U" "V" "W" "X" "Y" "Z"

Rather than the desired…

[1] "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

When I get into those situations I think, “Oh, better use collapse instead,” but I never really think before using paste (that is, whether I need collapse or sep, and why). This is inefficient and leaves me without time to write quality articles for Fox News (JK, for those taking me seriously). This tutorial gives some basic, clear direction on the following functions:

paste(...)
paste0(...)
sprintf(fmt, ...)

paste

paste has 3 arguments (R 4.0.1 and later add a fourth, recycle0, which defaults to FALSE and doesn't matter here).

paste(..., sep = " ", collapse = NULL)

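The ... is the stuff you want to paste together, and sep and collapse are the guys that get it done: sep is the string placed between the pieces within each element, and collapse (when it isn't NULL) is the string used to join all the resulting elements into a single string. There are three basic things I paste together: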

  1. A bunch of individual character strings.
  2. 2 or more vectors pasted element for element.
  3. One vector smushed together.

Here's an example of each, though not with the correct arguments (I'm building suspense here):

paste("A", 1, "%")       #A bunch of individual character strings.
paste(1:4, letters[1:4]) #2 or more vectors pasted element for element.
paste(1:10)              #One vector smushed together.

Here's the sep/collapse rule for each:

  1. A bunch of individual character strings: you want sep
  2. 2 or more vectors pasted element for element: you want sep
  3. One vector smushed together: smushin' requires collapse

So here they are with the correct arguments:

paste("A", 1, "%")       #A bunch of individual character strings.
paste(1:4, letters[1:4]) #2 or more vectors pasted element for element.
paste(1:10, collapse="") #One vector smushed together.

This yields:

> paste("A", 1, "%")       #A bunch of individual character strings.
[1] "A 1 %"
> paste(1:4, letters[1:4]) #2 or more vectors pasted element for element.
[1] "1 a" "2 b" "3 c" "4 d"
> paste(1:10, collapse="") #One vector smushed together.
[1] "12345678910"
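Two more cases fall out of the same rule. First, the opening LETTERS example is just case 3, so collapse fixes it. Second, sep and collapse can be used together: sep joins the vectors element for element, then collapse smushes those results into one string. A shorter vector is also recycled to the length of the longer one, which is worth knowing before it surprises you. The separators below are arbitrary picks for illustration.

```r
# The opening example is "one vector smushed together", so use collapse
paste(LETTERS, collapse = "")
## [1] "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# sep works element for element first, then collapse joins the results
paste(1:4, letters[1:4], sep = "-", collapse = "+")
## [1] "1-a+2-b+3-c+4-d"

# A shorter vector is recycled to match the longer one
paste(1:4, c("a", "b"))
## [1] "1 a" "2 b" "3 a" "4 b"
```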

paste0

paste0 is short for:

paste(..., sep="")

So it allows us to be lazier and more efficient. I'm lazy, so I use paste0 a lot.

paste0("a", "b") == paste("a", "b", sep="")
## [1] TRUE

'nuff said.
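Two things the shorthand doesn't change, shown with made-up values: paste0 still has a collapse argument, so the sep/collapse rule above still applies with sep fixed at "". And paste0 has no sep argument at all, so a sep = ... you pass to it isn't a separator; it just gets pasted on as one more piece.

```r
# paste0 fixes sep at "", but collapse still works
paste0("x", 1:3)
## [1] "x1" "x2" "x3"
paste0("x", 1:3, collapse = "+")
## [1] "x1+x2+x3"

# Same result as spelling out sep = "" in paste
identical(paste0("x", 1:3, collapse = "+"),
          paste("x", 1:3, sep = "", collapse = "+"))
## [1] TRUE

# Gotcha: paste0 has no sep argument, so this "-" is just another piece
paste0("a", "b", sep = "-")
## [1] "ab-"
```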


sprintf

I discovered this guy a while back but only recently realized its value in pasting. Much of my work on the reports (Rinker, 2013) package requires that I piece together lots of chunks of URLs and insert user-specific pieces. This can be a nightmare with all the quotation marks. A typical take may look like this:

person <-"Grover"
action <-"flying"
message(paste0("On ", Sys.Date(), " I realized ", person, " was...\n", action, " by the street"))
## On 2013-09-14 I realized Grover was...
## flying by the street

No joke, it took me 6 tries to format that without an error (missing quotes, spaces, and commas).

But we can use sprintf to write one string (fewer commas + fewer quotation marks = fewer errors) and feed in the elements that may differ from user to user or time to time. Let's look at an example to see what I mean:

person <-"Grover"
action <-"flying"
message(sprintf("On %s I realized %s was...\n%s by the street", Sys.Date(), person, action))
## On 2013-09-14 I realized Grover was...
## flying by the street

Boom, first time. It's easy to see the spacing, and there aren't all the commas and quotation marks to deal with. Just use the %s marker to denote that some element goes here, then supply the elements as additional arguments after the format string, in the same order as the markers. For some applications sprintf is a superior choice over paste/paste0.
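Since URLs are what got me here, here's a sketch of the same idea applied to one. The URL template, user name, and page numbers are made up for illustration. sprintf is vectorized, so passing a vector for one marker gives one string per element (the length-1 user is recycled), and %d is the marker for integers.

```r
# Hypothetical URL template: %s takes a string, %d takes an integer
user  <- "grover"
pages <- 1:3
urls  <- sprintf("https://example.com/%s/posts?page=%d", user, pages)
urls
## [1] "https://example.com/grover/posts?page=1"
## [2] "https://example.com/grover/posts?page=2"
## [3] "https://example.com/grover/posts?page=3"

# The paste0 equivalent, for comparison
identical(urls, paste0("https://example.com/", user, "/posts?page=", pages))
## [1] TRUE
```

The template reads like the finished URL, which is the whole point: the fixed text stays in one piece and only the parts that change get a marker.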


Note that these are not extensive, all-encompassing rules but guides for general use. Also be aware that sprintf is even cooler than I demonstrated here.
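For a taste of that, the markers take modifiers for decimal places, padding, and literal percent signs. These are standard conversion specifications documented in ?sprintf; the numbers and labels below are arbitrary examples.

```r
# Fix the number of decimal places
sprintf("%.1f", 3.14159)
## [1] "3.1"

# %% produces a literal percent sign
sprintf("%.1f%%", 45.67)
## [1] "45.7%"

# Prefixes and suffixes are just part of the format string
sprintf("$%.2fm", 1234567 / 1e6)
## [1] "$1.23m"

# Pad to width 5: right-justified by default, left-justified with -
sprintf("[%5s]", "ab")
## [1] "[   ab]"
sprintf("[%-5s]", "ab")
## [1] "[ab   ]"

# Zero-padded integers, handy for numbered file names
sprintf("file_%03d.csv", c(1L, 12L))
## [1] "file_001.csv" "file_012.csv"
```

One caveat noted in ?sprintf: field widths for %s count bytes rather than characters, so padding non-ASCII text may come out uneven.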

*Created using the reports package


References


About tylerrinker

Data Scientist, open-source developer, #rstats enthusiast, #dataviz geek, and #nlp buff

20 Responses to paste, paste0, and sprintf

  1. MasterG says:

    why do we need the escape – \n ?

  2. beckmw says:

    Nice post, thanks for sharing! I just discovered the laziness that paste0 allows, but sometimes have issues with R recognizing the function. I run R on multiple comps with different versions (a bad practice, yes)… Is it only available on later versions of R?

  3. Dave says:

    what a legendary post. paste drives me nuts.

    the only remaining thing is how to format numbers to be:
    %s, or
    to 1 dp, or
    to 2dp with a $/£ sign out front and a “m” at the end.

    Indeed, I’d pay a dollar for that post too.

    Great work

  4. Muthukumar says:

    Can we add column to csv file using this paste0 function.

  5. Muthukumar says:

    Can I use Paste0() function to add the column in csv file ?

  6. cwarth says:

    The sprintf function in R is badly broken when it comes to errors, e.g.

    # R version 3.2.2 (2015-08-14)
    > v=NULL
    > sprintf("v = '%s'\n", v)
    character(0)
    > paste0("v = '", v, "'\n")
    [1] "v = ''\n"

    sprintf returns character(0). Not an error, not 'v=', just an empty character vector with no indication that something is wrong. As hideous as the paste functions are, they at least return something sensible in all cases.

    If you use sprintf you will silently lose data and be completely unaware that it has happened.

    • tylerrinker says:

      I wouldn’t use the word broken; it behaves differently than you might want or expect it to. It is definitely easier to work with than pasting bits. paste is certainly viable if NULLs etc. are a concern, or you can do error checking yourself at the end or beginning with is.null or length(sprintf("v = '%s'\n", v)) > 0. But this is a point of preference. One more serious note is that sprintf may behave differently on different machines (i.e., Mac may produce a differently padded string than Windows).

  7. Tobias says:

    I think the default value for sep frequently drives people crazy. I was not expecting an empty space there.

  8. Gor says:

    paste(1:4, letters[1:4])
    I will get:
    1 a
    2 b
    3 c
    4 d
    How to write in order to get the following list:
    1 a
    1 b
    1 c
    1 d
    2 a
    2 b

    So all the combinations of the provided numbers and letters.

    Thanks

  9. Pingback: Math Notation for R Plot Titles: expression and bquote | TRinker's R Blog

  10. Thanks a lot; from everyone of us.
