Is there a %in% operator across multiple columns?

Posted by RobinLovelace on Stack Overflow
Published on 2014-05-31T14:26:52Z Indexed on 2014/05/31 21:27 UTC


Imagine you have two data frames:

df1 <- data.frame(V1 = c(1, 2, 3), v2 = c("a", "b", "c"))
df2 <- data.frame(V1 = c(1, 2, 2), v2 = c("b", "b", "c"))

Here's what they look like, side by side:

> cbind(df1, df2)
  V1 v2 V1 v2
1  1  a  1  b
2  2  b  2  b
3  3  c  2  c

You want to know which rows of df2 also appear in df1, matching across all variables.

This can be done by pasting the columns of each row together into a single string and then using %in%:

df1Vec <- apply(df1, 1, paste, collapse= "")
df2Vec <- apply(df2, 1, paste, collapse= "")
df2Vec %in% df1Vec
[1] FALSE  TRUE FALSE

So the second observation is the only row of df2 that also appears in df1.
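One caveat with the pasted-key approach is worth illustrating: with collapse = "", two different rows can collapse to the same string. The small sketch below uses made-up data frames a and b (not from the example above) to show the collision, and how a separator that is assumed not to occur in the data avoids it.

```r
# Hypothetical single-row data frames whose values differ column by column
a <- data.frame(x = "1",  y = "23", stringsAsFactors = FALSE)
b <- data.frame(x = "12", y = "3",  stringsAsFactors = FALSE)

# Without a separator both rows become "123", so they look identical
no_sep <- apply(a, 1, paste, collapse = "") == apply(b, 1, paste, collapse = "")

# With a separator (assumed absent from the data) the keys stay distinct
with_sep <- apply(a, 1, paste, collapse = "\r") == apply(b, 1, paste, collapse = "\r")

no_sep    # TRUE
with_sep  # FALSE
```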

Is there a faster way of generating this output, something like a %IN% operator that works like %in% across multiple variables, or should we just be content with the apply(paste) solution?
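To make the idea concrete, here is a minimal sketch of what such an operator might look like. The name %IN% and the helper key() are hypothetical (base R has no such operator), and the "\r" separator is assumed not to appear in the data. It still builds one string per row, but uses a single vectorised paste() call over the columns via do.call() instead of looping over rows with apply().

```r
# Hypothetical row-wise %in% for data frames; not part of base R
`%IN%` <- function(x, table) {
  # One key per row: paste the columns together in a single vectorised call.
  # unname() stops a column called "sep" or "collapse" from being read as
  # a paste() argument.
  key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))
  key(x) %in% key(table)
}

df1 <- data.frame(V1 = c(1, 2, 3), v2 = c("a", "b", "c"))
df2 <- data.frame(V1 = c(1, 2, 2), v2 = c("b", "b", "c"))

res <- df2 %IN% df1
res  # FALSE TRUE FALSE
```

Both data frames are assumed to have the same columns in the same order; the sketch does not check this.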

© Stack Overflow or respective owner
