Is there a %in% operator accros multiple columns
- by RobinLovelace
Imagine you have two data frames
df1 <- data.frame(V1 = c(1, 2, 3), v2 = c("a", "b", "c"))
df2 <- data.frame(V1 = c(1, 2, 2), v2 = c("b", "b", "c"))
Here's what they look like, side by side:
> cbind(df1, df2)
V1 v2 V1 v2
1 1 a 1 b
2 2 b 2 b
3 3 c 2 c
You want to know which observations are duplicates, across all variables.
This can be done by pasting the cols together and then using %in%:
df1Vec <- apply(df1, 1, paste, collapse= "")
df2Vec <- apply(df2, 1, paste, collapse= "")
df2Vec %in% df1Vec
[1] FALSE TRUE FALSE
The second observation is thus the only one in df2 and also in df1.
Is there no faster way of generating this output - something like %IN%, which is %in% across multiple variables, or should we just be content with the apply(paste) solution?