What makes these two R data frames not identical?
Posted by Matt Parker on Stack Overflow
Published on 2010-04-22T00:53:54Z
UPDATE: I remembered dput() about the time Sharpie mentioned it. It's probably the row names. Back in a moment with an answer.
I have two small data frames, this_tx and last_tx. They are, in every way that I can tell, completely identical. this_tx == last_tx results in a frame of identical dimensions, all TRUE. this_tx %in% last_tx, two TRUEs. Inspected visually, clearly identical. But when I call
identical(this_tx, last_tx)
I get a FALSE. Hilariously, even
identical(str(this_tx), str(last_tx))
will return a TRUE. If I set this_tx <- last_tx, I'll get a TRUE.
What is going on? I don't have the deepest understanding of R's internal mechanics, but I can't find a single difference between the two data frames. If it's relevant, the two variables in the frames are both factors - same levels, same numeric coding for the levels, both just subsets of the same original data frame. Converting them to character vectors doesn't help.
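A minimal sketch of how this can happen, assuming (as the update above suspects) that the row names are the culprit. The data frame orig and its values are made up for illustration; only this_tx and last_tx are names from the question. Two subsets of the same data frame keep their original row names, which == ignores but identical() does not. Note also that str() prints its summary and returns NULL, so comparing two str() results always compares NULL with NULL.

```r
# Hypothetical stand-in for the original data frame; values are invented.
orig <- data.frame(
  drug = factor(c("A", "B", "A", "B")),
  dose = factor(c("10mg", "20mg", "10mg", "20mg"))
)

# Two subsets with the same contents but different row names.
last_tx <- orig[1:2, ]
this_tx <- orig[3:4, ]

all(this_tx == last_tx)      # TRUE: values are compared, row names are not
identical(this_tx, last_tx)  # FALSE: attributes (including row names) must match

# str() returns NULL invisibly, so this is really identical(NULL, NULL).
identical(str(this_tx), str(last_tx))

# Ways to see what actually differs.
all.equal(this_tx, last_tx)  # describes the difference instead of returning TRUE
rownames(this_tx)            # "3" "4"
rownames(last_tx)            # "1" "2"
dput(this_tx)                # full structure, including the row.names attribute

# Reset the row names on copies and compare again.
a <- this_tx
b <- last_tx
rownames(a) <- NULL
rownames(b) <- NULL
identical(a, b)              # TRUE once the row names agree
```

If row names carry no meaning in your data, resetting them before the comparison is one option; isTRUE(all.equal(...)) with the row names reset is another if small numeric differences should also be tolerated.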
Background (because I wouldn't mind help on this, either): I have records of drug treatments given to patients. Each treatment record essentially specifies a person and a date. A second table has a record for each drug and dose given during a particular treatment (usually, a few drugs are given each treatment). I'm trying to identify contiguous periods during which the person was taking the same combinations of drugs at the same doses.
The best plan I've come up with is to check the treatments chronologically. If the combination of drugs and doses for treatment[i] is identical to the combination at treatment[i-1], then treatment[i] is a part of the same phase as treatment[i-1]. Of course, if I can't compare drug/dose combinations, that's right out.
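The plan above could be sketched roughly as follows. This is only one possible approach under stated assumptions: the tables treatments and doses, and the columns tx_id, person, date, drug and dose, are hypothetical names chosen for illustration, not the real schema. Instead of comparing data frames with identical(), each treatment's drug/dose combination is collapsed into a single sorted string, so consecutive treatments can be compared with a plain ==.

```r
# Hypothetical example data; table and column names are assumptions.
treatments <- data.frame(
  tx_id  = 1:5,
  person = c("p1", "p1", "p1", "p2", "p2"),
  date   = as.Date(c("2010-01-01", "2010-01-08", "2010-01-15",
                     "2010-01-02", "2010-01-09"))
)
doses <- data.frame(
  tx_id = c(1, 1, 2, 2, 3, 4, 5, 5),
  drug  = c("A", "B", "B", "A", "A", "C", "C", "D"),
  dose  = c(10, 20, 20, 10, 10, 5, 5, 1)
)

# One string per treatment describing its drug/dose combination,
# sorted so the order of rows within a treatment does not matter.
combo <- sapply(split(doses, doses$tx_id), function(d) {
  paste(sort(paste(d$drug, d$dose, sep = ":")), collapse = "|")
})
treatments$combo <- unname(combo[as.character(treatments$tx_id)])

# Order chronologically within each person.
treatments <- treatments[order(treatments$person, treatments$date), ]

# A row continues the previous phase only if it has the same person
# and the same combination as the row before it.
n <- nrow(treatments)
same_person <- c(FALSE, treatments$person[-1] == treatments$person[-n])
same_combo  <- c(FALSE, treatments$combo[-1]  == treatments$combo[-n])

# Every row that does not continue a phase starts a new one.
treatments$phase <- cumsum(!(same_person & same_combo))
treatments
```

In this made-up data, the first two treatments for p1 share a combination and end up in the same phase, while every other treatment starts a new one. Comparing strings rather than whole data frames also sidesteps the attribute differences that trip up identical().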
© Stack Overflow or respective owner