Sampling Duplicates
Posted by user3640982 on Stack Overflow
Published on 2014-06-09T15:18:15Z
sampling
I have a dataset that I need to sample from. It has an ID field and a Year field, and the data is ordered by year. I want every record from the most recent year, and then the most recent IDs, sampled from every 3rd year going back.
For example:
ID <- rep(1:3, 5)
Year <- rep(c(1, 2, 3, 4, 5), each = 3)
df <- data.frame(ID, Year)
   ID Year
1   1    1
2   2    1
3   3    1
4   1    2
5   2    2
6   3    2
7   1    3
8   2    3
9   3    3
10  1    4
11  2    4
12  3    4
13  1    5
14  2    5
15  3    5
So from this example, I would want to return:
  ID Year
1  1    1
2  2    1
3  3    1
4  1    4
5  2    4
6  3    4
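One reading of the desired output above is "keep every third year, counting from the earliest year in the data". Below is a minimal sketch under that assumption. The names `step`, `keep`, and `sampled` are illustrative and not part of the original post, and the starting year would need to change if the sampling should instead count back from the most recent year.

```r
# Rebuild the example data from the question.
ID <- rep(1:3, 5)
Year <- rep(c(1, 2, 3, 4, 5), each = 3)
df <- data.frame(ID, Year)

# Assumption: sample every 3rd year, starting at the earliest year.
step <- 3
keep <- (df$Year - min(df$Year)) %% step == 0

sampled <- df[keep, ]
rownames(sampled) <- NULL  # renumber rows 1..n, as in the desired output
print(sampled)
```

On the example data this keeps years 1 and 4. It filters on Year alone, so it does not by itself handle IDs that are missing from a sampled year.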
I'm thinking that some combination of duplicated() and which() should get what I want, but the problem is that duplicated() only tells me whether a value has been repeated; it doesn't say which record is being repeated.
which(duplicated(df$ID))
[1] 4 5 6 7 8 9 10 11 12 13 14 15
This is a problem since not every ID exists in every year.
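Two base R idioms may be relevant here, sketched on the example data. `match(df$ID, df$ID)` reports, for each row, the row number where that ID first appears, which is the information `duplicated()` leaves out. `duplicated(df$ID, fromLast = TRUE)` scans from the end, so negating it flags the last row for each ID. Since the data is ordered by year, that is the most recent record per ID, even when an ID is missing from some years. The names `first_row` and `latest` are illustrative only.

```r
# Rebuild the example data from the question.
ID <- rep(1:3, 5)
Year <- rep(c(1, 2, 3, 4, 5), each = 3)
df <- data.frame(ID, Year)

# For every row, the index of the first row carrying the same ID.
first_row <- match(df$ID, df$ID)

# Most recent record per ID; relies on df being ordered by Year.
latest <- df[!duplicated(df$ID, fromLast = TRUE), ]

print(first_row)
print(latest)
```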
Any help would be appreciated.
Thanks, Eric
© Stack Overflow or respective owner