Sampling Duplicates

Posted by user3640982 on Stack Overflow
Published on 2014-06-09T15:18:15Z

I have a dataset that I need to sample from. It has an ID field and a Year field, and the data is ordered by year. I want every record from the most current year, plus the most current IDs sampled from every 3rd year going back.
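The request above can be read in more than one way. The sketch below assumes one reading: take the most recent year and every 3rd year before it, then keep each ID's record from its latest sampled year. The data, and the names `years`, `sampled`, `latest` and `result`, are hypothetical and chosen so that some IDs are missing from the most recent year.

```r
# Hypothetical data, ordered by year: ID 3 is missing from year 5,
# and ID 4 only appears in year 2.
df <- data.frame(
  ID   = c(1, 2, 3, 4, 1, 2, 3, 1, 2, 3, 1, 2),
  Year = c(2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)
)

# Years to sample: the most recent year, then every 3rd year going back.
years <- seq(max(df$Year), min(df$Year), by = -3)

# Keep only the records that fall in a sampled year.
sampled <- df[df$Year %in% years, ]

# For each ID, find its most recent sampled year and keep that record.
# (Assumed reading of "most current IDs"; adjust if a different rule is meant.)
latest <- ave(sampled$Year, sampled$ID, FUN = max)
result <- sampled[sampled$Year == latest, ]
result <- result[order(result$ID), ]
```

Under this reading, IDs 1 and 2 come from year 5, while IDs 3 and 4 fall back to year 2 because they have no record in year 5. Using `ave()` compares years directly, so this version does not depend on the rows being sorted.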

For example

ID <- rep(1:3, 5)
Year <- rep(c(1, 2, 3, 4, 5), each = 3)
df <- data.frame(ID, Year)

   ID Year
1   1    1
2   2    1
3   3    1
4   1    2
5   2    2
6   3    2
7   1    3
8   2    3
9   3    3
10  1    4
11  2    4
12  3    4
13  1    5
14  2    5
15  3    5

So from this example, I would want to return

I think some combination of duplicated() and which() should get what I want, but duplicated() only tells me whether a value has been repeated; it doesn't say which record is being repeated.

 which(duplicated(df$ID))
[1]  4  5  6  7  8  9 10 11 12 13 14 15

This is a problem since not every ID exists in every year.
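Two base R calls may help with the limitation described here; this is a minimal sketch on the example data, not a full solution. `match()` reports which earlier record each row repeats, and `duplicated()` accepts `fromLast = TRUE`, which scans from the end so that the last occurrence of each ID is the one not flagged. The variable names `first_row` and `last_rows` are illustrative.

```r
ID <- rep(1:3, 5)
Year <- rep(c(1, 2, 3, 4, 5), each = 3)
df <- data.frame(ID, Year)

# For each record, the row index of the first record with the same ID,
# i.e. which record is being repeated.
first_row <- match(df$ID, df$ID)

# Scanning from the end flags every occurrence except the last one, so
# negating it keeps the last record of each ID. Because the data is
# ordered by year, that is the most recent record for each ID.
last_rows <- which(!duplicated(df$ID, fromLast = TRUE))

df[last_rows, ]
```

Because this keeps the last row per ID regardless of year, an ID that is absent from the most recent year is still returned from the latest year in which it does appear.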

Any help would be appreciated.

Thanks, Eric
