subset complete or balance dataset in r

Posted by SHRram on Stack Overflow See other posts from Stack Overflow or by SHRram
Published on 2012-12-19T17:02:22Z Indexed on 2012/12/19 17:03 UTC
Read the original article Hit count: 146

Filed under:
|

I have a dataset that unequal number of repetition. I want to subset a data by removing those entries that are incomplete (i.e. replication less than maximum). Just small example:

set.seed(123)
mydt <- data.frame (name= rep ( c("A", "B", "C", "D", "E"), c(1,2,4,4, 3)), 
                   var1 = rnorm (14, 3,1), var2 = rnorm (14, 4,1))
 mydt
       name     var1     var2
1     A 2.439524 3.444159
2     B 2.769823 5.786913
3     B 4.558708 4.497850
4     C 3.070508 2.033383
5     C 3.129288 4.701356
6     C 4.715065 3.527209
7     C 3.460916 2.932176
8     D 1.734939 3.782025
9     D 2.313147 2.973996
10    D 2.554338 3.271109
11    D 4.224082 3.374961
12    E 3.359814 2.313307
13    E 3.400771 4.837787
14    E 3.110683 4.153373

summary(mydt)

name       var1            var2      
 A:1   Min.   :1.735   Min.   :2.033  
 B:2   1st Qu.:2.608   1st Qu.:3.048  
 C:4   Median :3.120   Median :3.486  
 D:4   Mean   :3.203   Mean   :3.688  
 E:3   3rd Qu.:3.446   3rd Qu.:4.412  
       Max.   :4.715   Max.   :5.787 

I want to get rid of A, B, E from the data as they are incomplete. Thus expected output:

name     var1     var2
4     C 3.070508 2.033383
5     C 3.129288 4.701356
6     C 4.715065 3.527209
7     C 3.460916 2.932176
8     D 1.734939 3.782025
9     D 2.313147 2.973996
10    D 2.554338 3.271109
11    D 4.224082 3.374961

Please note the dataset is big, the following may not a option:

mydt[mydt$name == "C",]
mydt[mydt$name == "D", ]

© Stack Overflow or respective owner

Related posts about r

    Related posts about dataset