subset complete or balance dataset in r
Posted
by
SHRram
on Stack Overflow
See other posts from Stack Overflow
or by SHRram
Published on 2012-12-19T17:02:22Z
Indexed on
2012/12/19
17:03 UTC
Read the original article
Hit count: 146
I have a dataset that unequal number of repetition. I want to subset a data by removing those entries that are incomplete (i.e. replication less than maximum). Just small example:
set.seed(123)
mydt <- data.frame (name= rep ( c("A", "B", "C", "D", "E"), c(1,2,4,4, 3)),
var1 = rnorm (14, 3,1), var2 = rnorm (14, 4,1))
mydt
name var1 var2
1 A 2.439524 3.444159
2 B 2.769823 5.786913
3 B 4.558708 4.497850
4 C 3.070508 2.033383
5 C 3.129288 4.701356
6 C 4.715065 3.527209
7 C 3.460916 2.932176
8 D 1.734939 3.782025
9 D 2.313147 2.973996
10 D 2.554338 3.271109
11 D 4.224082 3.374961
12 E 3.359814 2.313307
13 E 3.400771 4.837787
14 E 3.110683 4.153373
summary(mydt)
name var1 var2
A:1 Min. :1.735 Min. :2.033
B:2 1st Qu.:2.608 1st Qu.:3.048
C:4 Median :3.120 Median :3.486
D:4 Mean :3.203 Mean :3.688
E:3 3rd Qu.:3.446 3rd Qu.:4.412
Max. :4.715 Max. :5.787
I want to get rid of A, B, E from the data as they are incomplete. Thus expected output:
name var1 var2
4 C 3.070508 2.033383
5 C 3.129288 4.701356
6 C 4.715065 3.527209
7 C 3.460916 2.932176
8 D 1.734939 3.782025
9 D 2.313147 2.973996
10 D 2.554338 3.271109
11 D 4.224082 3.374961
Please note the dataset is big, the following may not a option:
mydt[mydt$name == "C",]
mydt[mydt$name == "D", ]
© Stack Overflow or respective owner