2011-09-01

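Stratified sampling - not enough observations

What I want to achieve is a 10% sample from each group, where a group is a combination of two factors: the recency category and the frequency category. So far I have looked at the sampling package and its strata() function. It looks promising, but I get the error below, and I find it hard to understand the error message, what is wrong, or how to get around the problem.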

Here is my code:

> d[1:10,] 
     date id_email_op recency frequecy r_cat f_cat 
1 29.8.2011  19393  294  1  A  G 
2 29.8.2011  19394  230  4  A  D 
3 29.8.2011  19395  238  12  A  B 
4 29.8.2011  19396  294  1  A  G 
5 29.8.2011  19397  223  9  A  C 
6 29.8.2011  19398  185  7  A  C 
7 29.8.2011  19399  273  2  A  F 
8 29.8.2011  19400  16  4  C  D 
9 29.8.2011  19401  294  1  A  G 
10 29.8.2011  19402  3  5  F  C 
> table(d$f_cat,d$r_cat) 

     A  B  C  D  E  F 
    A 176 203 289 228 335 983 
    B 1044 966 1072 633 742 1398 
    C 6623 3606 3020 1339 1534 2509 
    D 4316 1790 1239 529 586 880 
    E 8431 2798 2005 767 817 1151 
    F 22140 5432 3937 1415 1361 1868 
    G 100373 18316 11872 3760 3453 4778 
> as.vector(table(d$f_cat,d$r_cat)) 
[1] 176 1044 6623 4316 8431 22140 100373 203 966 3606 1790 2798 5432 
[14] 18316 289 1072 3020 1239 2005 3937 11872 228 633 1339 529 767 
[27] 1415 3760 335 742 1534 586 817 1361 3453 983 1398 2509 880 
[40] 1151 1868 4778 
> s <- strata(d,c("f_cat","r_cat"),size=as.vector(ceiling(0.1 * table(d$f_cat,d$r_cat))), method="srswor") 
Error in strata(d, c("f_cat", "r_cat"), size = as.vector(table(d$f_cat, : 
    not enough obervations for the stratum 6 

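I really cannot see what stratum 6 is. What condition does the function check behind the scenes? I am not sure I have set the size argument correctly. And yes, I have checked the sampling package :)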

Thanks, everyone.

+1

Just to rule out a problem with fractional sample sizes, can you replace `size = as.vector(table(d$f_cat, d$r_cat)) * .1` with `size = as.vector(ceiling(0.1 * table(d$f_cat, d$r_cat)))`? – Iterator

+0

@Iterator That makes sense, thanks.
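One thing that may be worth checking for the "stratum 6" question is the order in which the sizes are listed. The sketch below uses only base R and a toy data frame built from the ten rows printed in the question. It compares stratum counts in order of first appearance in the data with the order produced by `as.vector(table(...))`. It assumes, as my reading of the sampling package documentation suggests but this thread does not confirm, that `strata()` matches `size` to strata in the order they appear in the data and expects the data to be sorted by the stratification columns. The names `toy`, `key`, `by_appearance`, `by_level` and `sorted` are illustrative only.

```r
# Toy data: f_cat and r_cat from the ten rows shown in the question (unsorted)
toy <- data.frame(
  f_cat = c("G", "D", "B", "G", "C", "C", "F", "D", "G", "C"),
  r_cat = c("A", "A", "A", "A", "A", "A", "A", "C", "A", "F"),
  stringsAsFactors = TRUE
)

# Count strata in order of first appearance in the data
key <- paste(toy$f_cat, toy$r_cat, sep = "/")
by_appearance <- table(factor(key, levels = unique(key)))

# Counts as table() lays them out: sorted by level, f_cat varying fastest
by_level <- as.vector(table(toy$f_cat, toy$r_cat))

# After sorting by r_cat, then f_cat, order of appearance follows table() order
sorted <- toy[order(toy$r_cat, toy$f_cat), ]
key_sorted <- paste(sorted$f_cat, sorted$r_cat, sep = "/")
sorted_counts <- as.vector(table(factor(key_sorted, levels = unique(key_sorted))))

print(by_appearance)
print(by_level)
print(sorted_counts)
```

In these ten rows the sixth distinct combination to appear is f_cat = D with r_cat = C, which has 1239 observations in the full table, while the sixth entry of the size vector is ceiling(0.1 * 22140) = 2214. If the assumption about ordering holds, that mismatch would produce this kind of error, and sorting `d` before calling `strata()` (for example `d <- d[order(d$r_cat, d$f_cat), ]`) would be worth trying.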

Answers

1

You can always do it yourself:

stratified <- NULL 
for (f in LETTERS[1:7]) {          # f_cat levels A..G (rows of the table above) 
    for (r in LETTERS[1:6]) {      # r_cat levels A..F (columns of the table above) 
        # row names of the observations that fall into this stratum 
        rows <- rownames(d)[which(d$f_cat == f & d$r_cat == r)] 
        # draw 10% of this stratum (rounded) without replacement 
        stratified <- c(stratified, sample(rows, round(length(rows) * 0.1))) 
    } 
} 

Then...

d[stratified, ] will be your stratified sample.
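The same do-it-yourself idea can be sketched without hard-coding the category letters, by splitting row indices per stratum. This is a minimal base R sketch on a hypothetical toy data frame (`toy`, `groups`, `picked` and `stratified_sample` are names made up here, not part of the question). It rounds up with `ceiling()`, as in the question's own call, so every non-empty stratum contributes at least one row.

```r
set.seed(1)  # only so the sketch is reproducible

# Hypothetical toy data with the same two stratification columns
toy <- data.frame(
  id    = 1:200,
  f_cat = sample(LETTERS[1:3], 200, replace = TRUE),
  r_cat = sample(LETTERS[1:2], 200, replace = TRUE)
)

# Split row indices by stratum; drop = TRUE skips empty combinations
groups <- split(seq_len(nrow(toy)), list(toy$f_cat, toy$r_cat), drop = TRUE)

# Draw 10% of each stratum (rounded up) without replacement
picked <- unlist(lapply(groups, function(idx) {
  n <- ceiling(0.1 * length(idx))
  idx[sample.int(length(idx), n)]  # sample.int avoids sample()'s length-1 quirk
}), use.names = FALSE)

stratified_sample <- toy[picked, ]

# Per-stratum sample sizes, for comparison with table(toy$f_cat, toy$r_cat)
print(table(stratified_sample$f_cat, stratified_sample$r_cat))
```

Indexing through `sample.int()` instead of calling `sample(idx, n)` avoids the case where a stratum holds a single numeric index and `sample()` would treat it as `1:idx`.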