自动计算汇总统计的数据帧和创建新表

我有以下数据框：自动计算汇总统计的数据帧和创建新表

col1 <- c("avi","chi","chi","bov","fox","bov","fox","avi","bov", 
      "chi","avi","chi","chi","bov","bov","fox","avi","bov","chi") 
col2 <- c("low","med","high","high","low","low","med","med","med","high", 
      "low","low","high","high","med","med","low","low","med") 
col3 <- c(0,1,1,1,0,1,0,0,0,0,0,0,1,1,1,1,0,1,0) 

test_data <- cbind(col1, col2, col3) 
test_data <- as.data.frame(test_data)

，我想是这样的表落得（值是随机的）：

Species Pop.density %Resistance CI_low CI_high Total samples 
avi  low   2.0   1.2  2.2  30 
avi  med   0   0  0.5  20 
avi  high   3.5   2.9  4.2  10 
chi  low   0.5   0.3  0.7  20 
chi  med   2.0   1.9  2.1  150 
chi  high   6.5   6.2  6.6  175

％阻力栏基于上面的col3，其中1 =耐，0 =不耐。我曾尝试以下：

library(dplyr) 
test_data<-test_data %>% 
    count(col1,col2,col3) %>% 
    group_by(col1, col2) %>% 
    mutate(perc_res = prop.table(n)*100)

我想这一点，它似乎几乎做的伎俩，因为我得到的总的1和0的COL3的百分比，在col1和2每一个值，但总样本是错误的，因为我指望所有的三列，当正确的计数将是唯一的col1和2

对于置信区间我会用以下内容：

binom.test(resistant samples,total samples)$conf.int*100

但是我不知道怎么样与其他人一起实施。有没有简单快捷的方法来做到这一点？

来源

2017-09-15 Haakonkas

我建议使用group_by然后使用汇总功能。 – Jul

使用'data.frame（col1，col2，col3）'，而不是'cbind'，这会迫使每列在这里串起来。 – Frank

您的示例数据没有（“avi”，“high”）对。您是否希望该行反正出现（使用NAs和零采样数）？ – Frank

这应该这样做。

library(tidyverse) 
library(broom) 

test_data %>% 
    mutate(col3 = ifelse(col3 == 0, "NonResistant", "Resistant")) %>% 
    count(col1, col2, col3) %>% 
    spread(col3, n, fill = 0) %>% 
    mutate(PercentResistant = Resistant/(NonResistant + Resistant)) %>% 
    mutate(test = map2(Resistant, NonResistant, ~ binom.test(.x, .x + .y) %>% tidy())) %>% 
    unnest() %>% 
    transmute(Species = col1, Pop.density = col2, PercentResistant, CI_low = conf.low * 100, CI_high = conf.high * 100, TotalSamples = Resistant + NonResistant)

变异的0/1阻力列，以便它有可读价值。
计算每个存储桶中的值。
将col3/n分为两列，即Resistant/NonResistant，并将计数（n）放入这些列中。现在每行都有它做测试所需的一切。
计算百分比阻力
对每个存储桶执行测试并将结果放入一个名为test的嵌套帧中。
Unnest测试数据框，以便您可以使用测试结果。
清理，给一切美好的名字。

结果

来源

2017-09-15 15:02:55

伟大的解决方案！我可以问'conf.low'来自'unnest（）'后面我只看到'估计值'和'统计量'吗？ – PoGibas

@PoGibas：conf.low来自'tidy（）'，然后'unid'。如果你看到估计它应该在那里。疯狂的猜测，你的窗口不是那么宽，在结果下面有“... X多变量”的东西？ –

aaa，它是'tbl_df'，'％>％data.frame（）'显示它。不能习惯'tibble' – PoGibas

我会做...

library(data.table) 
setDT(DT) 

DT[, { 
    bt <- binom.test(sum(resists), .N)$conf.int*100 
    .(res_rate = mean(resists)*100, res_lo = bt[1], res_hi = bt[2], n = .N) 
}, keyby=.(species, popdens)] 

    species popdens res_rate res_lo res_hi n 
1:  avi  low 0.00000 0.000000 70.75982 3 
2:  avi  med 0.00000 0.000000 97.50000 1 
3:  bov  low 100.00000 15.811388 100.00000 2 
4:  bov  med 50.00000 1.257912 98.74209 2 
5:  bov high 100.00000 15.811388 100.00000 2 
6:  chi  low 0.00000 0.000000 97.50000 1 
7:  chi  med 50.00000 1.257912 98.74209 2 
8:  chi high 66.66667 9.429932 99.15962 3 
9:  fox  low 0.00000 0.000000 97.50000 1 
10:  fox  med 50.00000 1.257912 98.74209 2

要包括所有级别（物种和种群密度的组合）...

DT[CJ(species = species, popdens = popdens, unique = TRUE), on=.(species, popdens), { 
    bt <- 
    if (.N > 0L) binom.test(sum(resists), .N)$conf.int*100 
    else NA_real_ 
    .(res_rate = mean(resists)*100, res_lo = bt[1], res_hi = bt[2], n = .N)  
}, by=.EACHI] 

    species popdens res_rate res_lo res_hi n 
1:  avi  low 0.00000 0.000000 70.75982 3 
2:  avi  med 0.00000 0.000000 97.50000 1 
3:  avi high  NA  NA  NA 0 
4:  bov  low 100.00000 15.811388 100.00000 2 
5:  bov  med 50.00000 1.257912 98.74209 2 
6:  bov high 100.00000 15.811388 100.00000 2 
7:  chi  low 0.00000 0.000000 97.50000 1 
8:  chi  med 50.00000 1.257912 98.74209 2 
9:  chi high 66.66667 9.429932 99.15962 3 
10:  fox  low 0.00000 0.000000 97.50000 1 
11:  fox  med 50.00000 1.257912 98.74209 2 
12:  fox high  NA  NA  NA 0

它如何w兽人

语法DT[i, j, by=]其中...

i确定行的子集，有时用辅助参数，on=或roll=。
by=确定子集表内的组，如果排序则切换到keyby=。
j是代码在每个组上的代码。

j应该评估的名单，.()是一个快捷方式list()。有关详细信息，请参阅?data.table。

数据使用

（改名为列，重新格式化二元变量回0/1或假/真，集人口密度按照正确的顺序水平）：

DT = data.frame(
    species = col1, 
    popdens = factor(col2, levels=c("low", "med", "high")), 
    resists = col3 
)

来源

2017-09-15 15:33:56 Frank

自动计算汇总统计的数据帧和创建新表

回答

相关问题