2016-03-03

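Within-group differences from a group member

I have measurements from several rounds of an experiment with different treatments, like this: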

set.seed(1)
df <- data.frame(treatment = rep(c('baseline', 'treatment 1', 'treatment 2'),
                                 times = 5),
                 round = rep(1:5, each = 3),
                 measurement1 = rep(1:5, each = 3) + rnorm(15),
                 measurement2 = rep(1:5, each = 3) + rnorm(15))

df

#      treatment round measurement1 measurement2
# 1     baseline     1    0.3735462    0.9550664
# 2  treatment 1     1    1.1836433    0.9838097
# 3  treatment 2     1    0.1643714    1.9438362
# 4     baseline     2    3.5952808    2.8212212
# 5  treatment 1     2    2.3295078    2.5939013
# 6  treatment 2     2    1.1795316    2.9189774
# 7     baseline     3    3.4874291    3.7821363
# 8  treatment 1     3    3.7383247    3.0745650
# 9  treatment 2     3    3.5757814    1.0106483
# 10    baseline     4    3.6946116    4.6198257
# 11 treatment 1     4    5.5117812    3.9438713
# 12 treatment 2     4    4.3898432    3.8442045
# 13    baseline     5    4.3787594    3.5292476
# 14 treatment 1     5    2.7853001    4.5218499
# 15 treatment 2     5    6.1249309    5.4179416

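I would like a data.frame containing the differences in both measurements between each treatment and the baseline, within each round. That is, grouping by `round`, I want to subtract the corresponding measurement of the `baseline` treatment from each of the two measurements.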

I'd prefer a dplyr solution if one exists, but will accept anything bordering on elegant.
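
To pin down the target, here is a minimal base-R sketch that computes the differences for round 1 only, using the question's data. The names `meas`, `r1`, `base1`, `trt1` and `d1` are illustrative, not from the question.

```r
# Rebuild the question's example data
set.seed(1)
df <- data.frame(treatment = rep(c("baseline", "treatment 1", "treatment 2"),
                                 times = 5),
                 round = rep(1:5, each = 3),
                 measurement1 = rep(1:5, each = 3) + rnorm(15),
                 measurement2 = rep(1:5, each = 3) + rnorm(15))

# Illustrative names: meas, r1, base1, trt1, d1
meas <- c("measurement1", "measurement2")
r1 <- df[df$round == 1, ]                                  # the three rows of round 1
base1 <- unlist(r1[r1$treatment == "baseline", meas])      # baseline value per measurement
trt1 <- as.matrix(r1[r1$treatment != "baseline", meas])    # the two treatment rows

# sweep() subtracts base1[j] from column j of trt1
d1 <- sweep(trt1, 2, base1)
d1
# approximately: treatment 1 -> 0.810, 0.029; treatment 2 -> -0.209, 0.989
```

The desired result is this computation repeated for every round.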

Very kind of you to accept 'anything bordering on elegant'. But what have you tried yourself? – mtoto

`df[, 3:4] - df[df$treatment == 'baseline', ][df$round, 3:4]` – rawr
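
rawr's one-liner relies on the baseline rows being stored in round order 1 to 5, so that `[df$round, ]` can pick each round's baseline by position. A slightly more defensive base-R sketch looks each round's baseline up by value with `match()`, so it does not depend on row order or on rounds being numbered 1..n. The names `meas`, `base`, `idx` and `res` are illustrative.

```r
# Rebuild the question's example data
set.seed(1)
df <- data.frame(treatment = rep(c("baseline", "treatment 1", "treatment 2"),
                                 times = 5),
                 round = rep(1:5, each = 3),
                 measurement1 = rep(1:5, each = 3) + rnorm(15),
                 measurement2 = rep(1:5, each = 3) + rnorm(15))

# Illustrative names: meas, base, idx, res
meas <- c("measurement1", "measurement2")
base <- df[df$treatment == "baseline", ]   # one baseline row per round
idx <- match(df$round, base$round)          # row of `base` matching each row of df

# Subtract the matching baseline; baseline rows come out as exactly 0
res <- cbind(df[, c("treatment", "round")], df[, meas] - base[idx, meas])
res[res$treatment != "baseline", ]
```

Because the lookup is by `round` value, shuffling the rows of `df` beforehand does not change which baseline each row is compared with.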

Answers

You can use `mutate_each` for this:

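library(dplyr)

# Note: mutate_each() and funs() are deprecated in current dplyr releases
mydf %>% 
    group_by(round) %>% 
    # within each round, subtract the baseline row's value from the measurement columns
    mutate_each(funs(. - .[treatment=="baseline"]), -treatment) %>% 
    filter(treatment!="baseline") 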

This gives:

Source: local data frame [10 x 4] 
Groups: round [5] 

    treatment round measurement1 measurement2 
     (fctr) (int)  (dbl)  (dbl) 
1 treatment1  1  1.558820 -0.6584485 
2 treatment2  1 -0.068677 1.3364462 
3 treatment1  2  1.769312 -0.2732490 
4 treatment2  2  0.801357 -1.4852449 
5 treatment1  3 -1.064394 -1.1513703 
6 treatment2  3  2.433222 -0.7939903 
7 treatment1  4  0.448744 0.1394982 
8 treatment2  4 -1.066922 -1.1410085 
9 treatment1  5  1.182761 -0.8311095 
10 treatment2  5  0.138005 0.2622119 
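
On dplyr 1.0.0 or later, the same idea can be written with `across()` instead of the deprecated `mutate_each()`/`funs()`. This is a sketch against the question's `df`, assuming dplyr >= 1.0.0; `res` is an illustrative name.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 (for across())

# Rebuild the question's example data
set.seed(1)
df <- data.frame(treatment = rep(c("baseline", "treatment 1", "treatment 2"),
                                 times = 5),
                 round = rep(1:5, each = 3),
                 measurement1 = rep(1:5, each = 3) + rnorm(15),
                 measurement2 = rep(1:5, each = 3) + rnorm(15))

# `res` is an illustrative name
res <- df %>%
  group_by(round) %>%
  # within each round, subtract the baseline value from every row
  mutate(across(starts_with("measurement"),
                ~ .x - .x[treatment == "baseline"])) %>%
  ungroup() %>%
  filter(treatment != "baseline")
res
```

`starts_with("measurement")` picks up any number of measurement columns, so nothing needs to change if a third measurement is added.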

If you want to add the differences to your data frame (as @akrun does in his dplyr/tidyr alternative), you can also do:

mydf %>% 
    group_by(round) %>% 
    mutate(diff1 = measurement1 - measurement1[treatment=="baseline"], 
     diff2 = measurement2 - measurement2[treatment=="baseline"]) %>% 
    filter(treatment!="baseline") 

which gives:

Source: local data table [10 x 6] 

    treatment round measurement1 measurement2  diff1  diff2 
     (fctr) (int)  (dbl)  (dbl)  (dbl)  (dbl) 
1 treatment1  1  2.630392 -0.104258 1.558820 -0.6584485 
2 treatment2  1  1.002895  1.890637 -0.068677 1.3364462 
3 treatment1  2  3.822473  3.147443 1.769312 -0.2732490 
4 treatment2  2  2.854518  1.935447 0.801357 -1.4852449 
5 treatment1  3  1.520553  3.291122 -1.064394 -1.1513703 
6 treatment2  3  5.018169  3.648502 2.433222 -0.7939903 
7 treatment1  4  4.956380  4.544908 0.448744 0.1394982 
8 treatment2  4  3.440714  3.264401 -1.066922 -1.1410085 
9 treatment1  5  4.672056  5.082310 1.182761 -0.8311095 
10 treatment2  5  3.627300  6.175631 0.138005 0.2622119 
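
With dplyr 1.0.0 or later, `across()` can also add the differences as new columns via its `.names` argument, instead of writing one `mutate()` term per measurement. A sketch on the question's `df`, assuming dplyr >= 1.0.0; the `diff_` prefix and the name `res` are arbitrary choices.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 (for across() and its .names argument)

# Rebuild the question's example data
set.seed(1)
df <- data.frame(treatment = rep(c("baseline", "treatment 1", "treatment 2"),
                                 times = 5),
                 round = rep(1:5, each = 3),
                 measurement1 = rep(1:5, each = 3) + rnorm(15),
                 measurement2 = rep(1:5, each = 3) + rnorm(15))

# `res` and the "diff_" prefix are illustrative choices
res <- df %>%
  group_by(round) %>%
  # keep the original columns and add diff_measurement1, diff_measurement2
  mutate(across(starts_with("measurement"),
                ~ .x - .x[treatment == "baseline"],
                .names = "diff_{.col}")) %>%
  ungroup() %>%
  filter(treatment != "baseline")
res
```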

We can use data.table:

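library(data.table) 
# Sort so that baseline is the first row within each round, then subtract that row
# (repeated twice) from the last two rows; assumes exactly three rows per round
setDT(df)[order(round, treatment), tail(.SD, 2) - head(.SD, 1)[rep(1, 2)], 
          round, .SDcols = 3:4] 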

Another option with data.table is:

setDT(df)[, lapply(.SD[, grep("^measurement", names(.SD)), 
    with =FALSE], function(x) x[treatment!="baseline"]- 
     x[treatment=="baseline"]) , round] 
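
A different data.table technique, not used in this answer, is an explicit join: match every treatment row to its round's baseline row with `on = "round"` and subtract inside the join's `j`. A sketch assuming a fresh data.table built from the question's data; `dt`, `base`, `b1`, `b2` and `res` are illustrative names.

```r
library(data.table)

# Rebuild the question's example data as a data.table
set.seed(1)
dt <- data.table(treatment = rep(c("baseline", "treatment 1", "treatment 2"),
                                 times = 5),
                 round = rep(1:5, each = 3),
                 measurement1 = rep(1:5, each = 3) + rnorm(15),
                 measurement2 = rep(1:5, each = 3) + rnorm(15))

# Illustrative names: base, b1, b2, res
# One baseline row per round, with its measurements renamed
base <- dt[treatment == "baseline",
           .(round, b1 = measurement1, b2 = measurement2)]

# Join each treatment row to its round's baseline, then subtract
res <- dt[treatment != "baseline"][base, on = "round",
                                   .(treatment, round,
                                     diff1 = measurement1 - b1,
                                     diff2 = measurement2 - b2)]
res
```

Here `diff1` and `diff2` are the differences for `measurement1` and `measurement2`, one row per treatment and round; unlike the head/tail version, this does not assume a fixed number of rows per round.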

Or with dplyr/tidyr:

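library(dplyr) 
library(tidyr) 
# gather()/spread() are superseded by pivot_longer()/pivot_wider() since tidyr 1.0.0
# Result: one row per round and measurement; diff1/diff2 are treatment 1/treatment 2 minus baseline
gather(df, var, val, measurement1:measurement2) %>% 
    spread(treatment, val) %>% 
    mutate(diff1 = `treatment 1` - baseline, 
           diff2 = `treatment 2` - baseline)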
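
In tidyr 1.0.0 and later, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. A sketch of the same reshape with the newer verbs, assuming tidyr >= 1.0.0 and dplyr; `res` is an illustrative name. As in the original, `diff1` and `diff2` are treatment 1 and treatment 2 minus baseline, with one row per round and measurement.

```r
library(dplyr)
library(tidyr)  # assumes tidyr >= 1.0.0 (pivot_longer()/pivot_wider())

# Rebuild the question's example data
set.seed(1)
df <- data.frame(treatment = rep(c("baseline", "treatment 1", "treatment 2"),
                                 times = 5),
                 round = rep(1:5, each = 3),
                 measurement1 = rep(1:5, each = 3) + rnorm(15),
                 measurement2 = rep(1:5, each = 3) + rnorm(15))

# `res` is an illustrative name
res <- df %>%
  # long format: one row per treatment, round and measurement
  pivot_longer(c(measurement1, measurement2),
               names_to = "var", values_to = "val") %>%
  # wide format: one column per treatment, keyed by round and var
  pivot_wider(names_from = treatment, values_from = val) %>%
  mutate(diff1 = `treatment 1` - baseline,
         diff2 = `treatment 2` - baseline)
res
```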