How to perform pairwise division within row-based groups

I have a data frame that looks like the following:

df <- structure(list(celltype = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 
4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L), .Label = c("Bcells", 
"DendriticCells", "Macrophages", "Monocytes", "NKCells", "Neutrophils", 
"StemCells", "StromalCells", "abTcells", "gdTCells"), class = "factor"), 
    sample = c("SP ID control", "SP ID treated", "SP ID control", 
    "SP ID treated", "SP ID control", "SP ID treated", "SP ID control", 
    "SP ID treated", "SP ID control", "SP ID treated", "SP ID control", 
    "SP ID treated", "SP ID control", "SP ID treated", "SP ID control", 
    "SP ID treated", "SP ID control", "SP ID treated", "SP ID control", 
    "SP ID treated"), `mean(score)` = c(0.160953535029424, 0.155743474395545, 
    0.104788051104575, 0.125247035158472, -0.159665650045289, 
    -0.134662049979712, 0.196249441751866, 0.212256889027029, 
    0.0532668251890109, 0.0738264693971133, 0.151828478029596, 
    0.159941552142933, -0.14128323638966, -0.120556640790534, 
    0.196518649474078, 0.185264282171863, 0.0654641151966543, 
    0.0837989059507186, 0.145111577618456, 0.145448549866796)), .Names = c("celltype", 
"sample", "mean(score)"), row.names = c(7L, 8L, 17L, 18L, 27L, 
28L, 37L, 38L, 47L, 48L, 57L, 58L, 67L, 68L, 77L, 78L, 87L, 88L, 
97L, 98L), class = "data.frame")

It looks like this:

> df
         celltype        sample mean(score)
7          Bcells SP ID control  0.16095354
8          Bcells SP ID treated  0.15574347
17 DendriticCells SP ID control  0.10478805
18 DendriticCells SP ID treated  0.12524704
27    Macrophages SP ID control -0.15966565
28    Macrophages SP ID treated -0.13466205
37      Monocytes SP ID control  0.19624944
38      Monocytes SP ID treated  0.21225689
47        NKCells SP ID control  0.05326683
48        NKCells SP ID treated  0.07382647
57    Neutrophils SP ID control  0.15182848
58    Neutrophils SP ID treated  0.15994155
67      StemCells SP ID control -0.14128324
68      StemCells SP ID treated -0.12055664
77   StromalCells SP ID control  0.19651865
78   StromalCells SP ID treated  0.18526428
87       abTcells SP ID control  0.06546412
88       abTcells SP ID treated  0.08379891
97       gdTCells SP ID control  0.14511158
98       gdTCells SP ID treated  0.14544855

What I want to do is divide the treated score by the control score within each cell type.

The Excel image below illustrates the example; the rightmost column is what we are after. For Bcells, for example, 0.155/0.161 = 0.967.

In the end, I would like to obtain a data frame that looks like this:

celltype        sample         Pairwise division
Bcells          SP ID treated  0.967630031
DendriticCells  SP ID treated  1.195241574
Macrophages     SP ID treated  0.843400255
Monocytes       SP ID treated  1.081566841
NKCells         SP ID treated  1.385974647
Neutrophils     SP ID treated  1.053435786
StemCells       SP ID treated  0.853297563
StromalCells    SP ID treated  0.942731303
abTcells        SP ID treated  1.280073915
gdTCells        SP ID treated  1.002322158

How can I achieve this in R?


2016-08-24 neversaint

If you spread the data to wide form, this is quite simple:

library(tidyr) 
library(dplyr) 

df %>% spread(sample, `mean(score)`) %>% 
    mutate(pairwise_division = `SP ID treated`/`SP ID control`) 

##          celltype SP ID control SP ID treated pairwise_division
## 1          Bcells    0.16095354    0.15574347         0.9676300
## 2  DendriticCells    0.10478805    0.12524704         1.1952416
## 3     Macrophages   -0.15966565   -0.13466205         0.8434003
## 4       Monocytes    0.19624944    0.21225689         1.0815668
## 5         NKCells    0.05326683    0.07382647         1.3859746
## 6     Neutrophils    0.15182848    0.15994155         1.0534358
## 7       StemCells   -0.14128324   -0.12055664         0.8532976
## 8    StromalCells    0.19651865    0.18526428         0.9427313
## 9        abTcells    0.06546412    0.08379891         1.2800739
## 10       gdTCells    0.14511158    0.14544855         1.0023222

Note that you should fix your column names so that you don't have to keep using backticks.
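As a rough illustration of that advice, the sketch below renames the score column and shortens the sample labels in base R. It uses a two-cell-type subset of the question's data; `df_small` and `mean_score` are hypothetical names chosen here, not part of the original answer.

# Two cell types from the question's data, keeping the awkward column name.
df_small <- data.frame(
  celltype = c("Bcells", "Bcells", "DendriticCells", "DendriticCells"),
  sample = c("SP ID control", "SP ID treated", "SP ID control", "SP ID treated"),
  `mean(score)` = c(0.160953535029424, 0.155743474395545,
                    0.104788051104575, 0.125247035158472),
  check.names = FALSE,        # keep "mean(score)" as-is, as in the question
  stringsAsFactors = FALSE
)

# Give the score column a syntactic name (mean_score is an assumed choice).
names(df_small)[names(df_small) == "mean(score)"] <- "mean_score"

# Drop the "SP ID " prefix so that wide-form columns need no backticks either.
df_small$sample <- sub("^SP ID ", "", df_small$sample)

str(df_small)

After this, the wide-form columns would simply be `control` and `treated`, so the division can be written without any quoting.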

To get exactly the desired result, gather back to long form, filter to keep only the treated rows, and select the columns you need:

df %>% spread(sample, `mean(score)`) %>% 
    mutate(pairwise_division = `SP ID treated`/`SP ID control`) %>% 
    gather(sample, `mean(score)`, starts_with('SP')) %>% 
    filter(sample == 'SP ID treated') %>% 
    select(celltype, sample, pairwise_division) 

##          celltype        sample pairwise_division
## 1          Bcells SP ID treated         0.9676300
## 2  DendriticCells SP ID treated         1.1952416
## 3     Macrophages SP ID treated         0.8434003
## 4       Monocytes SP ID treated         1.0815668
## 5         NKCells SP ID treated         1.3859746
## 6     Neutrophils SP ID treated         1.0534358
## 7       StemCells SP ID treated         0.8532976
## 8    StromalCells SP ID treated         0.9427313
## 9        abTcells SP ID treated         1.2800739
## 10       gdTCells SP ID treated         1.0023222

Equivalent versions are possible in base R or with data.table, if you prefer. Or take a more direct route:

aggregate(cbind(pairwise_division = `mean(score)`) ~ celltype, 
      df[order(df$celltype, df$sample), ], 
      FUN = function(x){x[2]/x[1]}) 

##          celltype pairwise_division
## 1          Bcells         0.9676300
## 2  DendriticCells         1.1952416
## 3     Macrophages         0.8434003
## 4       Monocytes         1.0815668
## 5         NKCells         1.3859746
## 6     Neutrophils         1.0534358
## 7       StemCells         0.8532976
## 8    StromalCells         0.9427313
## 9        abTcells         1.2800739
## 10       gdTCells         1.0023222
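The answer does not spell out the base R equivalent it mentions. One way it might look is sketched below, using `merge()` to line up the control and treated rows by cell type instead of relying on row order. `df_small` is a hypothetical two-cell-type subset of the question's data, and the column names `control` and `treated` are assumptions made for this sketch.

# Two cell types from the question's data.
df_small <- data.frame(
  celltype = c("Bcells", "Bcells", "DendriticCells", "DendriticCells"),
  sample = c("SP ID control", "SP ID treated", "SP ID control", "SP ID treated"),
  `mean(score)` = c(0.160953535029424, 0.155743474395545,
                    0.104788051104575, 0.125247035158472),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Split into control and treated rows, renaming the score column in each.
control <- df_small[df_small$sample == "SP ID control", c("celltype", "mean(score)")]
names(control)[2] <- "control"
treated <- df_small[df_small$sample == "SP ID treated", ]
names(treated)[3] <- "treated"

# Join on celltype, so each treated row sits next to its matching control score.
merged <- merge(treated, control, by = "celltype")
merged$pairwise_division <- merged$treated / merged$control

result <- merged[, c("celltype", "sample", "pairwise_division")]
print(result)

Because the pairing is done by key rather than by position, this variant should still work if the rows arrive unsorted, whereas the `aggregate()` call depends on the ordering step.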


2016-08-24 01:12:45 alistaire

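Thanks, but how come the value in the first row of your result is not '0.967630031'? – neversaint

Oops, I divided the wrong way round and posted the wrong version. Fixed. – alistaire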

If your data is ordered and completely paired:

pair_index <- 1:(dim(df)[1]/2)*2 
df[pair_index,'pairwise-division'] <- df[pair_index,3]/df[pair_index-1,3] 
df[pair_index,c(1,2,4)]
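The index-based approach above is only correct when every control row is immediately followed by the treated row of the same cell type. A minimal sketch of how that assumption might be checked first is shown here; `is_ordered_and_paired` and `df_small` are hypothetical names introduced for illustration, and the sample labels are those from the question.

# Returns TRUE when rows alternate control/treated and each pair shares a celltype.
is_ordered_and_paired <- function(d) {
  n <- nrow(d)
  if (n == 0 || n %% 2 != 0) return(FALSE)   # need a non-empty, even number of rows
  odd <- seq(1, n, by = 2)                   # expected control rows
  even <- seq(2, n, by = 2)                  # expected treated rows
  all(d$sample[odd] == "SP ID control") &&
    all(d$sample[even] == "SP ID treated") &&
    all(as.character(d$celltype[odd]) == as.character(d$celltype[even]))
}

# Two cell types from the question's data, already in the expected order.
df_small <- data.frame(
  celltype = c("Bcells", "Bcells", "DendriticCells", "DendriticCells"),
  sample = c("SP ID control", "SP ID treated", "SP ID control", "SP ID treated"),
  `mean(score)` = c(0.160953535029424, 0.155743474395545,
                    0.104788051104575, 0.125247035158472),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

ok_sorted <- is_ordered_and_paired(df_small)
ok_shuffled <- is_ordered_and_paired(df_small[c(2, 1, 3, 4), ])  # first pair swapped

If the check fails, sorting with `df[order(df$celltype, df$sample), ]` before indexing would restore the expected layout, provided each cell type has exactly one control and one treated row.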


2016-08-24 01:21:12 HubertL
