2017-08-29 42 views
-1

Building a contingency table

I have a table like this:

df <- data.frame(P1 = c(1,0,0,0,0,0,"A"), 
        P2 = c(0,-2,1,2,1,0,"A"), 
        P3 = c(-1,2,0,2,1,0,"B"), 
        P4 = c(2,0,-1,0,-1,0,"B"), 
        Names = c("G1","G2","G3","G1","G2","G3","Group"), 
        stringsAsFactors = FALSE) 

which gives:

Names P1 P2 P3 P4 
G1  1 0  -1 2 
G2  0 -2 2 0 
G3  0 1  0 -1 
G1  0 2  2 0 
G2  0 1  1 -1 
G3  0 0  0 0 
Group A A  B B 

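Here, A and B are the grouping variables for the columns P1, P2, P3 and P4.

I would like to build a contingency table of Id (G1, G2, ...), Group (A, B) and Var (-2, -1, 0, 1, 2), for example: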

Id Group Var Count 
G1 A  -2  0 
G1 A  -1  0 
G1 A  0  1 
G1 A  1  1 
G1 A  2  0 
G1 B  -2  0 
G1 B  -1  1 
G1 B  0  0 
G1 B  1  0 
G1 B  2  1 
G2 A  -2  1 
G2 A  -1  0 
G2 A  0  1 
... 

Is there a way to do this in R without using a lot of loops?

+3

How to make a great R reproducible example? (http://stackoverflow.com/questions/5963269) – Sotos

+0

Thanks @Sotos, I added df – Sosi

+2

I think your output is inconsistent with your 'df': shouldn't 'Group' be a variable? It appears as a row... – mdag02

Answers

1

Assuming you want to group the P1 & P2 columns as A and the P3 & P4 columns as B, you can solve this with the data.table package as follows:

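library(data.table)

# Inner melt: stack P1/P2 into column A and P3/P4 into column B.
# Outer melt: stack A and B into one 'value' column, labelled by 'group'.
DT <- melt(melt(setDT(df),
                measure.vars = list(c(2,3), c(4,5)),
                value.name = c("A","B")),
           id.vars = 1, measure.vars = 3:4, variable.name = 'group'
           )[order(Id,group)][, val2 := value]

# Join on every Id/group/value combination, then count the matched rows;
# combinations absent from DT get val2 = NA and therefore a count of 0.
DT[CJ(Id = Id, group = group, value = value, unique = TRUE)
   , on = .(Id, group, value)
   ][, .(counts = sum(!is.na(val2))), by = .(Id, group, value)]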

which gives:

Id group value counts 
1: G1  A -2  0 
2: G1  A -1  0 
3: G1  A  0  2 
4: G1  A  1  1 
5: G1  A  2  1 
6: G1  B -2  0 
7: G1  B -1  1 
8: G1  B  0  1 
9: G1  B  1  0 
10: G1  B  2  2 
11: G2  A -2  1 
12: G2  A -1  0 
13: G2  A  0  2 
14: G2  A  1  1 
15: G2  A  2  0 
16: G2  B -2  0 
17: G2  B -1  1 
18: G2  B  0  1 
19: G2  B  1  1 
20: G2  B  2  1 
21: G3  A -2  0 
22: G3  A -1  0 
23: G3  A  0  3 
24: G3  A  1  1 
25: G3  A  2  0 
26: G3  B -2  0 
27: G3  B -1  1 
28: G3  B  0  3 
29: G3  B  1  0 
30: G3  B  2  0 

Used data:

df <- read.table(text="Id  P1 P2 P3 P4 
G1  1 0 -1 2 
G2  0 -2 2  0 
G3  0 1 0  -1 
G1  0 2 2  0 
G2  0 1 1  -1 
G3  0 0 0  0", header=TRUE, stringsAsFactors = FALSE) 

Note that I omitted the 'Group' row because, according to your comment, it only serves to indicate which group the P1 - P4 columns belong to.
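For comparison, here is a minimal base R sketch of the same idea, not part of the approach above. It assumes the same column-to-group mapping (P1, P2 as A and P3, P4 as B); the names grp, p_cols, long and res are only illustrative. It relies on table() counting every combination of its factors, so zero counts appear without an explicit cross join.

```r
# Same data as above, without the 'Group' row.
df <- read.table(text = "Id P1 P2 P3 P4
G1 1 0 -1 2
G2 0 -2 2 0
G3 0 1 0 -1
G1 0 2 2 0
G2 0 1 1 -1
G3 0 0 0 0", header = TRUE, stringsAsFactors = FALSE)

# Assumed mapping of columns to groups: P1, P2 -> A and P3, P4 -> B.
grp <- c(P1 = "A", P2 = "A", P3 = "B", P4 = "B")
p_cols <- names(grp)

# Long format: one row per (Id, P-column) cell, stacked column by column.
long <- data.frame(
  Id    = rep(df$Id, times = length(p_cols)),
  Group = rep(unname(grp), each = nrow(df)),
  Var   = unlist(df[p_cols], use.names = FALSE)
)

# table() tabulates all Id x Group x Var combinations, including zeros.
res <- as.data.frame(table(long), responseName = "Count",
                     stringsAsFactors = FALSE)
res$Var <- as.integer(res$Var)
res <- res[order(res$Id, res$Group, res$Var), ]
rownames(res) <- NULL
```

Calling head(res) shows the first rows in the same Id / Group / Var / Count layout as the output above.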

+0

Indeed, thank you very much! – Sosi

1

With

library(tidyverse) 

df <- read.table(text="Id  P1 P2 P3 P4 
G1  1 0 -1 2 
G2  0 -2 2  0 
G3  0 1 0  -1 
G1  0 2 2  0 
G2  0 1 1  -1 
G3  0 0 0  0", header=TRUE, stringsAsFactors = FALSE) 

we reshape the table and recode the P* variables into group. Then we count and fill in the missing combinations:

df %>% 
    gather(P1, P2, P3, P4, key = "p", value = "v") %>% 
    mutate(group = ifelse(p %in% c("P1", "P2"), "A", "B")) %>% 
    group_by(Id, group, v) %>% 
    summarise(Count = n()) %>% 
    ungroup() %>% 
    complete(Id, group, v, fill = list("Count" = 0)) 

If you don't need all combinations in the output, just use:

df %>% 
    gather(P1, P2, P3, P4, key = "p", value = "v") %>% 
    mutate(group = ifelse(p %in% c("P1", "P2"), "A", "B")) %>% 
    group_by(Id, group, v) %>% 
    summarise(Count = n()) 

# A tibble: 17 x 4 
# Groups: Id, group [?] 
     Id group v  Count 
     <chr> <chr> <int> <int> 
1 G1  A  0  2 
2 G1  A  1  1 
3 G1  A  2  1 
4 G1  B -1  1 
5 G1  B  0  1 
6 G1  B  2  2 
7 G2  A -2  1 
8 G2  A  0  2 
9 G2  A  1  1 
10 G2  B -1  1 
11 G2  B  0  1 
12 G2  B  1  1 
13 G2  B  2  1 
14 G3  A  0  3 
15 G3  A  1  1 
16 G3  B -1  1 
17 G3  B  0  3
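
Since gather() has been superseded in newer tidyr releases, the full pipeline can also be sketched with pivot_longer() and count(). This is a variant under the same assumptions (P1, P2 form group A and P3, P4 form group B), not the code used to produce the output above; res is just an illustrative name.

```r
library(dplyr)
library(tidyr)

df <- read.table(text = "Id P1 P2 P3 P4
G1 1 0 -1 2
G2 0 -2 2 0
G3 0 1 0 -1
G1 0 2 2 0
G2 0 1 1 -1
G3 0 0 0 0", header = TRUE, stringsAsFactors = FALSE)

res <- df %>%
  # Long format: one row per (Id, P-column) cell.
  pivot_longer(P1:P4, names_to = "p", values_to = "v") %>%
  # Assumed mapping: P1, P2 -> A and P3, P4 -> B.
  mutate(group = ifelse(p %in% c("P1", "P2"), "A", "B")) %>%
  # count() replaces group_by() + summarise(n()) and returns an ungrouped tibble.
  count(Id, group, v, name = "Count") %>%
  # Add the combinations that never occur, with a count of 0.
  complete(Id, group, v, fill = list(Count = 0L))
```

Dropping the final complete() step gives the shorter result that lists only the combinations that actually occur.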