2017-12-18

How to cross-tabulate (xtabs) multiple variables with the same breakdown

I have a data frame that looks like this:

  SubjectID Activity        V1          V2          V3
1         2        S 0.2571778 -0.02328523 -0.01465376
2         2        W 0.2860267 -0.01316336 -0.11908252
3         3        R 0.2754848 -0.02605042 -0.11815167
4         3        W 0.2702982 -0.03261387 -0.11752018
5         4        A 0.2748330 -0.02784779 -0.12952716
6         4        S 0.2792199 -0.01862040 -0.11390197
...

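(There are actually more Vn variables, but this illustrates the problem.)

I would like to use xtabs() to look at all of the Vn variables while keeping SubjectID and Activity fixed, something like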

xtabs(c(V1, V2, V3) ~ SubjectID + Activity, data = DF) 

lapply(c(V1, V2, V3), function(x) xtabs(x ~ SubjectID + Activity, data = DF)) 

But of course these don't work. What is the right approach here?


Edit: The output I want is what these calls would give:

xtabs(V1 ~ SubjectID + Activity, data = DF) 
xtabs(V2 ~ SubjectID + Activity, data = DF) 
xtabs(V3 ~ SubjectID + Activity, data = DF) 
... 
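One way is to use `reshape` instead of `xtabs`: `lapply(paste0("V", 1:3), function(x) reshape(df[c(x, "SubjectID", "Activity")], idvar = "SubjectID", timevar = "Activity", direction = "wide"))`

@RonakShah That's great, except that it doesn't sum the values the way xtabs would (I really want the mean, but if I can get the sums I can work out the mean from them) – Conrad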

Answers

You should be able to just use get after supplying a character vector of the columns of interest.

lapply(c("V1", "V2", "V3"), function(x) xtabs(get(x) ~ SubjectID + Activity, data = DF)) 
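As a variation on the same idea, the formula can be built from the column name with reformulate() instead of calling get() inside it. This is only a minimal sketch and not part of the original answer; the DF below simply retypes the six rows shown in the question.

```r
# Toy data: the six rows shown in the question, retyped for illustration.
DF <- data.frame(
  SubjectID = c(2, 2, 3, 3, 4, 4),
  Activity  = c("S", "W", "R", "W", "A", "S"),
  V1 = c(0.2571778, 0.2860267, 0.2754848, 0.2702982, 0.2748330, 0.2792199),
  V2 = c(-0.02328523, -0.01316336, -0.02605042, -0.03261387, -0.02784779, -0.01862040),
  V3 = c(-0.01465376, -0.11908252, -0.11815167, -0.11752018, -0.12952716, -0.11390197)
)

vars <- c("V1", "V2", "V3")

# reformulate() turns strings into a formula such as V1 ~ SubjectID + Activity,
# so each xtabs() call sees an ordinary formula with a real column name.
tabs <- lapply(vars, function(x) {
  xtabs(reformulate(c("SubjectID", "Activity"), response = x), data = DF)
})
names(tabs) <- vars

tabs$V1
```

One small benefit of this form is that each table's call records the actual variable name rather than get(x).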

Try it with the "airquality" dataset:

setNames(lapply(names(airquality)[1:4], 
       function(x) xtabs(get(x) ~ Month + Day, airquality)), 
     names(airquality)[1:4]) 
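Since xtabs() sums within each cell and the comments ask for the mean, one possible way to get there is to divide the table of sums by a table of counts. The sketch below uses the built-in warpbreaks data as a stand-in (an assumption for illustration, because it has several rows per cell) and checks the result against tapply().

```r
# Cell sums and cell counts over the same breakdown.
sums   <- xtabs(breaks ~ wool + tension, data = warpbreaks)
counts <- xtabs(~ wool + tension, data = warpbreaks)

# Sum divided by count gives the mean of each cell.
means <- sums / counts

# tapply() computes the same cell means directly.
means_tapply <- with(warpbreaks, tapply(breaks, list(wool, tension), mean))

means
```

Note that a cell with no rows has a count of 0, so the division yields NaN there rather than the 0 that xtabs() shows for sums.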

Based on your comments, I would suggest looking at "data.table" and dcast if you need a wide dataset.

Here is an example:

set.seed(1) 
DF <- cbind(warpbreaks, V2 = sample(100, nrow(warpbreaks)), V3 = sample(100, nrow(warpbreaks))) 
library(data.table) 
setDT(DF) 
lapply(c("breaks", "V2", "V3"), function(x) { 
    dcast(DF[, lapply(.SD, mean), .(wool, tension)], wool ~ tension, value.var = x) 
}) 
# [[1]]
#    wool        L        M        H
# 1:    A 44.55556 24.00000 24.55556
# 2:    B 28.22222 28.77778 18.77778
#
# [[2]]
#    wool        L        M        H
# 1:    A 59.22222 46.33333 33.22222
# 2:    B 49.44444 44.77778 43.22222
#
# [[3]]
#    wool  L        M        H
# 1:    A 40 68.11111 74.22222
# 2:    B 48 40.11111 37.77778

Alternatively, you can have a fully wide "data.table", like this:

dcast(DF[, lapply(.SD, mean), .(wool, tension)], wool ~ tension, 
     value.var = c("breaks", "V2", "V3")) 
#    wool breaks_L breaks_M breaks_H     V2_L     V2_M     V2_H V3_L     V3_M     V3_H
# 1:    A 44.55556 24.00000 24.55556 59.22222 46.33333 33.22222   40 68.11111 74.22222
# 2:    B 28.22222 28.77778 18.77778 49.44444 44.77778 43.22222   48 40.11111 37.77778
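The aggregation step can probably also be folded into dcast itself through its fun.aggregate argument, which accepts several value.var columns at once. The following is a hedged sketch rather than the answer's own code; the V2 column here is a made-up deterministic column (twice the breaks value) so the example does not depend on random sampling.

```r
library(data.table)

DT <- as.data.table(warpbreaks)
DT[, V2 := breaks * 2]  # hypothetical second measure, for illustration only

# dcast() aggregates each value.var with mean() while reshaping to wide form.
wide <- dcast(DT, wool ~ tension,
              value.var = c("breaks", "V2"),
              fun.aggregate = mean)
wide
```

This gives one row per wool and one column per variable/tension combination, without a separate lapply(.SD, mean) step.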

Using a tidy approach, this is how I would solve the problem:

library(tidyr) 
library(dplyr) 
library(purrr) 

df <- tribble(
    ~SubjectID, ~Activity,  ~V1,   ~V2,   ~V3, 
      2,  "S", 0.2571778, -0.02328523, -0.01465376, 
      2,  "W", 0.2860267, -0.01316336, -0.11908252, 
      3,  "R", 0.2754848, -0.02605042, -0.11815167, 
      3,  "W", 0.2702982, -0.03261387, -0.11752018, 
      4,  "A", 0.2748330, -0.02784779, -0.12952716, 
      4,  "S", 0.2792199, -0.01862040, -0.11390197 
) 

df %>% 
    select(starts_with("V")) %>% 
    map(~{ 
        as_tibble(xtabs(.x ~ SubjectID + Activity, data = df)) 
    }) %>% 
    bind_rows(.id = "var") %>% 
    spread(Activity, n) 

# # A tibble: 9 x 6
#   var   SubjectID           A           R           S           W
# * <chr> <chr>           <dbl>       <dbl>       <dbl>       <dbl>
# 1 V1    2          0.00000000  0.00000000  0.25717780  0.28602670
# 2 V1    3          0.00000000  0.27548480  0.00000000  0.27029820
# 3 V1    4          0.27483300  0.00000000  0.27921990  0.00000000
# 4 V2    2          0.00000000  0.00000000 -0.02328523 -0.01316336
# 5 V2    3          0.00000000 -0.02605042  0.00000000 -0.03261387
# 6 V2    4         -0.02784779  0.00000000 -0.01862040  0.00000000
# 7 V3    2          0.00000000  0.00000000 -0.01465376 -0.11908252
# 8 V3    3          0.00000000 -0.11815167  0.00000000 -0.11752018
# 9 V3    4         -0.12952716  0.00000000 -0.11390197  0.00000000
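
A similar table can likely be produced without xtabs by going long and then wide with tidyr's pivot functions. This is a sketch under the assumption that tidyr 1.1.0 or later is installed (for values_fn, values_fill and names_sort as used here); it is an alternative I am suggesting, not the method above.

```r
library(dplyr)
library(tidyr)

# Same six rows as in the question.
df <- tribble(
  ~SubjectID, ~Activity, ~V1,        ~V2,         ~V3,
  2,          "S",       0.2571778,  -0.02328523, -0.01465376,
  2,          "W",       0.2860267,  -0.01316336, -0.11908252,
  3,          "R",       0.2754848,  -0.02605042, -0.11815167,
  3,          "W",       0.2702982,  -0.03261387, -0.11752018,
  4,          "A",       0.2748330,  -0.02784779, -0.12952716,
  4,          "S",       0.2792199,  -0.01862040, -0.11390197
)

wide <- df %>%
  # Stack V1..V3 into a (var, value) pair of columns.
  pivot_longer(starts_with("V"), names_to = "var", values_to = "value") %>%
  # Spread Activity into columns, summing duplicates and filling gaps with 0.
  pivot_wider(id_cols = c(var, SubjectID),
              names_from = Activity, values_from = value,
              values_fn = sum, values_fill = 0, names_sort = TRUE) %>%
  arrange(var, SubjectID)

wide
```

Replacing sum with mean in values_fn would give cell means instead, in which case filling empty cells with NA rather than 0 is probably the safer choice.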