2016-03-02

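dplyr: self-join with filter

In a long-format data frame, I want to subtract the value in the row labeled "baseline" from all other labeled rows. Doing this in two steps, with a left_join against the "baseline" subset, is straightforward. However, I cannot figure out how to combine vas_1 and vas_diff into a single chain.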

library(dplyr) 
# Create test data 
n_users = 5 
vas = tibble( # data_frame() is deprecated in favor of tibble()
    user = rep(letters[1:n_users], each = 3), 
    group = rep(c("baseline", "early", "late"),n_users), 
    vas = round(rgamma(n_users*3, 10,1.4)) 
) 
# The above data are given 


# Assume some other operations are required 
vas_1 = vas %>% 
    mutate(
        vas = vas * 2 
    ) 
# I want to put the following into one 
# chain with the above 
# Use self-join to subtract baseline 
vas_diff = vas_1 %>% 
    filter(group != "baseline") %>% 
    # The problem is the reference to vas_1 below. Using . instead gives an error, 
    # and adding copy = TRUE does not help: 
    # left_join(. %>% filter(group == "baseline"), by = c("user")) %>% 
    left_join(vas_1 %>% filter(group == "baseline") , by = c("user")) %>% 
    mutate(vas = vas.x - vas.y) %>% # compute offset 
    select(user, group.x, vas) # remove temporary variables 

vas_diff 

So what is the desired result? Your code works. Do you just want to simplify it, or do you want a different result?


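No, the result is fine. I am interested in the inner workings of dplyr, especially the magic of the dot. I am also trying to understand the error message, which asks for copy and still fails even with `copy = TRUE`.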


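Not quite sure I understand your question, but the problem seems to be that you filter out all observations with `group == "baseline"` and then, later in the pipe, you want to access those rows again, but they are no longer in the data. Maybe you want something like `vas_1 %>% left_join(filter(., group != "baseline"), filter(., group == "baseline"), by = c("user"))`?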

Answer


I use an anonymous function when `.` needs to be used more than once:

... %>% (function(df) { ... }) %>% ... 

So, in your case:

vas_diff = vas_1 %>% 
    filter(group != "baseline") %>% 
    (function(df) left_join(df, df %>% filter(group == "baseline") , by = c("user"))) %>% 
    mutate(vas = vas.x - vas.y) %>% # compute offset 
    select(user, group.x, vas) 

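(This does not produce the expected result, for the reason described in the comment above: the baseline rows have already been filtered out before the join. It only shows how to use an anonymous function.)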
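For illustration, here is a minimal sketch of the same anonymous-function pattern applied before any filtering, so that both the baseline and the non-baseline rows are still available inside the function. The data set `vas_demo` and its values are hypothetical stand-ins for `vas_1`, chosen so the result is easy to check by hand.

```r
library(dplyr)

# Hypothetical stand-in for vas_1 (values chosen for illustration only)
vas_demo <- tibble(
  user  = rep(c("a", "b"), each = 3),
  group = rep(c("baseline", "early", "late"), 2),
  vas   = c(10, 12, 15, 20, 18, 26)
)

vas_demo_diff <- vas_demo %>%
  # df is the unfiltered data, so it can be split into both halves here
  (function(df) {
    left_join(
      filter(df, group != "baseline"),
      filter(df, group == "baseline"),
      by = "user"
    )
  }) %>%
  mutate(vas = vas.x - vas.y) %>%      # subtract each user's baseline
  select(user, group = group.x, vas)   # drop the temporary join columns

vas_demo_diff
```

Each user keeps the "early" and "late" rows, with that user's baseline value subtracted from `vas`.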

But you probably want this:

vas_diff = vas_1 %>% 
    # Braces keep %>% from also passing vas_1 as an extra first argument 
    { left_join(
        x = filter(., group != "baseline") 
        , y = filter(., group == "baseline") 
        , by = c("user") 
    ) } %>% 
    mutate(vas = vas.x - vas.y) %>% # compute offset 
    select(user, group.x, vas) # remove temporary variables
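
As for why the commented-out `left_join(. %>% filter(...))` line in the question fails: in magrittr, a pipeline whose left-hand side is a bare `.` does not evaluate to data. It builds a function (a "functional sequence"). So `left_join` most likely receives a function as `y` rather than a data frame, which would explain both the complaint about the source of `y` and why `copy = TRUE` cannot help. A minimal sketch, again using a small hypothetical data set `vas_demo` in place of `vas_1`:

```r
library(dplyr)

# Hypothetical stand-in for vas_1 (values chosen for illustration only)
vas_demo <- tibble(
  user  = rep(c("a", "b"), each = 3),
  group = rep(c("baseline", "early", "late"), 2),
  vas   = c(10, 12, 15, 20, 18, 26)
)

# A pipeline that starts with a bare dot defines a function, not a data frame
baseline_only <- . %>% filter(group == "baseline")
is_fun <- is.function(baseline_only)

# Calling that function on the data returns the baseline rows
baseline_rows <- baseline_only(vas_demo)

is_fun
baseline_rows
```

Inside a join, `filter(., group == "baseline")` is therefore the form to use, because it evaluates to a data frame immediately.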