2017-02-19 106 views

dplyr: mutate a new column based on multiple columns selected by a variable string

Given this data:

df=data.frame(
    x1=c(2,0,0,NA,0,1,1,NA,0,1), 
    x2=c(3,2,NA,5,3,2,NA,NA,4,5), 
    x3=c(0,1,0,1,3,0,NA,NA,0,1), 
    x4=c(1,0,NA,3,0,0,NA,0,0,1), 
    x5=c(1,1,NA,1,3,4,NA,3,3,1)) 

I want to use dplyr to create an extra column, min, holding the row-wise minimum of selected columns. This is easy using the column names:

df <- df %>% rowwise() %>% mutate(min = min(x2,x5)) 

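But I have a large df with varying column names, so I need to match them against a set of strings stored in mycols. Other threads told me to use the select helper functions, but I must be missing something. Here is matches: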

mycols <- c("x2","x5") 
df <- df %>% rowwise() %>% 
    mutate(min = min(select(matches(mycols)))) 
Error: is.string(match) is not TRUE 

And one_of:

mycols <- c("x2","x5") 
df <- df %>% 
    rowwise() %>% 
    mutate(min = min(select(one_of(mycols)))) 
Error: no applicable method for 'select' applied to an object of class "c('integer', 'numeric')" 
In addition: Warning message: 
In one_of(c("x2", "x5")) : Unknown variables: `x2`, `x5` 

What am I overlooking? Should select_ work? It doesn't in the following:

df <- df %>% 
    rowwise() %>% 
    mutate(min = min(select_(mycols))) 
Error: no applicable method for 'select_' applied to an object of class "character" 

And likewise:

df <- df %>% 
    rowwise() %>% 
    mutate(min = min(select_(matches(mycols)))) 
Error: is.string(match) is not TRUE 

You need to use the SE (standard evaluation) versions of the dplyr verbs when working with strings. In this case, use 'select_()'. –

That doesn't work as I expected it to: 'df <- df %>% rowwise() %>% mutate(min = min(select_(mycols)))' yields "Error: no applicable method for 'select_' applied to an object of class "character"" – strangeloop
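The error in that comment comes from argument order: the first argument of every dplyr verb is the data, so inside mutate(), select_(mycols) receives the character vector c("x2", "x5") as its data. A minimal sketch of selecting by a character vector against the data frame itself, assuming dplyr 1.0 or later for the all_of() helper (which is not used anywhere else in this thread); subset_df is just an illustrative name:

```r
library(dplyr)

# the example data from the question
df <- data.frame(
  x1 = c(2, 0, 0, NA, 0, 1, 1, NA, 0, 1),
  x2 = c(3, 2, NA, 5, 3, 2, NA, NA, 4, 5),
  x3 = c(0, 1, 0, 1, 3, 0, NA, NA, 0, 1),
  x4 = c(1, 0, NA, 3, 0, 0, NA, 0, 0, 1),
  x5 = c(1, 1, NA, 1, 3, 4, NA, 3, 3, 1)
)
mycols <- c("x2", "x5")

# pass the data frame explicitly as the first argument, then pick the
# columns whose names are listed in the character vector
subset_df <- select(df, all_of(mycols))
print(names(subset_df))
```

all_of() raises an error if any name in mycols is missing from df; any_of() silently skips missing names instead.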

The error with 'matches' occurs because it takes a single string (a regular expression) as its argument, not a vector of strings. – cderv
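Following that comment, one way to keep using matches() with a vector is to collapse the vector into a single regular expression first. A sketch, assuming the question's df and mycols <- c("x2", "x5"); pattern and picked are illustrative names:

```r
library(dplyr)

df <- data.frame(
  x1 = c(2, 0, 0, NA, 0, 1, 1, NA, 0, 1),
  x2 = c(3, 2, NA, 5, 3, 2, NA, NA, 4, 5),
  x3 = c(0, 1, 0, 1, 3, 0, NA, NA, 0, 1),
  x4 = c(1, 0, NA, 3, 0, 0, NA, 0, 0, 1),
  x5 = c(1, 1, NA, 1, 3, 4, NA, 3, 3, 1)
)
mycols <- c("x2", "x5")

# collapse the vector into one anchored alternation, "^(x2|x5)$", so that
# matches() receives a single regular expression
pattern <- paste0("^(", paste(mycols, collapse = "|"), ")$")
picked <- select(df, matches(pattern))
print(names(picked))
```

The ^ and $ anchors stop the pattern from also matching names such as x25. Column names containing regex metacharacters (a dot, for example) would need escaping before being pasted in.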

Answers

This is a bit tricky. With SE (standard evaluation), you need to pass the whole operation as a string.

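# build the whole call as a string: f is "min(x2,x5)"
mycols <- '(x2,x5)' 
f <- paste0('min',mycols) 
# mutate_() (deprecated since dplyr 0.7.0) parses f as an expression;
# assign the result back so that df gains the min column
df <- df %>% rowwise() %>% mutate_(min = f) 
df 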
# A tibble: 10 × 6
#       x1    x2    x3    x4    x5   min
#    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#  1     2     3     0     1     1     1
#  2     0     2     1     0     1     1
#  3     0    NA     0    NA    NA    NA
#  4    NA     5     1     3     1     1
#  5     0     3     3     0     3     3
#  6     1     2     0     0     4     2
#  7     1    NA    NA    NA    NA    NA
#  8    NA    NA    NA     0     3    NA
#  9     0     4     0     0     3     3
# 10     1     5     1     1     1     1
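The string-building step can also be avoided. As a hedged sketch for dplyr 0.7.0 or later (not what this answer uses), rlang::syms() converts the strings in a character vector to symbols and !!! splices them into the call, so the code below evaluates pmin(x2, x5). Because pmin() is already vectorised over rows, rowwise() is not needed; out is an illustrative name:

```r
library(dplyr)

df <- data.frame(
  x1 = c(2, 0, 0, NA, 0, 1, 1, NA, 0, 1),
  x2 = c(3, 2, NA, 5, 3, 2, NA, NA, 4, 5),
  x3 = c(0, 1, 0, 1, 3, 0, NA, NA, 0, 1),
  x4 = c(1, 0, NA, 3, 0, 0, NA, 0, 0, 1),
  x5 = c(1, 1, NA, 1, 3, 4, NA, 3, 3, 1)
)
mycols <- c("x2", "x5")

# syms() turns "x2", "x5" into symbols; !!! splices them in as separate
# arguments, so this evaluates pmin(x2, x5) over whole columns at once
out <- df %>% mutate(min = pmin(!!!rlang::syms(mycols)))
print(out)
```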

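Thanks! Now I want the lowest non-NA value, so I need to tweak this code slightly. It looks like changing 'min' to 'pmin(na.rm = T)' works (adding na.rm = T to 'min()' didn't seem to work): 'f <- paste0('pmin(', mycols, ', na.rm = T)')' 'df <- df %>% rowwise() %>% mutate_(min = f)' – strangeloop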
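For the non-NA minimum from that comment, the same pmin(..., na.rm = TRUE) call can be built from a character vector without pasting strings. A sketch, assuming dplyr 1.0 or later for all_of(): do.call() passes each selected column as a separate argument together with na.rm = TRUE.

```r
library(dplyr)

df <- data.frame(
  x1 = c(2, 0, 0, NA, 0, 1, 1, NA, 0, 1),
  x2 = c(3, 2, NA, 5, 3, 2, NA, NA, 4, 5),
  x3 = c(0, 1, 0, 1, 3, 0, NA, NA, 0, 1),
  x4 = c(1, 0, NA, 3, 0, 0, NA, 0, 0, 1),
  x5 = c(1, 1, NA, 1, 3, 4, NA, 3, 3, 1)
)
mycols <- c("x2", "x5")

# c() turns the selected columns into a list and appends na.rm = TRUE;
# do.call() then evaluates pmin(x2 = ..., x5 = ..., na.rm = TRUE)
df <- df %>%
  mutate(min = do.call(pmin, c(select(., all_of(mycols)), na.rm = TRUE)))
print(df$min)
```

Unlike min(na.rm = TRUE), which returns Inf with a warning when every input is NA, pmin(na.rm = TRUE) returns NA for a row whose selected values are all NA (rows 3 and 7 here).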

Here is another, slightly more technical solution using the purrr package, the tidyverse's toolkit for functional programming.

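First, dplyr's matches helper takes a regular expression string as its argument, not a vector, so a good approach is to write one regex that matches all the columns you want. Once you understand the basic functional-programming pattern in the code below, you can use whichever dplyr select helper you like.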

Second, purrr functions work very well together with dplyr.

Here is the solution to your problem:


df=data.frame(
    x1=c(2,0,0,NA,0,1,1,NA,0,1), 
    x2=c(3,2,NA,5,3,2,NA,NA,4,5), 
    x3=c(0,1,0,1,3,0,NA,NA,0,1), 
    x4=c(1,0,NA,3,0,0,NA,0,0,1), 
    x5=c(1,1,NA,1,3,4,NA,3,3,1)) 


# anchored regex so that only the x2 and x5 columns match
mycols <- "^x[25]$" 

library(dplyr) 

df %>% 
    mutate(min_x2_x5 = 
      # select columns that you want in df 
      select(., matches(mycols)) %>% 
      # a data frame is a list of columns: pmap_dbl() walks them in parallel
      # and calls min() once per row, returning one minimum per row
      purrr::pmap_dbl(min) 
     ) 
#>    x1 x2 x3 x4 x5 min_x2_x5
#> 1   2  3  0  1  1         1
#> 2   0  2  1  0  1         1
#> 3   0 NA  0 NA NA        NA
#> 4  NA  5  1  3  1         1
#> 5   0  3  3  0  3         3
#> 6   1  2  0  0  4         2
#> 7   1 NA NA NA NA        NA
#> 8  NA NA NA  0  3        NA
#> 9   0  4  0  0  3         3
#> 10  1  5  1  1  1         1

I won't go into purrr any further here, but it works fine for your case.
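To make the pmap_dbl(min) step concrete: a data frame is a list of columns, so pmap_dbl() walks the columns in parallel and calls the function once per row, passing one value from each column as a named argument (min(x2 = 3, x5 = 1), and so on). A small sketch on a hypothetical three-row subset; cols, row_min and row_min2 are illustrative names:

```r
library(purrr)

# hypothetical three-row subset holding the two selected columns
cols <- data.frame(x2 = c(3, 2, NA), x5 = c(1, 1, NA))

# one call per row: min(x2 = 3, x5 = 1), min(x2 = 2, x5 = 1), min(x2 = NA, x5 = NA)
row_min <- pmap_dbl(cols, min)

# the same thing with an explicit function: arguments are matched by column name
row_min2 <- pmap_dbl(cols, function(x2, x5) min(x2, x5))

print(row_min)
```

Because the function is called once per row, this is slower than a vectorised pmin() on large data frames.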