dplyr: mutate new column based on multiple columns selected by variable string

Given this data:

df=data.frame(
    x1=c(2,0,0,NA,0,1,1,NA,0,1), 
    x2=c(3,2,NA,5,3,2,NA,NA,4,5), 
    x3=c(0,1,0,1,3,0,NA,NA,0,1), 
    x4=c(1,0,NA,3,0,0,NA,0,0,1), 
    x5=c(1,1,NA,1,3,4,NA,3,3,1))

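I want to use dplyr to create an additional column min that holds the row-wise minimum of selected columns. This is easy when using the column names directly: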

df <- df %>% rowwise() %>% mutate(min = min(x2,x5))

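But I have a large df with varying column names, so I need to match them against a vector of strings, mycols. Other threads told me to use the select helper functions, but I must be missing something. Here is matches: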

mycols <- c("x2","x5") 
df <- df %>% rowwise() %>% 
    mutate(min = min(select(matches(mycols)))) 
Error: is.string(match) is not TRUE

And one_of:

mycols <- c("x2","x5") 
df <- df %>% 
rowwise() %>% 
mutate(min = min(select(one_of(mycols)))) 
Error: no applicable method for 'select' applied to an object of class "c('integer', 'numeric')" 
In addition: Warning message: 
In one_of(c("x2", "x5")) : Unknown variables: `x2`, `x5`

What am I overlooking? Should select_ work? It doesn't in the following:

df <- df %>% 
    rowwise() %>% 
    mutate(min = min(select_(mycols))) 
Error: no applicable method for 'select_' applied to an object of class "character"

And likewise:

df <- df %>% 
    rowwise() %>% 
    mutate(min = min(select_(matches(mycols)))) 
Error: is.string(match) is not TRUE

2017-02-19 strangeloop

You need to use the SE versions of the dplyr verbs when working with strings. In this case, use `select_()` –

That doesn't work the way I expected: `df <- df %>% rowwise() %>% mutate(min = min(select_(mycols)))` yields "Error: no applicable method for 'select_' applied to an object of class "character"" – strangeloop

The `matches` error occurs because it takes a single string (a regular expression) as its argument, not a vector of strings. – cderv
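
Since matches takes a single regular expression, one way to reuse a character vector of names is to collapse it into an alternation pattern. A minimal sketch, assuming the names contain no regex metacharacters; the small data frame (including the decoy column x25) and the variable name pattern are made up for illustration:

```r
library(dplyr)

# Illustrative data; x25 is a decoy that an unanchored pattern would also match
df <- data.frame(x1 = c(2, 0), x2 = c(3, 2), x5 = c(1, 1), x25 = c(9, 9))

mycols <- c("x2", "x5")

# Collapse the vector into one anchored regex: "^(x2|x5)$"
pattern <- paste0("^(", paste(mycols, collapse = "|"), ")$")

# matches() now receives a single string, as it expects
selected <- df %>% select(matches(pattern))
names(selected)
```

The ^ and $ anchors keep the pattern from picking up columns that merely contain one of the names.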

This one is a bit tricky. With SE evaluation, you need to pass the operation as a string.

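library(dplyr)
# Build the call as a string: "min(x2,x5)"
mycols <- '(x2,x5)' 
f <- paste0('min',mycols) 
# mutate_() is the SE variant; assign the result so df keeps the new column
df <- df %>% rowwise() %>% mutate_(min = f) 
df 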
# A tibble: 10 × 6 
#  x1 x2 x3 x4 x5 min 
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> 
#1  2  3  0  1  1  1 
#2  0  2  1  0  1  1 
#3  0 NA  0 NA NA NA 
#4  NA  5  1  3  1  1 
#5  0  3  3  0  3  3 
#6  1  2  0  0  4  2 
#7  1 NA NA NA NA NA 
#8  NA NA NA  0  3 NA 
#9  0  4  0  0  3  3 
#10  1  5  1  1  1  1
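
The underscore verbs such as mutate_ have since been deprecated in dplyr, so on newer versions it may be simpler to avoid building the call as a string. A minimal sketch, assuming mycols is kept as a character vector of column names (unlike the '(x2,x5)' string above) and swapping the rowwise min for the vectorised base function pmin, which likewise returns NA when any input in a row is NA:

```r
library(dplyr)

df <- data.frame(
    x1 = c(2, 0, 0, NA, 0, 1, 1, NA, 0, 1),
    x2 = c(3, 2, NA, 5, 3, 2, NA, NA, 4, 5),
    x3 = c(0, 1, 0, 1, 3, 0, NA, NA, 0, 1),
    x4 = c(1, 0, NA, 3, 0, 0, NA, 0, 0, 1),
    x5 = c(1, 1, NA, 1, 3, 4, NA, 3, 3, 1))

# Assumption: column names held as a plain character vector
mycols <- c("x2", "x5")

# do.call() spreads the selected columns as separate arguments to pmin(),
# which then takes the minimum element-wise, i.e. row by row
res <- df %>%
    mutate(min = do.call(pmin, unname(as.list(df[mycols]))))

res$min
```

No rowwise() is needed here because pmin already works across whole columns.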

2017-02-19 20:55:24

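Thanks! Now I want the lowest non-NA value, so I need to tweak this code slightly. It looks like changing `min` to `pmin(na.rm = T)` works (adding na.rm = T to `min()` didn't seem to work): `f <- paste0('pmin(', mycols, ', na.rm = T)')` `df <- df %>% rowwise() %>% mutate_(min = f)` – strangeloop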
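
For the non-NA minimum discussed in this comment, pmin accepts na.rm = TRUE and returns NA only for rows where every selected value is missing. A sketch without string building, assuming mycols is a character vector of column names (not the '(x2,x5)' string from the answer); the helper name cols is illustrative:

```r
library(dplyr)

df <- data.frame(
    x1 = c(2, 0, 0, NA, 0, 1, 1, NA, 0, 1),
    x2 = c(3, 2, NA, 5, 3, 2, NA, NA, 4, 5),
    x3 = c(0, 1, 0, 1, 3, 0, NA, NA, 0, 1),
    x4 = c(1, 0, NA, 3, 0, 0, NA, 0, 0, 1),
    x5 = c(1, 1, NA, 1, 3, 4, NA, 3, 3, 1))

# Assumption: column names held as a plain character vector
mycols <- c("x2", "x5")

# Selected columns as an unnamed list, so each one becomes a positional argument
cols <- unname(as.list(df[mycols]))

# Append na.rm = TRUE as a named argument to the pmin() call
res <- df %>%
    mutate(min = do.call(pmin, c(cols, na.rm = TRUE)))

res$min
```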

Here is another solution, a bit more technical, using the purrr package from the tidyverse, which is designed for functional programming.

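First, the matches helper from dplyr takes a regex string as its argument, not a vector. It is a nice way to select columns if you can find a regex that matches all of them. (In the code below, you can use whichever dplyr select helper you wish.)

Then, purrr functions work great with dplyr once you understand the basic scheme of functional programming.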

A solution to your problem:

df=data.frame(
    x1=c(2,0,0,NA,0,1,1,NA,0,1), 
    x2=c(3,2,NA,5,3,2,NA,NA,4,5), 
    x3=c(0,1,0,1,3,0,NA,NA,0,1), 
    x4=c(1,0,NA,3,0,0,NA,0,0,1), 
    x5=c(1,1,NA,1,3,4,NA,3,3,1)) 


# regex to get only x2 and x5 column 
mycols <- "x[25]" 

library(dplyr) 

df %>% 
    mutate(min_x2_x5 = 
      # select columns that you want in df 
      select(., matches(mycols)) %>% 
      # use pmap on this subset to get a vector of min from each row. 
      # a data frame is a list of columns, so pmap walks the columns in parallel, that is to say row by row 
      purrr::pmap_dbl(min) 
     ) 
#> x1 x2 x3 x4 x5 min_x2_x5 
#> 1 2 3 0 1 1   1 
#> 2 0 2 1 0 1   1 
#> 3 0 NA 0 NA NA  NA 
#> 4 NA 5 1 3 1   1 
#> 5 0 3 3 0 3   3 
#> 6 1 2 0 0 4   2 
#> 7 1 NA NA NA NA  NA 
#> 8 NA NA NA 0 3  NA 
#> 9 0 4 0 0 3   3 
#> 10 1 5 1 1 1   1

I won't explain purrr any further here, but it works fine in your case.
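
As noted above, other select helpers can stand in for matches. A sketch of the same pmap_dbl approach using one_of with a character vector of names, split into two steps for readability; the names mycols, mins and res are illustrative:

```r
library(dplyr)

df <- data.frame(
    x1 = c(2, 0, 0, NA, 0, 1, 1, NA, 0, 1),
    x2 = c(3, 2, NA, 5, 3, 2, NA, NA, 4, 5),
    x3 = c(0, 1, 0, 1, 3, 0, NA, NA, 0, 1),
    x4 = c(1, 0, NA, 3, 0, 0, NA, 0, 0, 1),
    x5 = c(1, 1, NA, 1, 3, 4, NA, 3, 3, 1))

# Column names as a character vector instead of a regex
mycols <- c("x2", "x5")

# Step 1: select the columns by name, then take min() across each row
mins <- df %>%
    select(one_of(mycols)) %>%
    purrr::pmap_dbl(min)

# Step 2: attach the row-wise minima as a new column
res <- df %>% mutate(min_x2_x5 = mins)

res$min_x2_x5
```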

2017-02-19 21:37:30 cderv
