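How to capture elements of a vector so that they are read by R dplyr functions?

I am trying to use the dplyr package, but I am having trouble handling column names that are stored in a variable.

Let's say I have a simplified data frame: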

library(dplyr)
# Placeholder frame with 6 rows and 4 columns (overwritten on the next line)
my.data <- as.data.frame(matrix(NA, ncol = 4, nrow = 6))
my.data <- as.data.frame(cbind(c("d6", "d7", "d8", "d9", "da", "db"), c(rep("C200", 2), rep("C400", 4)), c(rep("a",5), "b"), c("c", rep("a", 5))))
colnames(my.data) <- c("snp", "gene", "ind1", "ind2")

I first use group_by to count the number of SNPs per gene:

new.data <- my.data %>% group_by(gene) %>% mutate(count = n())

Then, for each individual column, I want the occurrences of a string expressed as a percentage per gene:

new.data %>% group_by(gene) %>% filter(grepl("a", ind1)) %>% dplyr::mutate(perc.a.ind1 = n()/count*100) 
new.data %>% group_by(gene) %>% filter(grepl("a", ind2)) %>% dplyr::mutate(perc.a.ind2 = n()/count*100)

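This works fine. The problem is that I have many individuals, so I need to automate it. I therefore created a vector of column names and ran my code inside a for loop (I know loops are not the best option; I would be happy to move to an apply version or something else):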

ind.vec <- colnames(my.data[, 3:4])
for (i in seq_along(ind.vec)) {
    new.data %>% group_by(gene) %>% filter(grepl("a", ind.vec[i])) %>% mutate(percent = n()/count*100)
}

I end up with an empty tibble, as if none of the elements of my ind.vec were recognized.
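A likely reading of the empty result, sketched with stand-in values copied from the example data: inside filter(), ind.vec[i] is an ordinary character string, so grepl() searches the name "ind1" itself rather than the column called ind1. The names name_test and column_test below are illustrative only.

```r
# Column names as plain strings, as in ind.vec from the question
ind.vec <- c("ind1", "ind2")

# Stand-in for the values stored in the ind1 column of my.data
ind1 <- c("a", "a", "a", "a", "a", "b")

# grepl() is handed the name, not the column that the name refers to
name_test <- grepl("a", ind.vec[1])   # tests the single string "ind1"
column_test <- grepl("a", ind1)       # tests the six column values

print(name_test)
print(column_test)
```

Because the string "ind1" contains no "a", the first test is a single FALSE, which filter() recycles over every row, so no rows are kept.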

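I read the vignette https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html, which made me think I had found the cause, but I struggle to understand it and could not make it work on my data.

I made some attempts with: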

ind.vec <- quote(colnames(my.data[,3:4])) 
new.data %>% group_by(gene) %>% filter(grepl("a", !!(ind.vec[i]))) %>% mutate(percent = n()/count*100)

How can I make dplyr recognize the elements of the vector?

Could you please help?

Source

2017-06-15 N.Goue

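@IanWesley, thank you for pointing to that post. It is close to my question, but in my case I have to deal with ind.vec[i], and the indexing is giving me trouble because it is not recognized inside as.name(ind.vec).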
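One way to handle a column name held in a string, offered as a sketch rather than as the approach from the linked post: the .data pronoun lets filter() look up a column by name, which avoids indexing into ind.vec inside as.name(). The example data is rebuilt from the question; results and summary.data are hypothetical names, and the loop-free variant assumes dplyr 1.0.0 or later for across().

```r
library(dplyr)

# Example data from the question, built with explicit column names
my.data <- data.frame(
  snp  = c("d6", "d7", "d8", "d9", "da", "db"),
  gene = c(rep("C200", 2), rep("C400", 4)),
  ind1 = c(rep("a", 5), "b"),
  ind2 = c("c", rep("a", 5)),
  stringsAsFactors = FALSE
)
new.data <- my.data %>% group_by(gene) %>% mutate(count = n())

ind.vec <- c("ind1", "ind2")

# Loop version: .data[[col]] fetches the column whose name is stored in col
results <- list()
for (col in ind.vec) {
  results[[col]] <- new.data %>%
    group_by(gene) %>%
    filter(grepl("a", .data[[col]])) %>%
    mutate(percent = n() / count * 100)
}

# Loop-free variant: one row per gene, one percentage column per individual
summary.data <- my.data %>%
  group_by(gene) %>%
  summarise(across(all_of(ind.vec), ~ mean(grepl("a", .x)) * 100))

print(results[["ind1"]])
print(summary.data)
```

The loop keeps one filtered table per individual, while the across() variant collapses everything into a single per-gene summary, which may be easier to work with when there are many individuals.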

I suggest you use tidyr::gather for this.

library(tidyverse) 
# or library(dplyr);library(tidyr) 

my.data %>% 
    group_by(gene) %>% 
    mutate(count = n()) %>% 
    gather(ind, string, ind1, ind2) %>% 
    filter(string == "a") %>% 
    group_by(gene, ind, string) %>% 
    mutate(
        n_string = n(),
        freq = n_string / count * 100
    )

# A tibble: 10 x 7
# Groups:   gene, ind, string [4]
#       snp   gene count   ind string n_string  freq
#    <fctr> <fctr> <int> <chr>  <chr>    <int> <dbl>
#  1     d6   C200     2  ind1      a        2   100
#  2     d7   C200     2  ind1      a        2   100
#  3     d8   C400     4  ind1      a        3    75
#  4     d9   C400     4  ind1      a        3    75
#  5     da   C400     4  ind1      a        3    75
#  6     d7   C200     2  ind2      a        1    50
#  7     d8   C400     4  ind2      a        4   100
#  8     d9   C400     4  ind2      a        4   100
#  9     da   C400     4  ind2      a        4   100
# 10     db   C400     4  ind2      a        4   100

For some reason I get a warning, but the result is the same as the one you provided.
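In current tidyr releases gather() is superseded by pivot_longer(), so the same reshaping can be sketched as follows. This assumes tidyr 1.0.0 or later; long.data is a hypothetical name, and the example data is rebuilt from the question with character columns.

```r
library(dplyr)
library(tidyr)

# Example data from the question
my.data <- data.frame(
  snp  = c("d6", "d7", "d8", "d9", "da", "db"),
  gene = c(rep("C200", 2), rep("C400", 4)),
  ind1 = c(rep("a", 5), "b"),
  ind2 = c("c", rep("a", 5)),
  stringsAsFactors = FALSE
)

long.data <- my.data %>%
  group_by(gene) %>%
  mutate(count = n()) %>%          # SNPs per gene
  ungroup() %>%
  # Stack ind1 and ind2 into a name column (ind) and a value column (string)
  pivot_longer(cols = c(ind1, ind2), names_to = "ind", values_to = "string") %>%
  filter(string == "a") %>%
  group_by(gene, ind, string) %>%
  mutate(
    n_string = n(),                # occurrences of "a" per gene and individual
    freq = n_string / count * 100
  ) %>%
  ungroup()

print(long.data)
```

The rows come out in a different order than with gather(), since pivot_longer() keeps the values of each original row together, but the counts and percentages per gene and individual are computed the same way.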

Source

2017-06-15 20:45:33

@SollanoRabeloBraga, thank you very much! It solved my problem. I modified the gather call to include more individuals, and then I did

new.data <- test[!duplicated(new.data[, c("gene", "ind", "freq")]),] 

new.data <- cast(test2, gene ~ ind)

to polish my results.
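The same polishing step can also be sketched without leaving dplyr and tidyr. Here distinct() is swapped in for the duplicated() subsetting and pivot_wider() for cast(); long.data is a hypothetical stand-in that holds only the gene, ind and freq columns of the answer's output, and wide.data is likewise an illustrative name.

```r
library(dplyr)
library(tidyr)

# Stand-in for the long table produced by the answer (gene, ind, freq only)
long.data <- data.frame(
  gene = c("C200", "C200", "C400", "C400", "C400",
           "C200", "C400", "C400", "C400", "C400"),
  ind  = c(rep("ind1", 5), rep("ind2", 5)),
  freq = c(100, 100, 75, 75, 75, 50, 100, 100, 100, 100),
  stringsAsFactors = FALSE
)

wide.data <- long.data %>%
  distinct(gene, ind, freq) %>%                       # one row per gene and individual
  pivot_wider(names_from = ind, values_from = freq)   # one column per individual

print(wide.data)
```

The result has one row per gene and one percentage column per individual, which is the layout that cast(test2, gene ~ ind) aims for.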

Source

2017-06-16 07:11:34
