Select only the columns of a data frame that contain a value greater than 5

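I'm new to R, so this is a hard one for me: I have a data frame imported from a csv. The first column contains the row names (genes), and the second column contains the group assignment (whether the gene is in group 1, group 4, and so on). The next 100 columns contain gene pathway measurements (range -20 to +20). I want to select only the rows in group 1, and then show only those columns in which at least one of those rows has a value greater than 10.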

Sample data:

NAME Group path1 path2 path3 path4 path5 
gene1 8 -19.1 -26.6 3.0 0.8 -5.1 
gene2 1 -2.8 22.8 -1.2 20.8 -9.6 
gene3 4 -5.4 -4.0 2.7 5.8 -6.8 
gene4 1 -9.9 -24.6 7.3 -2.1 -18.9 
gene5 2 -4.7 -9.4 -3.1 0.6 -10.1 
gene6 1 14.0 -5.8 -1.6 -2.5 -18.7 
gene7 5 -6.4 -3.8 2.0 -2.1 -8.6 
gene8 1 -9.9 -4.8 5.2 2.0 -17.5

I tried the approach from this question but had trouble adapting it to my data: Subset columns in R with specific values

Any help would be greatly appreciated!

Source

2016-03-06 user27206

Reshape your data with tidyr and dplyr to simplify the operation. This puts your column names into a single column. Then filter on group and value.

library(tidyr) 
library(dplyr) 
DT %>% 
    gather("Path", "value", -NAME, -Group) %>% 
    filter(Group == 1, value > 10) 
#> NAME Group Path value 
#> 1 gene6  1 path1 14.0 
#> 2 gene2  1 path2 22.8 
#> 3 gene2  1 path4 20.8
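
`gather()` is marked as superseded in current tidyr releases in favor of `pivot_longer()`. Below is a minimal sketch of the same reshape-and-filter step written with `pivot_longer()`. It assumes the sample data from the question is read into a data frame called `DT`; the `read.table()` call and the name `long_hits` are illustrative and not part of the original answer.

```r
library(tidyr)
library(dplyr)

# Sample data from the question, read into a data frame named DT (assumed name)
DT <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
NAME Group path1 path2 path3 path4 path5
gene1 8 -19.1 -26.6 3.0 0.8 -5.1
gene2 1 -2.8 22.8 -1.2 20.8 -9.6
gene3 4 -5.4 -4.0 2.7 5.8 -6.8
gene4 1 -9.9 -24.6 7.3 -2.1 -18.9
gene5 2 -4.7 -9.4 -3.1 0.6 -10.1
gene6 1 14.0 -5.8 -1.6 -2.5 -18.7
gene7 5 -6.4 -3.8 2.0 -2.1 -8.6
gene8 1 -9.9 -4.8 5.2 2.0 -17.5
")

# Long format: one row per (gene, path) pair, then keep group 1 values above 10
long_hits <- DT %>%
  pivot_longer(cols = -c(NAME, Group), names_to = "Path", values_to = "value") %>%
  filter(Group == 1, value > 10)

long_hits
```

The same three gene/path pairs should come back as with `gather()`, although the row order may differ because `pivot_longer()` walks the data row by row.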

If you want all rows with Group == 1 and all values from the selected columns, just keep the column names and subset the table:

library(tidyr) 
library(dplyr) 
colname <- DT %>% 
    gather("Path", "value", -NAME, -Group) %>% 
    filter(Group == 1, value > 10) %>% 
    distinct(Path)   # one entry per column, even if several of its values exceed 10

DT[DT$Group == 1, c("NAME", "Group", colname$Path)] 
#> NAME Group path1 path2 path4 
#> 2 gene2  1 -2.8 22.8 20.8 
#> 4 gene4  1 -9.9 -24.6 -2.1 
#> 6 gene6  1 14.0 -5.8 -2.5 
#> 8 gene8  1 -9.9 -4.8 2.0
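
For comparison, the row filter and the column filter can also be combined in a single pipeline without reshaping. This is only a sketch, assuming dplyr 1.0.0 or later (for `where()`) and the sample data from the question in a data frame called `DT`; the name `wide_hits` is hypothetical.

```r
library(dplyr)

# Sample data from the question, read into a data frame named DT (assumed name)
DT <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
NAME Group path1 path2 path3 path4 path5
gene1 8 -19.1 -26.6 3.0 0.8 -5.1
gene2 1 -2.8 22.8 -1.2 20.8 -9.6
gene3 4 -5.4 -4.0 2.7 5.8 -6.8
gene4 1 -9.9 -24.6 7.3 -2.1 -18.9
gene5 2 -4.7 -9.4 -3.1 0.6 -10.1
gene6 1 14.0 -5.8 -1.6 -2.5 -18.7
gene7 5 -6.4 -3.8 2.0 -2.1 -8.6
gene8 1 -9.9 -4.8 5.2 2.0 -17.5
")

wide_hits <- DT %>%
  filter(Group == 1) %>%                      # keep only the group 1 rows
  select(NAME, Group,                         # always keep the id columns
         where(function(x) is.numeric(x) && any(x > 10, na.rm = TRUE)))

wide_hits
```

Because the predicate is evaluated after `filter()`, a column is kept only when one of the group 1 rows exceeds 10, which is what the question asks for.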

Source

2016-03-06 20:59:11 cderv

This worked - I used all the values in the selected columns (the second part of the solution). Thanks! – user27206

Staying within base R, and making use of the question you linked to, we can do

## Data 
df <- data.frame(NAME = c("gene1","gene2","gene3","gene4"), 
          Group = c(8,1,4,1), 
          path1 = c(-19.1, -2.8, -5.4, -9.9), 
          path2 = c(-26.6, 22.8, -4, -24.6)) 

drops <- c("NAME", "Group") 
keeps <- names(df)[!names(df) %in% drops] 

## Subset the data by the groups of interest first 
df_1 <- df[df$Group == 1,] 

## This next step is similar to your linked question, 
## it just uses `any` in place of `all`, and only on a subset of the columns 

cbind(df_1[, drops], do.call(cbind, lapply(df_1[, keeps], function(x){ if(any(x >= 5)) return(x) }))) 

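## Or alternatively, keep the names of the columns that pass the test 
## (checked on df_1, so that only the group 1 rows are considered) 
df_1[, c(drops, keeps[vapply(keeps, function(x) any(df_1[, x] >= 5), logical(1))])]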

This gives

NAME Group path2 
2 gene2  1 22.8 
4 gene4  1 -24.6
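
The example above uses a small subset of the data and a threshold of `>= 5`. As a rough sketch of how the same base R idea might look on the full sample data with the `> 10` threshold from the question (the names `DT`, `id_cols`, `path_cols`, `grp1`, `has_hit`, and `result` are illustrative assumptions):

```r
# Sample data from the question, read into a data frame named DT (assumed name)
DT <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
NAME Group path1 path2 path3 path4 path5
gene1 8 -19.1 -26.6 3.0 0.8 -5.1
gene2 1 -2.8 22.8 -1.2 20.8 -9.6
gene3 4 -5.4 -4.0 2.7 5.8 -6.8
gene4 1 -9.9 -24.6 7.3 -2.1 -18.9
gene5 2 -4.7 -9.4 -3.1 0.6 -10.1
gene6 1 14.0 -5.8 -1.6 -2.5 -18.7
gene7 5 -6.4 -3.8 2.0 -2.1 -8.6
gene8 1 -9.9 -4.8 5.2 2.0 -17.5
")

id_cols   <- c("NAME", "Group")
path_cols <- setdiff(names(DT), id_cols)

# Subset to group 1 first, so the column test only sees those rows
grp1 <- DT[DT$Group == 1, ]

# TRUE for each path column with at least one group 1 value above 10
has_hit <- vapply(grp1[path_cols], function(x) any(x > 10, na.rm = TRUE), logical(1))

result <- grp1[, c(id_cols, path_cols[has_hit]), drop = FALSE]
result
```

On the sample data this keeps `path1`, `path2`, and `path4` for the four group 1 genes, with every value in those columns retained.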

Source

2016-03-06 21:15:52 SymbolixAU

I used Titolondon's second solution because it keeps all the information in the columns. Thank you for taking the trouble to reply. – user27206

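@user27206 I'm not sure I understand your comment - doesn't my solution also keep all the information in the columns? (Note that I used a small subset of the data in my example) – SymbolixAU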
