2017-04-11 45 views
2

我有这样子集与对应的ID名称3个阈值条件一个数据帧

ID <- c ("ABC_10","AZM_11","ABC_11","ABC_12", 
     "ABC_13","AZM_12","ABC_14","ABC_15", 
     "CZX_10","CZX_11","CZX_12","CZX_13", 
     "FIN_10","FIN_11","FIN_12","FIN_13", 
     "FNM_10","FNM_11","FXS_10","FXS_11") 
Id.n <- c(345,380,339,361, 
      245,390,639,661, 
      545,580,539,261, 
      345,180,139,261, 
      1045,1580,39,161) 
df <- data.frame(ID,Id.n) 

一个数据帧我试图子集此数据帧具有下列条件

Threshold of ID.n's for FXS - 100 
Threshold of ID.n's for FIN - 200 
Threshold of ID.n's for all Other ID's - 300 

期望的输出

 ID Id.n 
    ABC_10 345 
    AZM_11 380 
    ABC_11 339 
    ABC_12 361 
    AZM_12 390 
    ABC_14 639 
    ABC_15 661 
    CZX_10 545 
    CZX_11 580 
    CZX_12 539 
    FIN_10 345 
    FIN_13 261 
    FNM_10 1045 
    FNM_11 1580 
    FXS_11 161 

我正在尝试做这种方式,但只是没有得到正确的。

df <- subset(df,ifelse(grepl("FXS",df$ID), df$ID.n > 100,)) 

有人能指出我正确的方向吗?

回答

4

使用dplyr

library(dplyr) 

df2 <- df %>% 
    filter((grepl("FXS", ID) & Id.n > 100) | 
      (grepl("FIN", ID) & Id.n > 200) | 
      (!grepl("FXS|FIN", ID) & Id.n > 300)) 

df2 
#  ID Id.n 
# ABC_10 345 
# AZM_11 380 
# ABC_11 339 
# ABC_12 361 
# AZM_12 390 
# ABC_14 639 
# ABC_15 661 
# CZX_10 545 
# CZX_11 580 
# CZX_12 539 
# FIN_10 345 
# FIN_13 261 
# FNM_10 1045 
# FNM_11 1580 
# FXS_11 161 
4

这与杀毒的数据简单。使用data.table,那简直就好像...

library(data.table) 
setDT(df) 
df[, c("x", "y") := tstrsplit(ID, "_")][, ID := NULL ] 

xDT = data.table(x = unique(df$x)) 
xDT[, th := 300 ]  
xDT[.(x = c("FXS", "FIN"), th = c(100, 200)), on=.(x), th := i.th ] 

然后非等距加入作品进行过滤:

df[xDT, on=.(x, Id.n > th)] 

    Id.n x y 
1: 300 ABC 11 
2: 300 ABC 10 
3: 300 ABC 12 
4: 300 ABC 14 
5: 300 ABC 15 
6: 300 AZM 11 
7: 300 AZM 12 
8: 300 CZX 12 
9: 300 CZX 10 
10: 300 CZX 11 
11: 200 FIN 13 
12: 200 FIN 10 
13: 300 FNM 10 
14: 300 FNM 11 
15: 100 FXS 11 

关于grepl在这里,我认为这是a bad idea

+0

感谢您指出。 – Sharath

1
df[(grepl("FXS",df$ID) & df$Id.n >= 100) | 
     (grepl("FIN",df$ID) & df$Id.n >= 200) | 
     (!(grepl("FXS",df$ID) | grepl("FIN", df$ID)) & df$Id.n >= 300),] 
#  ID Id.n 
#1 ABC_10 345 
#2 AZM_11 380 
#3 ABC_11 339 
#4 ABC_12 361 
#6 AZM_12 390 
#7 ABC_14 639 
#8 ABC_15 661 
#9 CZX_10 545 
#10 CZX_11 580 
#11 CZX_12 539 
#13 FIN_10 345 
#16 FIN_13 261 
#17 FNM_10 1045 
#18 FNM_11 1580 
#20 FXS_11 161 
相关问题