2014-10-02

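I'm not sure this is possible in R at all. I know how to eliminate entire rows based on sum values, but what about columns within a row, keeping only the values that occur at least a given number of times?

I'd like to take taxonomic information collected at individual sites, but keep only the levels that are represented at least three times in the overall sample.

For example, in the table below, although Diptera was identified only once at river mile (RM) 15, the order Diptera appears 38 times in the sample overall, so I'd like to keep that row. The same goes for Chaetocladius: it appears once at RM 0.7, but it appears 5 times in the sample, so I'd keep it.

Also, where one level appears often enough to keep, the levels to its right may still be rare and need to be removed and replaced with NA. For example, the order Blattoidea at RM 15 and the species Chironomus atroviridis at RM 80 each appear only once, but the class Insecta and the genus Chironomus are present often enough to keep. So I'd like to keep those levels and replace the remaining levels with NA.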

RM   phylum      class    order       family        genus          species                 Sum
0.5  Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  Chaetocladius mel       1
15   Arthropoda  Insecta  Diptera     NA            NA             NA                      1
15   Arthropoda  Insecta  Blattoidea  NA            NA             NA                      1
0.7  Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                      1
54   Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                      2
35   Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                      2
80   Arthropoda  Insecta  Diptera     Chironomidae  Chironomus     Chironomus atroviridis  2
80   Arthropoda  Insecta  Diptera     Chironomidae  Chironomus     Chironomus bifurcatus   1
0.5  Arthropoda  Insecta  Diptera     Chironomidae  Chironomus     Chironomus bifurcatus   29

The new output should look like this:

RM   phylum      class    order       family        genus          species                 Sum
0.5  Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                      1
15   Arthropoda  Insecta  Diptera     NA            NA             NA                      1
15   Arthropoda  Insecta  NA          NA            NA             NA                      1
0.7  Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                      1
54   Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                      2
35   Arthropoda  Insecta  Diptera     Chironomidae  Chaetocladius  NA                      2
80   Arthropoda  Insecta  Diptera     Chironomidae  Chironomus     NA                      2
80   Arthropoda  Insecta  Diptera     Chironomidae  Chironomus     Chironomus bifurcatus   1
0.5  Arthropoda  Insecta  Diptera     Chironomidae  Chironomus     Chironomus bifurcatus   29

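I already have summary lists, for each of these taxonomic levels, of the taxa with totals of 3 or more. I thought I could work my way through each level (from phylum to species), but I couldn't figure out how to do it.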

Please help.
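For reference, the per-level summaries mentioned in the question could be computed along these lines. This is only a sketch: `level_totals` is a hypothetical helper name, and it assumes that "number of times" means the total of the Sum column for each taxon, which is one possible reading of the question.

```r
# Example data from the question (species names quoted because they contain spaces)
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
RM phylum class order family genus species Sum
0.5 Arthropoda Insecta Diptera Chironomidae Chaetocladius "Chaetocladius mel" 1
15 Arthropoda Insecta Diptera NA NA NA 1
15 Arthropoda Insecta Blattoidea NA NA NA 1
0.7 Arthropoda Insecta Diptera Chironomidae Chaetocladius NA 1
54 Arthropoda Insecta Diptera Chironomidae Chaetocladius NA 2
35 Arthropoda Insecta Diptera Chironomidae Chaetocladius NA 2
80 Arthropoda Insecta Diptera Chironomidae Chironomus "Chironomus atroviridis" 2
80 Arthropoda Insecta Diptera Chironomidae Chironomus "Chironomus bifurcatus" 1
0.5 Arthropoda Insecta Diptera Chironomidae Chironomus "Chironomus bifurcatus" 29
')

# Hypothetical helper: total of the weight column for each taxon in one column.
# Rows where the taxon is NA are ignored by tapply().
level_totals <- function(dat, column, weight = "Sum") {
  tapply(dat[[weight]], dat[[column]], sum)
}

order_totals <- level_totals(dat, "order")

# Taxa at the order level that reach the threshold of 3
keep_orders <- names(order_totals)[order_totals >= 3]
keep_orders
```

Running `level_totals` for each column from phylum to species gives one such list per level.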

Answer

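There may be a simpler way to do this, but this gives your desired output. It is wrapped in a function, clean_data, where you can specify how many times a level must be present to be kept. Here, anything that does not appear at least twice in the provided data is replaced by NA. Does this fit your needs?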

dat <- read.table(header = TRUE, text = '
    RM phylum class order family genus species Sum
    0.5 Arthropoda Insecta Diptera Chironomidae Chaetocladius "Chaetocladius mel" 1
    15 Arthropoda Insecta Diptera NA NA NA 1
    15 Arthropoda Insecta Blattoidea NA NA NA 1
    0.7 Arthropoda Insecta Diptera Chironomidae Chaetocladius NA 1
    54 Arthropoda Insecta Diptera Chironomidae Chaetocladius NA 2
    35 Arthropoda Insecta Diptera Chironomidae Chaetocladius NA 2
    80 Arthropoda Insecta Diptera Chironomidae Chironomus "Chironomus atroviridis" 2
    80 Arthropoda Insecta Diptera Chironomidae Chironomus "Chironomus bifurcatus" 1
    0.5 Arthropoda Insecta Diptera Chironomidae Chironomus "Chironomus bifurcatus" 29
    ')

clean_data <- function(dat, repeats){
    # taxonomic columns only: everything except the RM and Sum columns
    # (%in% is used because != would recycle c("RM", "Sum") along the names)
    taxa <- !(colnames(dat) %in% c("RM", "Sum"))

    # get the counts of each level within each column (NA is not counted);
    # lapply always returns a list, whereas sapply may simplify to a matrix
    counts <- lapply(dat[, taxa, drop = FALSE], table)

    # convert data to matrix for indexing
    dat <- as.matrix(dat)

    # find which levels are present fewer than 'repeats' times
    rare <- lapply(counts, FUN = function(x) names(which(x < repeats)))

    # drop empty list elements, then get the positions of the rare levels
    # in the data matrix
    indices <- unlist(lapply(Filter(length, rare),
                             FUN = function(y) which(dat %in% y)))

    # set indices to NA
    dat[indices] <- NA
    return(as.data.frame(dat))
}

clean_data(dat, 2) 

> clean_data(dat, 2)
    RM     phylum   class   order       family         genus               species Sum
1  0.5 Arthropoda Insecta Diptera Chironomidae Chaetocladius                  <NA>   1
2 15.0 Arthropoda Insecta Diptera         <NA>          <NA>                  <NA>   1
3 15.0 Arthropoda Insecta    <NA>         <NA>          <NA>                  <NA>   1
4  0.7 Arthropoda Insecta Diptera Chironomidae Chaetocladius                  <NA>   1
5 54.0 Arthropoda Insecta Diptera Chironomidae Chaetocladius                  <NA>   2
6 35.0 Arthropoda Insecta Diptera Chironomidae Chaetocladius                  <NA>   2
7 80.0 Arthropoda Insecta Diptera Chironomidae    Chironomus                  <NA>   2
8 80.0 Arthropoda Insecta Diptera Chironomidae    Chironomus Chironomus bifurcatus   1
9  0.5 Arthropoda Insecta Diptera Chironomidae    Chironomus Chironomus bifurcatus  29
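Note that clean_data counts rows, converts every column to character through as.matrix, and matches rare names anywhere in the matrix. A possible column-by-column variant is sketched below. It is not the method above: `mask_rare` and its arguments are hypothetical names, and the option to weight by the Sum column assumes that "appears N times" in the question refers to summed counts.

```r
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
RM phylum class order family genus species Sum
0.5 Arthropoda Insecta Diptera Chironomidae Chaetocladius "Chaetocladius mel" 1
15 Arthropoda Insecta Diptera NA NA NA 1
15 Arthropoda Insecta Blattoidea NA NA NA 1
0.7 Arthropoda Insecta Diptera Chironomidae Chaetocladius NA 1
54 Arthropoda Insecta Diptera Chironomidae Chaetocladius NA 2
35 Arthropoda Insecta Diptera Chironomidae Chaetocladius NA 2
80 Arthropoda Insecta Diptera Chironomidae Chironomus "Chironomus atroviridis" 2
80 Arthropoda Insecta Diptera Chironomidae Chironomus "Chironomus bifurcatus" 1
0.5 Arthropoda Insecta Diptera Chironomidae Chironomus "Chironomus bifurcatus" 29
')

# Hypothetical helper: replace taxa whose total is below min_total with NA,
# handling each taxonomic column separately.
#   weight = NULL   counts rows (as clean_data does)
#   weight = "Sum"  adds up the Sum column instead (an assumption about intent)
mask_rare <- function(dat, min_total, weight = NULL, skip = c("RM", "Sum")) {
  w <- if (is.null(weight)) rep(1, nrow(dat)) else dat[[weight]]
  for (col in setdiff(names(dat), skip)) {
    x <- as.character(dat[[col]])
    totals <- tapply(w, x, sum)                # NA entries are ignored
    rare <- names(totals)[totals < min_total]  # taxa below the threshold
    x[x %in% rare] <- NA
    dat[[col]] <- x
  }
  dat  # RM and Sum are left untouched, so they stay numeric
}

by_rows <- mask_rare(dat, min_total = 2)                  # row counts, at least 2
by_sum  <- mask_rare(dat, min_total = 3, weight = "Sum")  # summed counts, at least 3
```

On this example data both calls mask the same three taxa (Blattoidea, Chaetocladius mel and Chironomus atroviridis), so either reading reproduces the requested output.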

Thanks for your help. This works. I agree there must be a simpler way, but I haven't found it yet. – 2014-10-16 14:41:37