2017-06-29

Using dplyr to update blank levels of a factor given matching levels of other factors

I have a data frame like this:

df <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE, 
text = " 
plantfam,lepfam,lepsp\n 
      Asteraceae,Geometridae,Eois sp\n 
      Asteraceae,Erebidae,\n 
      Poaceae,Erebidae,\n 
      Poaceae,Noctuidae,\n 
      Asteraceae,Saturnidae,Polyphemous sp\n 
      Melastomaceae,Noctuidae,\n 
      Asteraceae,,\n 
      Melastomaceae,,\n 
      ,Noctuidae,\n 
      ,Erebidae,\n 
      Poaceae, Erebidae,\n") 

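I want to create unique lepsp names conditional on the unique combinations of plantfam and lepfam. Each lepfam must first be subsetted, and for each unique combination within that lepfam subset a morphospecies name is assigned. For rows where plantfam or lepfam is blank, no morphospecies is assigned. Repeated plantfam/lepfam combinations should be given the same morphospecies name. The output should look like this: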

output<- 
plantfam  lepfam      lepsp 
Asteraceae  Geometridae     Eois sp   
Asteraceae  Erebidae     Erebidae_morphosp1     
Poaceae   Erebidae     Erebidae_morphosp2 
Poaceae   Noctuidae     Noctuidae_morphosp1  
Asteraceae  Saturnidae     Polyphemous sp   
Melastomaceae Noctuidae     Noctuidae_morphosp2 
Asteraceae    
Melastomaceae 
       Noctuidae 
       Erebidae 
Poaceae   Erebidae     Erebidae_morphosp2 

I have tried:

condition <- quote(lepsp == "" & plantfam != "" & lepfam != "") 
subset1 <- df %>% filter(condition) %>% group_by(lepfam) %>% 
mutate(lepsp= 
paste0(lepfam,"_morphosp",match(plantfam,unique(plantfam)))) 
subset2 <- df %>% filter(condition) %>% setdiff(df, .) 
union(subset1, subset2) %>% arrange(lepsp) 

However, the two Poaceae/Erebidae rows return different morphosp numbers (Erebidae_morphosp1 and Erebidae_morphosp2) when they should be the same.

Source: local data frame [11 x 3] 
Groups: lepfam [6] 

        plantfam  lepfam    lepsp 
         <chr>  <chr>    <chr> 
1     Melastomaceae         
2      Asteraceae         
3       Poaceae Erebidae Erebidae_morphosp1 
4      Asteraceae Geometridae    Eois sp 
5      Asteraceae Erebidae Erebidae_morphosp1 
6       Poaceae Erebidae Erebidae_morphosp2 
7         Erebidae Erebidae_morphosp3 
8       Poaceae Noctuidae Noctuidae_morphosp1 
9     Melastomaceae Noctuidae Noctuidae_morphosp2 
10         Noctuidae Noctuidae_morphosp3 
11      Asteraceae Saturnidae  Polyphemous sp 

What is 'condition'? – Masoud

It selects the rows where 'lepsp' is blank and both 'plantfam' and 'lepfam' have names. – Danielle

Answer

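I think the only problem may be in your df: the last row has a space before Erebidae, which makes R treat it as a different value from the other Erebidae entries.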
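To see why the stray space matters, here is a minimal sketch with a hypothetical two-element vector (not the full data frame). It only illustrates that R compares strings exactly, and that `trimws()` removes the difference:

```r
# Hypothetical vector mimicking the two spellings found in the question's data.
lepfam <- c("Erebidae", " Erebidae")

same_raw      <- lepfam[1] == lepfam[2]         # exact comparison, space included
n_raw         <- length(unique(lepfam))         # distinct values before trimming
n_trimmed     <- length(unique(trimws(lepfam))) # distinct values after trimming

same_raw   # FALSE
n_raw      # 2
n_trimmed  # 1
```

Because `group_by(lepfam)` groups on exact values, the untrimmed column yields a separate group for " Erebidae".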

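I found that while I was finishing my answer. Here is how I would do what you want: I first create a lepfam_number column within each lepfam group, then use it in the same mutate to paste the name together.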

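library(dplyr)

df %>%
    group_by(lepfam) %>%
    # Number each plantfam by its first appearance within its lepfam group
    mutate(lepfam_number = match(plantfam, unique(plantfam)),
           # Fill lepsp only where it is blank and both families are present
           lepsp = ifelse(lepsp == "" & lepfam != "" & trimws(plantfam) != "",
                          paste0(lepfam, "_morphosp", lepfam_number),
                          lepsp))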

        plantfam  lepfam    lepsp lepfam_number 
         <chr>  <chr>    <chr>   <int> 
1     Asteraceae Geometridae    Eois sp    1 
2     Asteraceae Erebidae Erebidae_morphosp1    1 
3      Poaceae Erebidae Erebidae_morphosp2    2 
4      Poaceae Noctuidae Noctuidae_morphosp1    1 
5     Asteraceae Saturnidae  Polyphemous sp    1 
6    Melastomaceae Noctuidae Noctuidae_morphosp2    2 
7     Asteraceae            1 
8    Melastomaceae            2 
9        Noctuidae         3 
10        Erebidae         3 
11     Poaceae Erebidae Erebidae_morphosp2    2 
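
As an alternative to fixing the input by hand, the whitespace could be trimmed in code before grouping. The following is a base R sketch rather than the dplyr pipeline above, and the data frame `mini` is a made-up miniature of the question's data (it keeps the stray space), so treat the names and values as assumptions:

```r
# Hypothetical miniature of the question's data, including " Erebidae".
mini <- data.frame(
  plantfam = c("Asteraceae", "Poaceae", "Poaceae", "", "Poaceae"),
  lepfam   = c("Erebidae", "Erebidae", "Noctuidae", "Erebidae", " Erebidae"),
  lepsp    = c("", "", "", "", ""),
  stringsAsFactors = FALSE
)

# Trim every column (all are character here) before any grouping.
mini[] <- lapply(mini, trimws)

# Position of each plantfam among the unique plantfam values of its lepfam.
idx <- ave(seq_len(nrow(mini)), mini$lepfam,
           FUN = function(i) match(mini$plantfam[i], unique(mini$plantfam[i])))

# Only name rows where lepsp is blank and both families are present.
needs_name <- mini$lepsp == "" & mini$plantfam != "" & mini$lepfam != ""
mini$lepsp[needs_name] <- paste0(mini$lepfam, "_morphosp", idx)[needs_name]

mini$lepsp
# "Erebidae_morphosp1" "Erebidae_morphosp2" "Noctuidae_morphosp1" "" "Erebidae_morphosp2"
```

After trimming, both Poaceae/Erebidae rows fall in the same group and receive the same name.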

Data

df <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE, 
       text = " 
plantfam,lepfam,lepsp\n 
      Asteraceae,Geometridae,Eois sp\n 
      Asteraceae,Erebidae,\n 
      Poaceae,Erebidae,\n 
      Poaceae,Noctuidae,\n 
      Asteraceae,Saturnidae,Polyphemous sp\n 
      Melastomaceae,Noctuidae,\n 
      Asteraceae,,\n 
      Melastomaceae,,\n 
      ,Noctuidae,\n 
      ,Erebidae,\n 
      Poaceae,Erebidae,\n") 

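Nice! If you have a moment, I am trying to understand how 'match' works here. As I understand it, *Poaceae* is at position 2 in 'unique(plantfam)'. In rows 3 and 4 it comes out as 2 and 1. Is that because of the preceding 'group_by(lepfam)'? Maybe I am misunderstanding 'group_by'? Thanks for your help. –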

@LukeC Yes. Because I group by lepfam first, within that particular group the position of Poaceae in unique(plantfam) is always 2. –
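
A small sketch of that point, using two hypothetical vectors that stand in for the plantfam values of two lepfam groups (simplified, blank rows left out). `match()` is evaluated once per group, so the same plant family can get a different number in each group:

```r
# plantfam values as they would appear inside each lepfam group (simplified).
erebidae  <- c("Asteraceae", "Poaceae", "Poaceae")
noctuidae <- c("Poaceae", "Melastomaceae")

# Position of each value within the unique values of its own group.
erebidae_number  <- match(erebidae, unique(erebidae))
noctuidae_number <- match(noctuidae, unique(noctuidae))

erebidae_number   # 1 2 2
noctuidae_number  # 1 2
```

Poaceae is the second unique value among the Erebidae rows but the first among the Noctuidae rows, which is why it is numbered 2 in one group and 1 in the other.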

@P Lapointe Got it, that makes sense. Thanks for clarifying! –