Using dplyr to fill blank factor levels based on matching levels of other factors

I have a data frame like this:
df <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE,
text = "
plantfam,lepfam,lepsp\n
Asteraceae,Geometridae,Eois sp\n
Asteraceae,Erebidae,\n
Poaceae,Erebidae,\n
Poaceae,Noctuidae,\n
Asteraceae,Saturnidae,Polyphemous sp\n
Melastomaceae,Noctuidae,\n
Asteraceae,,\n
Melastomaceae,,\n
,Noctuidae,\n
,Erebidae,\n
Poaceae, Erebidae,\n")
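I want to create unique lepsp names conditional on unique combinations of plantfam and lepfam. The data must first be subset by lepfam, and within each lepfam subset, every unique plantfam/lepfam combination is assigned a morphospecies name made from the lepfam and a running number (e.g. Erebidae_morphosp1). Only blank lepsp values are filled in; existing names such as Eois sp are kept. Rows where plantfam or lepfam is blank are not assigned a morphospecies. Repeated plantfam/lepfam combinations should be given the same morphospecies name. The output should look like this: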
plantfam       lepfam       lepsp
Asteraceae     Geometridae  Eois sp
Asteraceae     Erebidae     Erebidae_morphosp1
Poaceae        Erebidae     Erebidae_morphosp2
Poaceae        Noctuidae    Noctuidae_morphosp1
Asteraceae     Saturnidae   Polyphemous sp
Melastomaceae  Noctuidae    Noctuidae_morphosp2
Asteraceae
Melastomaceae
               Noctuidae
               Erebidae
Poaceae        Erebidae     Erebidae_morphosp2
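A minimal dplyr sketch of the rules above. It assumes that stray white space should be trimmed when reading the data (strip.white = TRUE, which is not in the original call) and that numbering follows the order in which each plantfam first appears within its lepfam. The columns needs_name and idx are hypothetical helper columns introduced only for illustration. Instead of splitting the data and recombining it, the number is computed in one grouped mutate() and only qualifying rows are overwritten, so the original row order is kept.

```r
library(dplyr)

# The question's data; strip.white = TRUE is an added assumption that trims
# stray spaces such as the one in "Poaceae, Erebidae"
df <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE,
                 strip.white = TRUE, text = "
plantfam,lepfam,lepsp
Asteraceae,Geometridae,Eois sp
Asteraceae,Erebidae,
Poaceae,Erebidae,
Poaceae,Noctuidae,
Asteraceae,Saturnidae,Polyphemous sp
Melastomaceae,Noctuidae,
Asteraceae,,
Melastomaceae,,
,Noctuidae,
,Erebidae,
Poaceae, Erebidae,
")

out <- df %>%
  group_by(lepfam) %>%
  mutate(
    # Hypothetical helper: rows with a blank lepsp but filled plantfam and lepfam
    needs_name = lepsp == "" & plantfam != "" & lepfam != "",
    # Hypothetical helper: number each plantfam by first appearance among the
    # qualifying rows of this lepfam, so repeated pairs share a number
    idx = match(plantfam, unique(plantfam[needs_name])),
    # Overwrite only qualifying rows; existing names and blanks are kept
    lepsp = if_else(needs_name, paste0(lepfam, "_morphosp", idx), lepsp)
  ) %>%
  ungroup() %>%
  select(-needs_name, -idx)
```

Printing out should reproduce the expected table above, with both Poaceae/Erebidae rows named Erebidae_morphosp2.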
What I have tried:
library(dplyr)
condition <- quote(lepsp == "" & plantfam != "" & lepfam != "")
subset1 <- df %>% filter(condition) %>% group_by(lepfam) %>%
mutate(lepsp=
paste0(lepfam,"_morphosp",match(plantfam,unique(plantfam))))
subset2 <- df %>% filter(condition) %>% setdiff(df, .)
union(subset1, subset2) %>% arrange(lepsp)
However, the two Poaceae/Erebidae rows get different morphosp numbers (Erebidae_morphosp1 and Erebidae_morphosp2) when they should be the same.
Source: local data frame [11 x 3]
Groups: lepfam [6]
        plantfam      lepfam               lepsp
           <chr>       <chr>               <chr>
1  Melastomaceae
2     Asteraceae
3        Poaceae    Erebidae  Erebidae_morphosp1
4     Asteraceae Geometridae             Eois sp
5     Asteraceae    Erebidae  Erebidae_morphosp1
6        Poaceae    Erebidae  Erebidae_morphosp2
7                   Erebidae  Erebidae_morphosp3
8        Poaceae   Noctuidae Noctuidae_morphosp1
9  Melastomaceae   Noctuidae Noctuidae_morphosp2
10                 Noctuidae Noctuidae_morphosp3
11    Asteraceae  Saturnidae      Polyphemous sp
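One possible cause, offered as an assumption rather than a confirmed diagnosis: the last data line is "Poaceae, Erebidae," with a space after the comma. With sep = ",", read.table() keeps leading white space in character fields by default (strip.white = FALSE), so that row's lepfam would be " Erebidae", a separate group from "Erebidae". The printed "Groups: lepfam [6]" fits this, since only five distinct lepfam values (counting the blank) are visible. A base R check on a two-row excerpt of the data:

```r
# Two-row excerpt of the question's data; the rows differ only by the space
# after the comma
df <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE, text = "
plantfam,lepfam,lepsp
Poaceae,Erebidae,
Poaceae, Erebidae,
")

before <- unique(df$lepfam)   # "Erebidae" and " Erebidae" (leading space kept)

# Trim surrounding white space so both rows fall into the same lepfam group
df$lepfam <- trimws(df$lepfam)
after <- unique(df$lepfam)    # "Erebidae"
```

Passing strip.white = TRUE to read.table() trims the fields at read time instead.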
What is 'condition'? – Masoud
It selects the rows where 'lepsp' is blank and both 'plantfam' and 'lepfam' have a name. – Danielle
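A side note on using condition, sketched under the assumption of dplyr 0.7 or later (rlang-based tidy evaluation), which may differ from the version that produced the output above. filter() expects each argument to evaluate to a logical vector, while a variable holding a quote() evaluates to the call itself; the !! operator splices the stored call into filter() so it is evaluated against the columns. The three-row data frame toy is made up purely for illustration.

```r
library(dplyr)

# Hypothetical rows for illustration only (not the question's full data)
toy <- data.frame(
  plantfam = c("Asteraceae", "Poaceae", ""),
  lepfam   = c("Erebidae", "Erebidae", "Erebidae"),
  lepsp    = c("", "", ""),
  stringsAsFactors = FALSE
)

condition <- quote(lepsp == "" & plantfam != "" & lepfam != "")

# `!!` inserts the stored call into filter(), so it is evaluated per row
to_name <- toy %>% filter(!!condition)

# The complementary rows: negate the same spliced condition
to_keep <- toy %>% filter(!(!!condition))
```

Here to_name holds the Asteraceae and Poaceae rows and to_keep holds the row with a blank plantfam.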