2016-10-03

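Summing rows with duplicate identifiers in tidyr::spread

I'm working with some oddly formatted survey data (collected and recorded by someone else). It records species abundance along survey transects, but it lists only the species observed on a given transect rather than every possible species. I spent some time working out how to reshape the data with tidyr so that each species gets its own column for every survey, with unrecorded species filled with 0. Here is a short, reproducible example: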

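# This works:
library(dplyr)  # %>% and group_by()
library(tidyr)  # spread()

set.seed(1)  # make the random abundances reproducible
Survey <- as.factor(c(rep("Survey 1", 10), rep("Survey 2", 10), rep("Survey 3", 10)))
Species <- as.factor(c(c("A","B","C","D","E","U","V","W","X","Y"), c("A","C","E","G","I","K","M","O","Q","S"), c("B","D","F","H","J","L","N","P","R","T")))
Abundance <- ceiling(runif(30, 1, 50))

working.df <- cbind.data.frame(Survey, Species, Abundance)

# One row per survey, one column per species, 0 where a species was not recorded
working.spread <- working.df %>%
    group_by(Survey) %>%
    spread(Species, Abundance, drop = FALSE, fill = 0)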

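Unfortunately, the real data are not this simple. In some cases the same species was recorded on several rows within one survey so that information on other variables, which I'm not interested in, could be recorded. I only care about total abundance per survey. So here is an example of what the real data might look like (note the double "A" at the start of Species2):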

# This doesn't work: "A" appears twice in Survey 1
Species2 <- as.factor(c(c("A","A","C","D","E","U","V","W","X","Y"), c("A","C","E","G","I","K","M","O","Q","S"), c("B","D","F","H","J","L","N","P","R","T")))

not.working.df <- cbind.data.frame(Survey, Species2, Abundance)

not.working.spread <- not.working.df %>%
    group_by(Survey) %>%
    spread(Species2, Abundance, drop = FALSE, fill = 0)

So, when the same species is listed twice within a survey, the spread call no longer works and returns the familiar error:

Error: Duplicate identifiers for rows (1, 2) 

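In the real data set I get quite a few of these duplicates (and this is just one of several data sets), so I certainly don't want to go through and fix them by hand: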

Error: Duplicate identifiers for rows (206, 216), (1532, 1544), (1052, 1595), (1324, 1330), (191, 212), (194, 211), (1392, 1600), (19, 37), (1404, 1599), (199, 215), (1073, 1596), (1074, 1597), (43, 44, 45), (455, 456), (380, 381, 382, 383), (447, 448), (413, 414, 415, 416, 417, 418), (303, 304), (1015, 1016), (897, 898, 1593), (1306, 1307), (1041, 1594), (1076, 1598), (1425, 1426), (49, 64), (198, 214) 
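
Before fixing anything, it can help to list which survey/species combinations are behind an error like this. The following is a minimal base R sketch on a small made-up data frame; the `surveys` object and its values are illustrative assumptions, not the real data.

```r
# Hypothetical data: species "A" appears twice in Survey 1.
surveys <- data.frame(
  Survey    = c("Survey 1", "Survey 1", "Survey 1", "Survey 2"),
  Species   = c("A", "A", "C", "A"),
  Abundance = c(3, 4, 10, 5)
)

# The identifier columns that spread() needs to be unique.
keys <- surveys[c("Survey", "Species")]

# Flag every row whose Survey/Species pair occurs more than once.
# fromLast = TRUE also catches the first occurrence of each pair.
is_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

dup_rows <- surveys[is_dup, ]
print(dup_rows)
```

Applied to the real data, the same pattern would use the actual survey and species column names in place of `Survey` and `Species`.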

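What I want to do is sum the Abundance field across duplicate identifiers. I know there are similar questions here, and I've looked through many of them, but I haven't found a solution yet. I've been trying to do this with spread, and it feels like I'm one simple function call away from getting it to work... Any advice would be greatly appreciated. Or, if I've completely missed an existing answer to this problem, please point me in the right direction.

Cheers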

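It sounds like you need to summarize the data set before spreading. [This answer](http://stackoverflow.com/a/35228491/2461552) gives a good explanation of the process. – aosmith

Thanks, that did it! Solution below. – stewart6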

Answer

Thanks, aosmith, for pointing me to the thread on summarizing, which did the trick. Here is the working solution:

# Sum Abundance within each Survey/Species2 pair first, then spread
not.working.spread <- not.working.df %>%
    group_by(Survey, Species2) %>%
    summarize(Abundance = sum(Abundance)) %>%
    spread(Species2, Abundance, drop = FALSE, fill = 0)
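
As a possible alternative, newer tidyr releases provide `pivot_wider()`, which can sum the duplicates while reshaping, so no separate summarize step is needed. The sketch below assumes a recent tidyr (one where `values_fn` and `values_fill` accept a single function and a single value) and uses a small made-up data frame; `surveys` and its values are hypothetical, not the data from the question.

```r
library(tidyr)

# Hypothetical data: species "A" is recorded twice in Survey 1.
surveys <- data.frame(
  Survey    = c("Survey 1", "Survey 1", "Survey 1", "Survey 2", "Survey 2"),
  Species   = c("A", "A", "C", "A", "B"),
  Abundance = c(3, 4, 10, 5, 2)
)

# values_fn = sum collapses duplicate Survey/Species rows into one total;
# values_fill = 0 fills in species that were not recorded in a survey.
wide <- pivot_wider(
  surveys,
  names_from  = Species,
  values_from = Abundance,
  values_fn   = sum,
  values_fill = 0
)

print(wide)
```

Unlike `spread(..., drop = FALSE)`, this call does not by itself create columns for factor levels that never occur in the data, so the two approaches can differ when some species are entirely absent.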