2016-02-19

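Select rows by the most recent year

I have a data frame of vegetation metrics collected over several years from x units and y sampling stations (multiple stations within each unit). For each unit, I want to select all of the vegetation data from the most recent year in which data were collected. Here is an example of my data frame:

    veg cover unit station year
1  tree  0.97   U1      A1 2015
2 grass  0.21   U1      A1 2015
3  tree  0.35   U1      A2 2014
4 grass  0.67   U1      A2 2014
5  tree  0.45   U2      A3 2013
6 grass  0.72   U2      A3 2013
7  tree  0.27   U2      A4 2014
8 grass  0.67   U2      A4 2014

I would like the result to look like this:

    veg cover unit station year
1  tree  0.97   U1      A1 2015
2 grass  0.21   U1      A1 2015
3  tree  0.27   U2      A4 2014
4 grass  0.67   U2      A4 2014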

The data frame can be created with:

veg <- c("tree","grass","tree","grass","tree","grass","tree","grass")
cover <- c(0.97,0.21,0.35,0.67,0.45,0.72,0.27,0.67)
unit <- c("U1","U1","U1","U1","U2","U2","U2","U2")
station <- c("A1","A1","A2","A2","A3","A3","A4","A4")
year <- c(2015,2015,2014,2014,2013,2013,2014,2014)
df <- data.frame(veg, cover, unit, station, year)

Any help would be much appreciated.

Why don't you want the other recent years? Could you define what you mean by "most recent year"? – MaxPD

Answers

Here is how to do it without any packages:

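# Split df by unit and keep only the rows from that unit's most recent year
df.by <- by(df, df$unit, FUN = function(t) t[t$year == max(t$year), ])
# Combine the per-unit data frames back into one (outer merge on all shared columns)
df.recent <- Reduce(function(...) merge(..., all = TRUE), df.by)
df.recent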

The output is:

> df.recent
    veg cover unit station year
1 grass  0.21   U1      A1 2015
2 grass  0.67   U2      A4 2014
3  tree  0.27   U2      A4 2014
4  tree  0.97   U1      A1 2015

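In the first line, we use the function by to split the data frame into subsets by the factor df$unit. For each subset (that is, for each unit), the anonymous function function(t) t[t$year == max(t$year), ] extracts the rows from that unit's most recent year.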
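To see what the anonymous function does, it can be applied by hand to a single unit. This is a small sketch that rebuilds the example data from the question and runs the same filter on the U2 subset only; the names u2 and latest.u2 are purely illustrative and not part of the answer.

```r
# Rebuild the example data from the question
df <- data.frame(
  veg     = c("tree", "grass", "tree", "grass", "tree", "grass", "tree", "grass"),
  cover   = c(0.97, 0.21, 0.35, 0.67, 0.45, 0.72, 0.27, 0.67),
  unit    = c("U1", "U1", "U1", "U1", "U2", "U2", "U2", "U2"),
  station = c("A1", "A1", "A2", "A2", "A3", "A3", "A4", "A4"),
  year    = c(2015, 2015, 2014, 2014, 2013, 2013, 2014, 2014)
)

# The subset that by() would hand to the function for unit U2
u2 <- df[df$unit == "U2", ]

# Same logic as the anonymous function: keep rows whose year equals the subset's maximum
latest.u2 <- u2[u2$year == max(u2$year), ]
print(latest.u2)
```

Only the two A4 rows from 2014 remain; the 2013 rows from station A3 are dropped because 2013 is not the maximum year within U2.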

df.by is a list of data frames, one per unit, each containing only the rows from that unit's most recent year.

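In the second line, we use the merge function (via Reduce) to combine all of the data frames in df.by into one. This use of Reduce with merge is explained in "Simultaneously merge multiple data.frames in a list".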

Thanks, that did it. – omwrichmond
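Because the per-unit data frames in df.by never need to be joined on any key, an alternative to the Reduce/merge step in the base-R answer is to stack them with do.call(rbind, ...), which keeps the rows in unit order instead of re-sorting them the way merge does. This is a sketch of that swapped-in technique, not the answerer's code; the row-name reset is optional.

```r
# Rebuild the example data from the question
df <- data.frame(
  veg     = c("tree", "grass", "tree", "grass", "tree", "grass", "tree", "grass"),
  cover   = c(0.97, 0.21, 0.35, 0.67, 0.45, 0.72, 0.27, 0.67),
  unit    = c("U1", "U1", "U1", "U1", "U2", "U2", "U2", "U2"),
  station = c("A1", "A1", "A2", "A2", "A3", "A3", "A4", "A4"),
  year    = c(2015, 2015, 2014, 2014, 2013, 2013, 2014, 2014)
)

# Same first step as the answer: each unit's most-recent-year rows
df.by <- by(df, df$unit, FUN = function(t) t[t$year == max(t$year), ])

# Stack the per-unit data frames in order instead of merging them
df.recent <- do.call(rbind, df.by)

# rbind builds row names such as "U1.1"; reset them to 1..n (optional)
rownames(df.recent) <- NULL
df.recent
```

The result has the same four rows in the same order as the desired output in the question.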

This should give you the answer, assuming you want the most recent year for each veg/unit combination. Is that correct?

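library(dplyr)

df %>%
    group_by(veg, unit) %>%    # one group per vegetation type within each unit
    arrange(desc(year)) %>%    # newest year first
    slice(1)                   # keep the first (most recent) row of each group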
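Note that slice(1) keeps a single row per veg/unit group, so if a unit had several stations sampled in its latest year, only one of them would survive. A minimal base-R sketch that keeps every row from each unit's most recent year uses ave() to compute, for every row, the maximum year within that row's unit; the name unit.max.year is just illustrative.

```r
# Rebuild the example data from the question
df <- data.frame(
  veg     = c("tree", "grass", "tree", "grass", "tree", "grass", "tree", "grass"),
  cover   = c(0.97, 0.21, 0.35, 0.67, 0.45, 0.72, 0.27, 0.67),
  unit    = c("U1", "U1", "U1", "U1", "U2", "U2", "U2", "U2"),
  station = c("A1", "A1", "A2", "A2", "A3", "A3", "A4", "A4"),
  year    = c(2015, 2015, 2014, 2014, 2013, 2013, 2014, 2014)
)

# For every row, ave() returns the maximum year within that row's unit
unit.max.year <- ave(df$year, df$unit, FUN = max)

# Keep all rows collected in their unit's most recent year
df.recent <- df[df$year == unit.max.year, ]
df.recent
```

This selects the original rows 1, 2, 7 and 8, matching the desired output without any grouping package.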