2012-08-26 89 views
12

Find the maximum date for each ID

I have the following data frame:

id<-c(1,1,2,3,3) 
date<-c("23-01-08","01-11-07","30-11-07","17-12-07","12-12-08") 
df<-data.frame(id,date) 
df$date2<-as.Date(as.character(df$date), format = "%d-%m-%y") 


id  date  date2 
1 23-01-08 2008-01-23 
1 01-11-07 2007-11-01 
2 30-11-07 2007-11-30 
3 17-12-07 2007-12-17 
3 12-12-08 2008-12-12 

Now I need to create a fourth column that holds the maximum transaction date for each id. The final table should look like this:

id  date  date2  max 
1 23-01-08 2008-01-23 2008-01-23 
1 01-11-07 2007-11-01 0 
2 30-11-07 2007-11-30 2007-11-30 
3 17-12-07 2007-12-17 0 
3 12-12-08 2008-12-12 2008-12-12 

I would appreciate any help.

Answers

18
id<-c(1,1,2,3,3) 
date<-c("23-01-08","01-11-07","30-11-07","17-12-07","12-12-08") 
df<-data.frame(id,date) 
df$date2<-as.Date(as.character(df$date), format = "%d-%m-%y") 
# aggregate can be used for this type of thing 
d = aggregate(df$date2,by=list(df$id),max) 
# And merge the result of aggregate 
# with the original data frame 
df2 = merge(df,d,by.x=1,by.y=1) 
df2 

    id  date  date2   x 
1 1 23-01-08 2008-01-23 2008-01-23 
2 1 01-11-07 2007-11-01 2008-01-23 
3 2 30-11-07 2007-11-30 2007-11-30 
4 3 17-12-07 2007-12-17 2008-12-12 
5 3 12-12-08 2008-12-12 2008-12-12 

Edit: since you want the last column to be "empty" when the date does not match the maximum date, you can try the next line.

df2[df2[,3]!=df2[,4],4]=NA 

df2 
    id  date  date2   x 
1 1 23-01-08 2008-01-23 2008-01-23 
2 1 01-11-07 2007-11-01  <NA> 
3 2 30-11-07 2007-11-30 2007-11-30 
4 3 17-12-07 2007-12-17  <NA> 
5 3 12-12-08 2008-12-12 2008-12-12 

Of course, you can always clean up the colnames and so on, but I'll leave that to you.
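
The column clean-up left to the reader above might look like the following sketch. The column name `max_date` is an illustrative choice and not part of the original answer; naming the grouping list element `id` lets `merge` match on the column name instead of on position.

```r
# Rebuild the example data from the question
id <- c(1, 1, 2, 3, 3)
date <- c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
df <- data.frame(id, date)
df$date2 <- as.Date(as.character(df$date), format = "%d-%m-%y")

# Name the grouping column up front so the result has an "id" column
d <- aggregate(df$date2, by = list(id = df$id), FUN = max)
# aggregate() calls the value column "x"; rename it (the name is an assumption)
names(d)[names(d) == "x"] <- "max_date"

# Join the per-id maximum back onto the original rows
df2 <- merge(df, d, by = "id")
# Set the maximum to NA on rows whose date is not the per-id maximum
df2$max_date[df2$date2 != df2$max_date] <- NA
df2
```

Matching by name rather than by position means the merge keeps working if the columns of `df` are later reordered.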

2
library(sqldf) 
tables<- '(SELECT * FROM df 
      ) 
      AS t1, 
      (SELECT id,max(date2) date2 FROM df GROUP BY id 
      ) 
      AS t2' 

out<-fn$sqldf("SELECT t1.*,t2.date2 mdate FROM $tables WHERE t1.id=t2.id") 
out$mdate<-as.Date(out$mdate) 
out$mdate[out$date2!=out$mdate]<-NA 
# id  date  date2  mdate 
#1 1 01-11-07 2007-11-01  <NA> 
#2 1 23-01-08 2008-01-23 2008-01-23 
#3 2 30-11-07 2007-11-30 2007-11-30 
#4 3 12-12-08 2008-12-12 2008-12-12 
#5 3 17-12-07 2007-12-17  <NA> 
1

You cannot use 0 as a Date value, so you either need to give up keeping the column as a Date or accept NA values:

# Date values: 
df$maxdt <- ave(df$date2, df$id, 
        FUN=function(x) ifelse(x == max(x), as.character(x), NA)) 
str(ave(df$date2, df$id, FUN=function(x) ifelse(x == max(x), as.character(x), NA))) 
# Date[1:5], format: "2008-01-23" NA "2007-11-30" NA "2008-12-12" 

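ifelse does some odd type handling: using just x as the second argument above does not work, because ifelse drops the Date class and returns plain numbers, yet the as.character version still comes back as a Date-class vector. Go figure! Here is the character-vector option.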

# Character values: 
df$maxdt <- ave(as.character(df$date2), df$id, 
        FUN=function(x) ifelse(x == max(x), x, "0")) 
ave(as.character(df$date2), df$id, FUN=function(x) ifelse(x == max(x), x, "0")) 
[1] "2008-01-23" "0"   "2007-11-30" "0"   "2008-12-12" 
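
The type quirk mentioned above comes from ifelse building its result from the test vector, so the Date class of x is lost. The sketch below shows that, and then uses `replace()` as a swapped-in alternative (not the answer's own method) that keeps the Date class inside `ave`. The variable names `d` and `maxdt` are illustrative.

```r
d <- as.Date(c("2008-01-23", "2007-11-01"))

# ifelse() returns a plain numeric vector here, not a Date
cls_ifelse <- class(ifelse(d == max(d), d, NA))

# replace() assigns NA in place, so the Date class survives
cls_replace <- class(replace(d, d != max(d), NA))

# The same idea applied per id with ave(), using the question's data
id <- c(1, 1, 2, 3, 3)
date2 <- as.Date(c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08"),
                 format = "%d-%m-%y")
maxdt <- ave(date2, id, FUN = function(x) replace(x, x != max(x), NA))
maxdt
```

This avoids the round trip through character strings while still giving NA on the rows that are not the per-id maximum.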
7

Another approach is to use the plyr package:

library(plyr) 
ddply(df, "id", summarize, max = max(date2)) 

# id  max 
#1 1 2008-01-23 
#2 2 2007-11-30 
#3 3 2008-12-12 

Now, this isn't in the format you're after, since it shows each id only once. Not to worry, we can use transform instead of summarize:

ddply(df, "id", transform, max = max(date2)) 

# id  date  date2  max 
#1 1 01-11-07 2007-11-01 2008-01-23 
#2 1 23-01-08 2008-01-23 2008-01-23 
#3 2 30-11-07 2007-11-30 2007-11-30 
#4 3 12-12-08 2008-12-12 2008-12-12 
#5 3 17-12-07 2007-12-17 2008-12-12 

As in @seandavi's answer, this repeats the max date for each id. If you want to change the repeated values to NA, something like this will do the job:

within(ddply(df, "id", transform, max = max(date2)), max[max != date2] <- NA) 
2

Adding this in case someone is looking for a dplyr solution:

library(dplyr) 

df %>% 
    group_by(id) %>% 
    mutate(max = if_else(date2 == max(date2), date2, as.Date(NA))) 

Result:

# A tibble: 5 x 4 
# Groups: id [3] 
    id  date  date2  max 
    <dbl> <fctr>  <date>  <date> 
1  1 23-01-08 2008-01-23 2008-01-23 
2  1 01-11-07 2007-11-01   NA 
3  2 30-11-07 2007-11-30 2007-11-30 
4  3 17-12-07 2007-12-17   NA 
5  3 12-12-08 2008-12-12 2008-12-12 
+0

I use it this way: mutate(flag_last = if_else(date == max(date),TRUE,FALSE))%>%filter(flag_last == TRUE) – Rohit
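
The flag-then-filter pattern in the comment above can probably be collapsed into a single `filter()` call, since `filter()` is evaluated within each group. A minimal sketch, assuming the parsed `date2` column from the question is the one to compare (the raw `date` column in this data is a string, so comparing it would not order the dates correctly); the name `latest` is illustrative.

```r
library(dplyr)

# Rebuild the example data from the question
df <- data.frame(
  id = c(1, 1, 2, 3, 3),
  date = c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
)
df$date2 <- as.Date(df$date, format = "%d-%m-%y")

# Keep only the row(s) holding the latest date within each id
latest <- df %>%
  group_by(id) %>%
  filter(date2 == max(date2)) %>%
  ungroup()
latest
```

Note that this drops the non-maximum rows entirely, whereas the answer above keeps every row and marks the others with NA.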
