2017-08-03 86 views
1

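This question is an extension of the one here.
Suppose my data has a column named Remark, and I want to keep an observation based on other columns: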

ID Name Type Date   Amount Remark 
1  AAAA First 2009/7/20  100  Not want 
1  AAAA First 2010/2/3  200  want ya 
2  BBBB First 2015/3/10  250  
2  CCC  Second 2009/2/23  300  good 
2  CCC  Second 2010/1/25  400  OK Right123 
2  CCC  Third 2015/4/9  500  
2  CCC  Third 2016/6/25  700  Stackoverflow is awesome 

I want my result to keep the Remark from the row where Date is the maximum.
First, if I ignore the Remark column, I can use max() to get this:

dt[,.(Date = max(Date), Amount = sum(Amount)), by = .(ID, Name, Type)] 
    ID Name Type  Date Amount 
1: 1 AAAA First 2010-02-03  300 
2: 2 BBBB First 2015-03-10  250 
3: 2 CCC Second 2010-01-25  700 
4: 2 CCC Third 2016-06-25 1200 

However, how can I keep Remark as well, like this?

ID Name Type  Date Amount  Remark 
1: 1 AAAA First 2010-02-03  300  want ya 
2: 2 BBBB First 2015-03-10  250  
3: 2 CCC Second 2010-01-25  700  OK Right123 
4: 2 CCC Third 2016-06-25 1200  Stackoverflow is awesome 

Here is my data:

dt <- fread(" 
     ID Name Type Date   Amount Remark 
     1  AAAA First 2009/7/20  100  Not.want 
     1  AAAA First 2010/2/3  200  want.ya 
     2  BBBB First 2015/3/10  250  
     2  CCC  Second 2009/2/23  300  good 
     2  CCC  Second 2010/1/25  400  OK.Right123 
     2  CCC  Third 2015/4/9  500  
     2  CCC  Third 2016/6/25  700  Stackoverflow.is.awesome 
     ") 
dt$Date <- as.Date(dt$Date) 
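
Because whitespace-separated text with empty Remark fields can be awkward to parse, here is a minimal sketch of one alternative: building the same table directly with data.table(). The values are copied from the table in the question; writing the dates in ISO format and keeping the spaces in Remark are assumptions about what the intended data looks like.

```r
library(data.table)

# Build the example column by column so that empty remarks stay empty strings
dt <- data.table(
  ID     = c(1L, 1L, 2L, 2L, 2L, 2L, 2L),
  Name   = c("AAAA", "AAAA", "BBBB", "CCC", "CCC", "CCC", "CCC"),
  Type   = c("First", "First", "First", "Second", "Second", "Third", "Third"),
  Date   = as.Date(c("2009-07-20", "2010-02-03", "2015-03-10", "2009-02-23",
                     "2010-01-25", "2015-04-09", "2016-06-25")),
  Amount = c(100, 200, 250, 300, 400, 500, 700),
  Remark = c("Not want", "want ya", "", "good", "OK Right123", "",
             "Stackoverflow is awesome")
)

# Check the column types before aggregating
str(dt)
```

Pasting this into a fresh R session should give a table whose Date column has class Date, with no file parsing involved.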
+0

Please provide the data in a reproducible format. – Frank

+0

@Frank I have edited my question. –

+1

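See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 We should be able to copy-paste the code into a new R session and see the same example data. I still see non-dates in there... Also, I get an error when running 'fread'. – Frank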

Answers

2

We can use a join

setcolorder(dt[, setdiff(names(dt), "Amount"), with = FALSE][dt[, .(Date = max(Date), 
       Amount = sum(Amount)), 
     by = .(ID, Name, Type)], on = .(ID, Name, Type, Date)], names(dt))[] 
# ID Name Type  Date Amount     Remark 
#1: 1 AAAA First 2010-02-03 300     want ya 
#2: 2 BBBB First 2015-03-10 250       
#3: 2 CCC Second 2010-01-25 700    OK Right123 
#4: 2 CCC Third 2016-06-25 1200 Stackoverflow is awesome 
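
The join above packs several steps into one expression. As a rough sketch of the same idea unpacked into two steps, the code below first aggregates and then pulls Remark in with an update join. The sample data is rebuilt inline from the question's table (an assumption about its intended form), and agg is a name introduced only for illustration.

```r
library(data.table)

# Sample data reconstructed from the question (assumed form)
dt <- data.table(
  ID     = c(1L, 1L, 2L, 2L, 2L, 2L, 2L),
  Name   = c("AAAA", "AAAA", "BBBB", "CCC", "CCC", "CCC", "CCC"),
  Type   = c("First", "First", "First", "Second", "Second", "Third", "Third"),
  Date   = as.Date(c("2009-07-20", "2010-02-03", "2015-03-10", "2009-02-23",
                     "2010-01-25", "2015-04-09", "2016-06-25")),
  Amount = c(100, 200, 250, 300, 400, 500, 700),
  Remark = c("Not want", "want ya", "", "good", "OK Right123", "",
             "Stackoverflow is awesome")
)

# Step 1: one row per group with the latest date and the total amount
agg <- dt[, .(Date = max(Date), Amount = sum(Amount)), by = .(ID, Name, Type)]

# Step 2: update join; copy Remark from the dt row that matches each
# group's latest date (rows of dt with earlier dates find no match)
agg[dt, on = .(ID, Name, Type, Date), Remark := i.Remark]

agg
```

The update join adds Remark by reference, so no column reordering is needed afterwards. If a group had two rows sharing the latest date, the Remark from the later row in dt would win, which may or may not be what is wanted.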

Or without a join

dt1 <- dt[, c(Amount = sum(.SD[["Amount"]]), .SD[which.max(Date), 
    setdiff(names(.SD), "Amount"), with = FALSE]), .(ID, Name, Type)] 

setcolorder(dt1, names(dt)) 
dt1 
# ID Name Type  Date Amount     Remark 
#1: 1 AAAA First 2010-02-03 300     want ya 
#2: 2 BBBB First 2015-03-10 250       
#3: 2 CCC Second 2010-01-25 700    OK Right123 
#4: 2 CCC Third 2016-06-25 1200 Stackoverflow is awesome 
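
When only a single extra column such as Remark has to be carried along, a shorter variant of the no-join idea is to index that column with which.max(Date) inside the same grouped call. This is a sketch under that assumption, not a replacement for the general version above; the sample data is rebuilt inline from the question's table, and res is a name chosen for illustration.

```r
library(data.table)

# Sample data reconstructed from the question (assumed form)
dt <- data.table(
  ID     = c(1L, 1L, 2L, 2L, 2L, 2L, 2L),
  Name   = c("AAAA", "AAAA", "BBBB", "CCC", "CCC", "CCC", "CCC"),
  Type   = c("First", "First", "First", "Second", "Second", "Third", "Third"),
  Date   = as.Date(c("2009-07-20", "2010-02-03", "2015-03-10", "2009-02-23",
                     "2010-01-25", "2015-04-09", "2016-06-25")),
  Amount = c(100, 200, 250, 300, 400, 500, 700),
  Remark = c("Not want", "want ya", "", "good", "OK Right123", "",
             "Stackoverflow is awesome")
)

# Per group: latest date, total amount, and the remark on the latest-date row
res <- dt[, .(Date   = max(Date),
              Amount = sum(Amount),
              Remark = Remark[which.max(Date)]),
          by = .(ID, Name, Type)]

res
```

which.max() returns the first position of the maximum, so ties on Date resolve to the earliest such row within the group.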

If there are more 'Amount' columns to be summed

nm1 <- grep("Amount\\d*", names(dt), value = TRUE) 
setcolorder(dt[, setdiff(names(dt), nm1), with = FALSE][dt[, c(Date= max(Date), 
     lapply(.SD, sum)), by = .(ID, Name, Type), .SDcols = nm1], 
     on = .(ID, Name, Type, Date)], names(dt))[] 
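
To make the multi-column case concrete, here is a small sketch that splits the work into two grouped tables and joins them on the grouping columns. Amount2 is a hypothetical second column invented for this example (simply twice Amount), and grp, keep, sums, last and res are illustrative names; the rest of the data is reconstructed from the question's table.

```r
library(data.table)

# Sample data from the question plus a hypothetical Amount2 column
dt <- data.table(
  ID      = c(1L, 1L, 2L, 2L, 2L, 2L, 2L),
  Name    = c("AAAA", "AAAA", "BBBB", "CCC", "CCC", "CCC", "CCC"),
  Type    = c("First", "First", "First", "Second", "Second", "Third", "Third"),
  Date    = as.Date(c("2009-07-20", "2010-02-03", "2015-03-10", "2009-02-23",
                      "2010-01-25", "2015-04-09", "2016-06-25")),
  Amount  = c(100, 200, 250, 300, 400, 500, 700),
  Amount2 = c(200, 400, 500, 600, 800, 1000, 1400),
  Remark  = c("Not want", "want ya", "", "good", "OK Right123", "",
              "Stackoverflow is awesome")
)

grp  <- c("ID", "Name", "Type")
nm1  <- grep("^Amount\\d*$", names(dt), value = TRUE)  # columns to sum
keep <- setdiff(names(dt), c(nm1, grp))                # Date, Remark

# Sum every Amount* column per group
sums <- dt[, lapply(.SD, sum), by = grp, .SDcols = nm1]

# Take the non-summed columns from the latest-date row of each group
last <- dt[, .SD[which.max(Date)], by = grp, .SDcols = keep]

# Combine the two pieces and restore the original column order
res <- last[sums, on = grp]
setcolorder(res, names(dt))
res
```

Keeping the sums and the latest-row lookup separate avoids mixing a Date value and a list inside a single c() call, at the cost of one extra join.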
+1

If I have 3 or more columns that need to be summed ('Amount', 'Amount1', 'Amount2'), what should I do? –

+2

@PeterChen In that case, use 'dt[, c(Date = max(Date), lapply(.SD, sum)), by = .(ID, Name, Type), .SDcols = AmountCols]' within the second chain of the first solution, and use 'setdiff' – akrun

1
> df 
    ID Name Type  Date Amount     Remark 
1: 1 AAAA First 03-02-2010 200     want ya 
2: 2 CCC Third 09-04-2015 500       
3: 2 BBBB First 10-03-2015 250       
4: 1 AAAA First 20-07-2009 100     Not want 
5: 2 CCC Second 23-02-2009 300      good 
6: 2 CCC Second 25-01-2010 400    OK Right123 
7: 2 CCC Third 25-06-2016 700 Stackoverflow is awesome 

> df2=df[,.(Date = max(Date), Amount = sum(Amount)), by = .(ID, Name, Type)] 
> df2 
    ID Name Type  Date Amount 
1: 2 BBBB First 10-03-2015 250 
2: 1 AAAA First 20-07-2009 300 
3: 2 CCC Second 25-01-2010 700 
4: 2 CCC Third 25-06-2016 1200 


> df[df2,] 
    ID Name Type  Date Amount     Remark i.ID i.Name i.Type i.Amount 
1: 2 BBBB First 10-03-2015 250        2 BBBB First  250 
2: 1 AAAA First 20-07-2009 100     Not want 1 AAAA First  300 
3: 2 CCC Second 25-01-2010 400    OK Right123 2 CCC Second  700 
4: 2 CCC Third 25-06-2016 700 Stackoverflow is awesome 2 CCC Third  1200 


> df3=df[df2,c("ID","Name","Type","Date","Remark","i.Amount")] 
> df3 
    ID Name Type  Date     Remark i.Amount 
1: 2 BBBB First 10-03-2015        250 
2: 1 AAAA First 20-07-2009     Not want  300 
3: 2 CCC Second 25-01-2010    OK Right123  700 
4: 2 CCC Third 25-06-2016 Stackoverflow is awesome  1200 
+1

The 'Amount' column needs a change; your answer has some problems and is not correct, but the approach is right. –
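
One likely cause of the mismatch in the output above is that Date appears to be stored as a dd-mm-yyyy string, so max(Date) compares text rather than dates (AAAA ends up with 20-07-2009 and "Not want"). A hedged sketch of the same approach with the dates converted first and the join columns spelled out; the data is retyped from the printed df, and the column selection in the last step is one way, among others, to keep the summed Amount.

```r
library(data.table)

# Data retyped from the printed df above, dates still as dd-mm-yyyy strings
df <- data.table(
  ID     = c(1L, 2L, 2L, 1L, 2L, 2L, 2L),
  Name   = c("AAAA", "CCC", "BBBB", "AAAA", "CCC", "CCC", "CCC"),
  Type   = c("First", "Third", "First", "First", "Second", "Second", "Third"),
  Date   = c("03-02-2010", "09-04-2015", "10-03-2015", "20-07-2009",
             "23-02-2009", "25-01-2010", "25-06-2016"),
  Amount = c(200, 500, 250, 100, 300, 400, 700),
  Remark = c("want ya", "", "", "Not want", "good", "OK Right123",
             "Stackoverflow is awesome")
)

# Convert to real dates so that max() picks the latest day, not the largest string
df[, Date := as.Date(Date, format = "%d-%m-%Y")]

# Latest date and total amount per group
df2 <- df[, .(Date = max(Date), Amount = sum(Amount)), by = .(ID, Name, Type)]

# Join back on all four columns; take the summed Amount from df2 (i.Amount)
df3 <- df[df2, on = .(ID, Name, Type, Date),
          .(ID, Name, Type, Date, Amount = i.Amount, Remark)]
df3
```

Naming the join columns with on = also removes the dependence on whatever key df happened to have.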