As @Arun suggested in the comments, reshape will do this for you.
d<-read.table(text='City Date Revenue Costs
"New York" "Feb 1" 2000 200
"San Fran" "Feb 3" 1200 300
Boston "Feb 1" 1500 400', header=TRUE)
reshape(d[! names(d) %in% 'Costs'], idvar='Date', timevar='City', direction='wide')
#    Date Revenue.New York Revenue.San Fran Revenue.Boston
# 1 Feb 1             2000               NA           1500
# 2 Feb 3               NA             1200             NA
If there are multiple entries per city/date that you want to combine first, you can use aggregate.
d<-read.table(text='City Date Revenue Costs
"New York" "Feb 1" 2000 200
"New York" "Feb 1" 1000 100
"San Fran" "Feb 3" 1200 300
Boston "Feb 1" 1500 400', header=TRUE)
dd<-with(d, aggregate(Revenue, by=list(City=City, Date=Date), sum))
#       City  Date    x
# 1   Boston Feb 1 1500
# 2 New York Feb 1 3000
# 3 San Fran Feb 3 1200
ddd<-reshape(dd, idvar='Date', timevar='City', direction='wide')
#    Date x.Boston x.New York x.San Fran
# 1 Feb 1     1500       3000         NA
# 3 Feb 3       NA         NA       1200
Then replace the NAs with 0.
ddd[is.na(ddd)] <- 0
#    Date x.Boston x.New York x.San Fran
# 1 Feb 1     1500       3000          0
# 3 Feb 3        0          0       1200
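As an alternative sketch (not the approach used in this answer), base R's xtabs can sum Revenue for each Date/City pair and fill empty cells with 0 in a single call. This assumes a contingency table with plain city names as columns is acceptable in place of the x.City columns that reshape produces; the names tab and wide are only illustrative.

```r
d <- read.table(text='City Date Revenue Costs
"New York" "Feb 1" 2000 200
"New York" "Feb 1" 1000 100
"San Fran" "Feb 3" 1200 300
Boston "Feb 1" 1500 400', header=TRUE)

# Sum Revenue for each Date (rows) by City (columns); empty cells are 0, not NA.
tab <- xtabs(Revenue ~ Date + City, data = d)

# Optional: turn the table into a plain data frame with the dates as row names.
wide <- as.data.frame.matrix(tab)
```

If you need the exact column layout shown above, the aggregate + reshape route is still the one to use.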
To address the point @Arun raises below, you can use the merge function to fill in missing dates before the previous step (the NA replacement).
missing.Dates <- c('Feb 2')
ddd<-merge(ddd, data.frame(Date=missing.Dates), by='Date', all=TRUE)
#    Date x.Boston x.New York x.San Fran
# 1 Feb 1     1500       3000         NA
# 2 Feb 3       NA         NA       1200
# 3 Feb 2       NA         NA         NA
ddd[is.na(ddd)] <- 0
#    Date x.Boston x.New York x.San Fran
# 1 Feb 1     1500       3000          0
# 2 Feb 3        0          0       1200
# 3 Feb 2        0          0          0
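Rather than typing the missing dates by hand, they could be derived with setdiff. The sketch below assumes you already have a vector of every date the table should contain (all.dates here is hypothetical, as is the hand-built ddd that mirrors the reshaped data above), and it also reorders the rows to follow that vector.

```r
# Reshaped data as it stands before the zero fill (values copied from above).
ddd <- data.frame(Date = c("Feb 1", "Feb 3"),
                  x.Boston = c(1500, NA),
                  `x.New York` = c(3000, NA),
                  `x.San Fran` = c(NA, 1200),
                  check.names = FALSE, stringsAsFactors = FALSE)

# Hypothetical: every date the final table should show, in display order.
all.dates <- c("Feb 1", "Feb 2", "Feb 3")

# Dates that are expected but absent from the data.
missing.Dates <- setdiff(all.dates, ddd$Date)

# Add one all-NA row per missing date, then zero-fill as before.
ddd <- merge(ddd, data.frame(Date = missing.Dates, stringsAsFactors = FALSE),
             by = "Date", all = TRUE)
ddd[is.na(ddd)] <- 0

# Put the rows in the order given by all.dates.
ddd <- ddd[match(all.dates, ddd$Date), ]
```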
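Not just reshape; aggregate by date and city as well. – 2013-03-22 13:51:03
Thanks Arun, the two simple steps of aggregate + reshape saved me the trouble of writing a very long loop function. – 2013-03-22 14:11:44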