2014-11-08 80 views

Populate old data frame with new variable from within ddply in R

I have a data.frame that looks like this (1M rows in reality):

`> DF

                 R.DMA.NAMES quarter     daypart allpersons.imp rate                    station  spot.id
1 Wilkes.Barre.Scranton.Hztn  Q22014   afternoon            0.0   30                       WSWB 13048713
2                  Nashville  Q12014   primetime            0.0   50              COM NASHVILLE 11969260
3             Seattle.Tacoma  Q12014   primetime            6.1   51 ESPN SEATTLE, EVERETT ZONE 11898905
4               Jacksonville  Q42013 late fringe            2.3  130          Jacksonville WAWS 11617447
5                    Detroit  Q22014   overnight            0.0    0                       WKBD 12571421
6         South.Bend.Elkhart  Q42013   primetime           11.5  325                       WBND 11741171`

dput(DF)

structure(list(R.DMA.NAMES = c("Wilkes.Barre.Scranton.Hztn", 
"Nashville", "Seattle.Tacoma", "Jacksonville", "Detroit", "South.Bend.Elkhart" 
), quarter = structure(c(3L, 1L, 1L, 6L, 3L, 6L), .Label = c("Q12014", 
"Q22013", "Q22014", "Q32013", "Q32014", "Q42013"), class = "factor"), 
    daypart = c("afternoon", "primetime", "primetime", "late fringe", 
    "overnight", "primetime"), allpersons.imp = c(0, 0, 6.1, 
    2.3, 0, 11.5), rate = c(30, 50, 51, 130, 0, 325), station = c("WSWB", 
    "COM NASHVILLE", "ESPN SEATTLE, EVERETT ZONE", "Jacksonville WAWS", 
    "WKBD", "WBND"), spot.id = c(13048713L, 11969260L, 11898905L, 
    11617447L, 12571421L, 11741171L)), .Names = c("R.DMA.NAMES", 
"quarter", "daypart", "allpersons.imp", "rate", "station", "spot.id" 
), row.names = c(NA, -6L), class = "data.frame") 

I use the ddply function to perform a calculation:

library(plyr)
ddply(df, .(R.DMA.NAMES, station, quarter), function(x) {
  # use the group's subset x rather than the full df, so each group gets its own ratio
  sum(x$rate) / sum(x$allpersons.imp)
})

This creates a new data.frame that looks like this:

                 R.DMA.NAMES                    station quarter        V1
1                    Detroit                       WKBD  Q22014       NaN
2               Jacksonville          Jacksonville WAWS  Q42013 56.521739
3                  Nashville              COM NASHVILLE  Q12014       Inf
4             Seattle.Tacoma ESPN SEATTLE, EVERETT ZONE  Q12014  8.360656
5         South.Bend.Elkhart                       WBND  Q42013 28.260870
6 Wilkes.Barre.Scranton.Hztn                       WSWB  Q22014       Inf

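What I would like to do is create a new column named "cpi" in my original df, where each row shows the cpi value that applies to it. Of course, the same value will repeat many times: for example, every row with R.DMA.NAMES "Seattle.Tacoma", station "ESPN SEATTLE, EVERETT ZONE", and quarter Q12014 would show 8.36. I have tried several things, including: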

transform(df, cpi = ddply(df, .(R.DMA.NAMES, station, quarter), function(x) {
  sum(x$rate) / sum(x$allpersons.imp)
}))

But this does not work! Can someone explain why?

Answer


Use transform within ddply:

ddply(df, .(R.DMA.NAMES, station, quarter), 
     transform, cpi = sum(rate)/sum(allpersons.imp))
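
To see why this works, here is a minimal sketch run on the sample rows from the question. It assumes plyr is installed, rebuilds only the columns the calculation needs, and adds one hypothetical extra Seattle row (not in the original data) so that at least one group has more than one row. The name df_cpi is also just an illustration.

```r
library(plyr)

# Sample rebuilt from the dput() output in the question (subset of columns).
df <- data.frame(
  R.DMA.NAMES = c("Wilkes.Barre.Scranton.Hztn", "Nashville", "Seattle.Tacoma",
                  "Jacksonville", "Detroit", "South.Bend.Elkhart"),
  quarter = c("Q22014", "Q12014", "Q12014", "Q42013", "Q22014", "Q42013"),
  allpersons.imp = c(0, 0, 6.1, 2.3, 0, 11.5),
  rate = c(30, 50, 51, 130, 0, 325),
  station = c("WSWB", "COM NASHVILLE", "ESPN SEATTLE, EVERETT ZONE",
              "Jacksonville WAWS", "WKBD", "WBND"),
  stringsAsFactors = FALSE
)

# Hypothetical extra row, added only so that one group spans two rows.
df <- rbind(df, data.frame(
  R.DMA.NAMES = "Seattle.Tacoma", quarter = "Q12014",
  allpersons.imp = 3.9, rate = 49,
  station = "ESPN SEATTLE, EVERETT ZONE", stringsAsFactors = FALSE
))

# ddply() calls transform() once per group, so sum() only sees that group's
# rows, and the single ratio is recycled across every row of the group.
df_cpi <- ddply(df, .(R.DMA.NAMES, station, quarter),
                transform, cpi = sum(rate) / sum(allpersons.imp))

# Both Seattle rows carry the same group ratio, (51 + 49) / (6.1 + 3.9).
print(df_cpi[df_cpi$R.DMA.NAMES == "Seattle.Tacoma",
             c("rate", "allpersons.imp", "cpi")])
```

The result keeps every input row but comes back ordered by the grouping variables rather than in the original row order. Groups whose impressions sum to zero still give Inf or NaN, as in the summary table in the question, so those rows may need separate handling.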