2016-09-14

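Weighted mean with ddply gives a wrong result (R, ddply)

I need to compute weighted averages in R while collapsing rows of data.

Collapsing by brand and name: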
library(plyr)

name = c("car1", "car2", "car2", "car2", "car3", "car1")
brand = c("b1", "b2", "b2", "b2", "b3", "b1")
production = c(10, 10, 30, 40, 10, 5)
fuelEconomy = c(1, 2, 3, 5, 2, 4)
size = c(10, 50, 30, 40, 20, 7)
adf = data.frame(brand, name, production, fuelEconomy, size)

adfSum <- ddply(adf, .(brand, name),
                summarise,
                fuelEconomySum = sum(fuelEconomy*production)/sum(production),
                productionSum = sum(production),
                sizeSum = sum(size*production)/sum(production))

Result: the first weighted mean (fuelEconomySum) is correct, but the last one (sizeSum) is wrong. The correct values are shown in parentheses.

brand name fuelEconomySum production sizeSum
   b1 car1          2.000         15  17 (9)
   b2 car2          3.875         80 120 (37.5)
   b3 car3          2.000         10  20 (20)
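As a quick check of the values in parentheses, here is the arithmetic for car1, writing $s_i$ for size and $p_i$ for production (notation introduced here, not from the original post):

$$\bar{s}_{\text{car1}} = \frac{\sum_i s_i\,p_i}{\sum_i p_i} = \frac{10 \cdot 10 + 7 \cdot 5}{10 + 5} = \frac{135}{15} = 9$$

If every $p_i$ is replaced by one number $P = 15$ (the group total), the weights cancel and only the unweighted sum of sizes remains, which is exactly the wrong output:

$$\frac{\sum_i s_i\,P}{P} = \sum_i s_i = 10 + 7 = 17$$

The same pattern holds for car2 ($50 + 30 + 40 = 120$) and car3 ($20$).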

I am looking for a way to compute several weighted means at once.

Thanks

Answer

This works (using dplyr and magrittr):

name = c("car1", "car2", "car2", "car2", "car3", "car1") 
brand = c("b1", "b2", "b2", "b2", "b3", "b1") 
production = c(10, 10, 30, 40, 10, 5) 
fuelEconomy= c(1, 2, 3, 5, 2, 4) 
size = c(10, 50, 30,40,20, 7) 
adf = data.frame(brand, name, production, fuelEconomy, size) 

library(magrittr) 
library(dplyr) 

afdSum <- adf %>%
    group_by(brand, name) %>%
    summarise(fuelEconomySum = sum(fuelEconomy*production)/sum(production),
              productionSum  = sum(production),
              sizeSum        = sum(size*production)/sum(production)) %>%
    as.data.frame()


> afdSum
  brand name fuelEconomySum productionSum sizeSum
1    b1 car1          2.000            15     9.0
2    b2 car2          3.875            80    37.5
3    b3 car3          2.000            10    20.0
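For comparison, here is a sketch of the same summary in base R only (no plyr, dplyr or magrittr), built on stats::weighted.mean. The vector `weighted_cols` is a hypothetical name introduced here so that several weighted means can be listed at once; because the weights are always read from `g$production`, a group total can never shadow them.

```r
# Data from the question
adf <- data.frame(
  brand       = c("b1", "b2", "b2", "b2", "b3", "b1"),
  name        = c("car1", "car2", "car2", "car2", "car3", "car1"),
  production  = c(10, 10, 30, 40, 10, 5),
  fuelEconomy = c(1, 2, 3, 5, 2, 4),
  size        = c(10, 50, 30, 40, 20, 7),
  stringsAsFactors = FALSE
)

# Hypothetical list of columns to average, each weighted by production
weighted_cols <- c("fuelEconomy", "size")

# One data frame per brand/name combination that actually occurs
groups <- split(adf, list(adf$brand, adf$name), drop = TRUE)

adfSum <- do.call(rbind, lapply(groups, function(g) {
  # Production-weighted mean of every listed column, named e.g. "sizeSum"
  wm <- setNames(
    sapply(weighted_cols, function(col) weighted.mean(g[[col]], g$production)),
    paste0(weighted_cols, "Sum")
  )
  data.frame(
    brand         = g$brand[1],
    name          = g$name[1],
    productionSum = sum(g$production),
    t(wm),                      # 1-row matrix: one column per weighted mean
    stringsAsFactors = FALSE
  )
}))
rownames(adfSum) <- NULL
adfSum
```

Adding another weighted column only means appending its name to `weighted_cols`.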

Edit: By the way, your solution works fine for me.

> devtools::session_info("plyr") 
Session info  --------------------------------------------------------------------------- 
setting value      
version R version 3.3.1 (2016-06-21) 
system x86_64, linux-gnu   
ui  RStudio (0.99.491)   
language en_US      
collate en_US.UTF-8     
tz  <NA>       
date  2016-09-14     

Packages  ------------------------------------------------------------------------------- 
package * version date  source   
plyr * 1.8.3 2015-06-12 CRAN (R 3.3.0) 
Rcpp  0.12.5 2016-05-14 CRAN (R 3.3.0) 

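Thanks for your contribution. I found the error: it was in my variable naming. For this post I renamed the variable to productionSum to make it clear, but in my script I had simply named it production, the same as my input column. As a result, the last calculation used the sum of production instead of the individual values.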
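To make the cause concrete, here is a minimal base-R sketch on the car1 rows alone, with `with()` standing in for summarise(); the names `good`, `bad` and `fixed` are illustrative only. plyr's summarise() evaluates its arguments in order, so once a column called production has been created, the later sizeSum expression sees that single total instead of the per-row weights (dplyr's summarise() behaves the same way).

```r
# The two car1 rows from the question
g <- data.frame(production = c(10, 5), size = c(10, 7))

# Correct: weights are the per-row production values
good <- with(g, sum(size * production) / sum(production))

# Buggy: the group total reuses the name `production` and shadows the input
bad <- with(g, {
  production <- sum(production)            # now a single value, 15
  sum(size * production) / sum(production) # (10 + 7) * 15 / 15
})

# Fixed: the total gets its own name, so the per-row weights stay intact
fixed <- with(g, {
  productionSum <- sum(production)
  sum(size * production) / productionSum   # 135 / 15
})

c(good = good, bad = bad, fixed = fixed)
```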