2017-09-02 53 views
0

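Get avg_delay by destination

I have a data frame here (flights_delay).

In this data frame (flights_delay), destinations are repeated (in the "dest" column). I am trying to get the average delay (the "avg_delay" column) by destination (the "dest" column). I tried this code: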

sum_avg_delay <- aggregate(avg_delay~dest,flights_delay,sum)$avg_delay 

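Unfortunately, I get a numeric vector without any destination labels.

I also tried the dplyr::summarise function, but it returns an error.

There must be a simpler way to get the average delay by destination.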

+1

Probably 'aggregate(avg_delay ~ dest, flights_delay, sum)' for a 2-column data.frame, or 'with(flights_delay, tapply(avg_delay, dest, sum))' for a named vector. – lmo
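
The comment's two suggestions can be sketched as follows. This is only an illustration: the real flights_delay was posted as an image that is not available, so the toy data frame below (same column names, made-up values) is an assumption standing in for it.

```r
# Toy data standing in for flights_delay (values are made up for illustration).
flights_delay <- data.frame(
  dest = c("IAH", "IAH", "MIA", "BQN", "ATL", "ATL"),
  avg_delay = c(13, 24, 35, -19, -31, 8)
)

# Appending $avg_delay keeps only the value column, so the labels are lost.
unlabelled <- aggregate(avg_delay ~ dest, flights_delay, sum)$avg_delay

# Dropping the $ keeps both columns: one row per destination.
by_dest <- aggregate(avg_delay ~ dest, flights_delay, sum)

# tapply() returns a vector named by destination instead.
named <- with(flights_delay, tapply(avg_delay, dest, sum))

print(by_dest)
print(named)
```

The missing labels in the question come from the trailing $avg_delay, which extracts a plain numeric column from the aggregated result.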

Answers

2

You are on the right track, just simplify:

df <- data.frame(dest=c("IAH","IAH","MIA","BQN","ATL","ATL"), 
      avg_delay=c(13,24,35,-19,-31,8)) 

aggregate(avg_delay ~ dest, sum, data=df)

  dest avg_delay
1  ATL       -23
2  BQN       -19
3  IAH        37
4  MIA        35
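
Since the question asks for an average rather than a total, the same call can be sketched with mean in place of sum. The variable name mean_by_dest is introduced here for illustration only, and the data frame repeats the toy data from this answer.

```r
# Same toy data as in the answer above.
df <- data.frame(
  dest = c("IAH", "IAH", "MIA", "BQN", "ATL", "ATL"),
  avg_delay = c(13, 24, 35, -19, -31, 8)
)

# Mean of avg_delay per destination; the result keeps the dest column.
mean_by_dest <- aggregate(avg_delay ~ dest, data = df, FUN = mean)

print(mean_by_dest)
```

Note that this is an unweighted mean of the rows for each destination, so every row counts equally regardless of how many flights it summarises.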

2

Here is an option using dplyr:

suppressPackageStartupMessages(library(dplyr)) 

df <- data.frame(dest=c("IAH","IAH","MIA","BQN","ATL","ATL"), 
       avg_delay=c(13,24,35,-19,-31,8)) 

# average delay by destination 
df %>% 
    group_by(dest) %>% 
    summarise(avg_delay = mean(avg_delay)) 
#> # A tibble: 4 x 2
#>     dest avg_delay
#>   <fctr>     <dbl>
#> 1    ATL     -11.5
#> 2    BQN     -19.0
#> 3    IAH      18.5
#> 4    MIA      35.0

# sum of average delay by destination 
df %>% 
    group_by(dest) %>% 
    summarise(avg_delay = sum(avg_delay)) 
#> # A tibble: 4 x 2
#>     dest avg_delay
#>   <fctr>     <dbl>
#> 1    ATL       -23
#> 2    BQN       -19
#> 3    IAH        37
#> 4    MIA        35
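
If both figures are wanted at once, the two summaries above could be combined in a single summarise() call. This is a minimal sketch on the same toy data; the column names mean_delay, total_delay and n_rows are made up for illustration.

```r
suppressPackageStartupMessages(library(dplyr))

# Same toy data as in the answer above.
df <- data.frame(
  dest = c("IAH", "IAH", "MIA", "BQN", "ATL", "ATL"),
  avg_delay = c(13, 24, 35, -19, -31, 8)
)

# One row per destination with the mean, the sum and the row count.
out <- df %>%
  group_by(dest) %>%
  summarise(
    mean_delay = mean(avg_delay),
    total_delay = sum(avg_delay),
    n_rows = n()
  )

print(out)
```

Keeping the row count next to the mean makes it easier to see which destinations rest on a single observation.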