2017-08-31

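Calculate a group average, then lag it by group

I'm working on my R skills. If possible, I'd like to solve this with the dplyr package.

I have a fantasy football statistics dataset. Each record holds one player's stats for one week of a season, including that player's fantasy points for the week.

Here is a snippet of the data I'm working with: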

            Player Week year Fantasy.Points Avg.Fantasy.Ponts
1  Aaron Hernandez    1 2011           16.3          9.678571
2  Aaron Hernandez    2 2011           12.2          9.678571
3  Aaron Hernandez    5 2011            5.6          9.678571
4  Aaron Hernandez    6 2011           10.8          9.678571
5  Aaron Hernandez    8 2011            7.1          9.678571
6  Aaron Hernandez    9 2011            9.5          9.678571
7  Aaron Hernandez   10 2011            4.1          9.678571
8  Aaron Hernandez   11 2011            4.4          9.678571
9  Aaron Hernandez   12 2011            6.2          9.678571
10 Aaron Hernandez   13 2011            4.3          9.678571
11 Aaron Hernandez   14 2011            8.4          9.678571
12 Aaron Hernandez   15 2011           20.5          9.678571
13 Aaron Hernandez   16 2011            3.7          9.678571
14 Aaron Hernandez   17 2011           22.4          9.678571
15 Aaron Hernandez    1 2012           12.4          8.755556
16 Aaron Hernandez    6 2012            9.0          8.755556
17 Aaron Hernandez    7 2012            5.4          8.755556
18 Aaron Hernandez   12 2012            3.6          8.755556
19 Aaron Hernandez   13 2012            9.7          8.755556
20 Aaron Hernandez   14 2012           17.8          8.755556

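The Avg.Fantasy.Points field (spelled Avg.Fantasy.Ponts in the snippet header) is the average number of points the player was worth in the year of that record. For example, Aaron Hernandez averaged 9.678571 points in the 2011 season and 8.755556 points in the 2012 season.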

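I'd like to compute a column holding the average points a player was worth in the previous year. In the example above, Aaron Hernandez's 2012 records should show a previous-year average of 9.678571.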

Answers

I found a workaround, similar to a subquery in SQL.

df_te is the data frame from the snippet above:

library(dplyr)

df_te %>%
    left_join(
     df_te %>%
      mutate(next.year = year + 1) %>% # add a column for the next year
      group_by(Player, year) %>%
      # copy of 'Avg.Fantasy.Points' under the name wanted for the new column
      # (the snippet header spells it 'Avg.Fantasy.Ponts'; use whichever name your data has)
      mutate(Previous.Avg.Fantasy.Points = first(Avg.Fantasy.Points)) %>%
      filter(row_number() == 1) %>% # keep one row per player/year group to avoid duplication on join
      ungroup() %>% # drop the grouping so select() does not re-add 'year'
      select(Player, next.year, Previous.Avg.Fantasy.Points), # keep only the columns to join in
     # joining 'year' on the LHS with 'next.year' on the RHS yields the previous year's average
     by = c("Player" = "Player", "year" = "next.year")
    )
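
The same self-join idea can be written a little more compactly. The following is only a sketch, assuming a hypothetical four-row stand-in for df_te and using distinct() in place of the group_by()/filter() step; the names prev_year and result are made up for illustration.

```r
library(dplyr)

# Hypothetical cut-down stand-in for df_te (two weeks per season)
df_te <- data.frame(
  Player = "Aaron Hernandez",
  Week = c(1, 2, 1, 6),
  year = c(2011, 2011, 2012, 2012),
  Fantasy.Points = c(16.3, 12.2, 12.4, 9.0),
  Avg.Fantasy.Points = c(9.678571, 9.678571, 8.755556, 8.755556),
  stringsAsFactors = FALSE
)

# One row per player/year, shifted forward a year so it lines up
# with the season it should be attached to
prev_year <- df_te %>%
  distinct(Player, year, Avg.Fantasy.Points) %>%
  mutate(year = year + 1) %>%
  rename(Previous.Avg.Fantasy.Points = Avg.Fantasy.Points)

# Seasons without a matching previous year get NA
result <- left_join(df_te, prev_year, by = c("Player", "year"))
print(result)
```

Because the lookup table is keyed on year + 1, a season is only matched when the player has a record for exactly the preceding year.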

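Since you are using the dplyr package, I'd like to introduce the lag function. It shifts values by a given number of rows; the default is 1. The last line, select(c(colnames(dt), "Pre.Avg.Fantasy.Ponts")), only restores the column order. dt2 is the final output.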

library(dplyr) 

dt2 <- dt %>% 
    group_by(Player, year) %>% 
    summarise(Avg.Fantasy.Ponts = first(Avg.Fantasy.Ponts)) %>% 
    mutate(Pre.Avg.Fantasy.Ponts = lag(Avg.Fantasy.Ponts)) %>% 
    select(-Avg.Fantasy.Ponts) %>% 
    right_join(dt, by = c("Player", "year")) %>% 
    select(c(colnames(dt), "Pre.Avg.Fantasy.Ponts")) 
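
One thing worth keeping in mind with lag(): it returns the previous row within the group, which is not necessarily the previous calendar year. The sketch below uses made-up players and averages (the data frame yearly and the column names Avg, Prev.By.Row and Prev.By.Year are all hypothetical) to show the difference when a player skips a season.

```r
library(dplyr)

# Hypothetical yearly averages; "Player B" has no 2012 season
yearly <- data.frame(
  Player = c("Player A", "Player A", "Player B", "Player B"),
  year = c(2011, 2012, 2011, 2013),
  Avg = c(9.7, 8.8, 5.0, 6.0),
  stringsAsFactors = FALSE
)

prev <- yearly %>%
  arrange(Player, year) %>%
  group_by(Player) %>%
  mutate(
    # plain lag: value from the previous row of the same player
    Prev.By.Row = lag(Avg),
    # keep the lagged value only if the previous row is exactly one year earlier
    Prev.By.Year = if_else(year - lag(year) == 1, lag(Avg), NA_real_)
  ) %>%
  ungroup()

print(prev)
```

For Player B's 2013 row, Prev.By.Row carries the 2011 average while Prev.By.Year is NA. Which behaviour is wanted depends on how missed seasons should be treated.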

Data

dt <- read.table(text = "   Player Week year Fantasy.Points Avg.Fantasy.Ponts 
1 'Aaron Hernandez'  1 2011   16.3   9.678571 
       2 'Aaron Hernandez'  2 2011   12.2   9.678571 
       3 'Aaron Hernandez'  5 2011   5.6   9.678571 
       4 'Aaron Hernandez'  6 2011   10.8   9.678571 
       5 'Aaron Hernandez'  8 2011   7.1   9.678571 
       6 'Aaron Hernandez'  9 2011   9.5   9.678571 
       7 'Aaron Hernandez' 10 2011   4.1   9.678571 
       8 'Aaron Hernandez' 11 2011   4.4   9.678571 
       9 'Aaron Hernandez' 12 2011   6.2   9.678571 
       10 'Aaron Hernandez' 13 2011   4.3   9.678571 
       11 'Aaron Hernandez' 14 2011   8.4   9.678571 
       12 'Aaron Hernandez' 15 2011   20.5   9.678571 
       13 'Aaron Hernandez' 16 2011   3.7   9.678571 
       14 'Aaron Hernandez' 17 2011   22.4   9.678571 
       15 'Aaron Hernandez'  1 2012   12.4   8.755556 
       16 'Aaron Hernandez'  6 2012   9.0   8.755556 
       17 'Aaron Hernandez'  7 2012   5.4   8.755556 
       18 'Aaron Hernandez' 12 2012   3.6   8.755556 
       19 'Aaron Hernandez' 13 2012   9.7   8.755556 
       20 'Aaron Hernandez' 14 2012   17.8   8.755556", 
       header = TRUE, stringsAsFactors = FALSE)
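
Both answers reuse the precomputed average column. If that column were not available, the yearly average could be derived from Fantasy.Points first and then lagged. This is a sketch only, assuming dplyr 1.0.0 or later (for the .groups argument) and a hypothetical four-row data frame named weekly; the column names Avg and Prev.Avg are made up.

```r
library(dplyr)

# Hypothetical cut-down data: two weeks per season
weekly <- data.frame(
  Player = "Aaron Hernandez",
  Week = c(1, 2, 1, 6),
  year = c(2011, 2011, 2012, 2012),
  Fantasy.Points = c(16.3, 12.2, 12.4, 9.0),
  stringsAsFactors = FALSE
)

yearly <- weekly %>%
  group_by(Player, year) %>%
  # one row per player/year; the result stays grouped by Player
  summarise(Avg = mean(Fantasy.Points), .groups = "drop_last") %>%
  arrange(year, .by_group = TRUE) %>%
  # previous row within each player, i.e. the prior season on record
  mutate(Prev.Avg = lag(Avg)) %>%
  ungroup()

# Attach the yearly figures back onto the weekly records
result <- left_join(weekly, yearly, by = c("Player", "year"))
print(result)
```

The averages here come from the four hypothetical rows only, so they differ from the values in the full dataset.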