2012-04-10 81 views

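How to create a plyr function: I have a series of football results and want to find out how many points a team has scored over a specific number of games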

Here is the head of a subset, with the cumulative points (cumpts) scored in a season counted from the most recent result.

I have had my wrist slapped a few times for not using dput, so apologies for the length.

allData <- structure(list(team = c("Arsenal", "Tottenham H", "Tottenham H", 
"Arsenal", "Arsenal", "Tottenham H"), venue = c("H", "A", "H", 
"A", "H", "A"), result = c("W", "D", "W", "L", "W", "D"), GF = c(1L, 
0L, 3L, 1L, 3L, 0L), GA = c(0L, 0L, 1L, 2L, 0L, 0L), gameDate = structure(c(1333868400, 
1333782000, 1333263600, 1333177200, 1332572400, 1332572400), class = c("POSIXct", 
"POSIXt"), tzone = ""), season = structure(c(2L, 2L, 2L, 2L, 
2L, 2L), .Label = c("2010/2011", "2011/2012"), class = "factor"), 
points = c(3, 1, 3, 0, 3, 1), GD = c(1L, 0L, 2L, -1L, 3L, 
0L), cumpts = c(3, 1, 4, 3, 6, 5)), .Names = c("team", "venue", 
"result", "GF", "GA", "gameDate", "season", "points", "GD", "cumpts" 
), row.names = c(NA, 6L), class = "data.frame") 

So, assuming this is the data for one team in one season:

spurs <- structure(list(team = c("Tottenham H", "Tottenham H", "Tottenham H", 
"Tottenham H", "Tottenham H", "Tottenham H", "Tottenham H", "Tottenham H", 
"Tottenham H", "Tottenham H", "Tottenham H", "Tottenham H", "Tottenham H", 
"Tottenham H", "Tottenham H", "Tottenham H", "Tottenham H", "Tottenham H", 
"Tottenham H", "Tottenham H", "Tottenham H", "Tottenham H", "Tottenham H", 
"Tottenham H", "Tottenham H", "Tottenham H", "Tottenham H", "Tottenham H", 
"Tottenham H", "Tottenham H", "Tottenham H", "Tottenham H"), 
    venue = c("A", "H", "A", "H", "A", "H", "A", "H", "A", "H", 
    "A", "H", "H", "H", "A", "A", "H", "H", "A", "H", "A", "H", 
    "A", "H", "A", "A", "H", "A", "H", "A", "H", "A"), result = c("D", 
    "W", "D", "D", "L", "L", "L", "W", "D", "W", "L", "D", "W", 
    "W", "D", "W", "D", "W", "L", "W", "W", "W", "W", "W", "W", 
    "D", "W", "W", "W", "W", "L", "L"), GF = c(0L, 3L, 0L, 1L, 
    0L, 1L, 2L, 5L, 0L, 3L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 
    3L, 3L, 2L, 3L, 3L, 2L, 2L, 2L, 2L, 4L, 2L, 1L, 0L), GA = c(0L, 
    1L, 0L, 1L, 1L, 3L, 5L, 0L, 0L, 1L, 3L, 1L, 0L, 0L, 1L, 0L, 
    1L, 0L, 2L, 0L, 1L, 0L, 1L, 1L, 1L, 2L, 1L, 1L, 0L, 0L, 5L, 
    3L), gameDate = structure(c(1333782000, 1333263600, 1332572400, 
    1332313200, 1331366400, 1330848000, 1330243200, 1328947200, 
    1328515200, 1327996800, 1327219200, 1326528000, 1326268800, 
    1325577600, 1325318400, 1324972800, 1324540800, 1324281600, 
    1323590400, 1322899200, 1322294400, 1321862400, 1320562800, 
    1319958000, 1319353200, 1318748400, 1317538800, 1316847600, 
    1316329200, 1315638000, 1314514800, 1313996400), class = c("POSIXct", 
    "POSIXt"), tzone = ""), season = structure(c(2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("2010/2011", 
    "2011/2012"), class = "factor"), points = c(1, 3, 1, 1, 0, 
    0, 0, 3, 1, 3, 0, 1, 3, 3, 1, 3, 1, 3, 0, 3, 3, 3, 3, 3, 
    3, 1, 3, 3, 3, 3, 0, 0), GD = c(0L, 2L, 0L, 0L, -1L, -2L, 
    -3L, 5L, 0L, 2L, -1L, 0L, 2L, 1L, 0L, 2L, 0L, 1L, -1L, 3L, 
    2L, 2L, 2L, 2L, 1L, 0L, 1L, 1L, 4L, 2L, -4L, -3L), cumpts = c(1, 
    4, 5, 6, 6, 6, 6, 9, 10, 13, 13, 14, 17, 20, 21, 24, 25, 
    28, 28, 31, 34, 37, 40, 43, 46, 47, 50, 53, 56, 59, 59, 59 
    )), .Names = c("team", "venue", "result", "GF", "GA", "gameDate", 
"season", "points", "GD", "cumpts"), row.names = c(NA, -32L), class = "data.frame") 

I then run this code on the spurs data frame to calculate the points scored over a run of games of a specific length (here 5):

gameLength <- 5 
seasonLength <- nrow(spurs) 
cumPoints <- c() 
cumPoints[1] <- spurs[gameLength,]$cumpts 
for (i in gameLength+1:seasonLength) { 
cumPoints[i-(gameLength-1)] <- ((spurs[i,]$cumpts)- 
(spurs[i-gameLength,]$cumpts)) 
} 
cumPoints <- cumPoints[!is.na(cumPoints)] # not sure why throws up NAs 

This produces the correct output:

[1] 6 5 2 4 4 7 7 8 8 10 8 11 11 11 8 10 10 12 12 15 15 
[22] 13 13 13 13 13 12 9 

But I need to be able to transform allData so that the data frame includes a column with this data for each season and team.

I assume I should use ddply in some way, unless there is a better alternative.


What does 'for (i in gameLength+1:seasonLength) {' have to do with the game length? It just runs over all the games except the first 4. Where is the game length in the data? – 2012-04-10 18:38:07


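gameLength is the run of games I am interested in. So here I am interested in the points a team gained over a 5-game stretch. In their most recent 5 games Spurs got 1, 3, 1, 1, 0 points, a total of 6. The 0 from the loss before that replaces the 1, so cumPoints[2] is 5, and so on. – pssguy 2012-04-10 19:06:05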


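The operator precedence of ":" versus "+" will trip you up every time with code like this (for gameLength + 1:seasonLength). You need to get into the habit of using paired parens to avoid this problem. – 2012-04-10 21:43:30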
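
The precedence issue described in this comment can be seen in a short sketch. The values are taken from the question (gameLength is 5 and spurs has 32 rows); the variable names unparenthesised and parenthesised are made up for illustration.

```r
gameLength <- 5
seasonLength <- 32  # nrow(spurs) in the question

# `:` binds more tightly than `+`, so this is gameLength + (1:seasonLength)
unparenthesised <- gameLength + 1:seasonLength
range(unparenthesised)  # 6 37: the loop index runs past the last row

# Parentheses give the intended sequence
parenthesised <- (gameLength + 1):seasonLength
range(parenthesised)    # 6 32
```

Indexing spurs beyond row 32 returns NA for cumpts, which would explain the NAs that the question's loop has to filter out afterwards.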

Answer

To reproduce your output:

library(zoo)
# Sum points over each window of gameLength consecutive games
rollapply(spurs$points, gameLength, sum)
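
As a sanity check on why a rolling sum reproduces the loop's numbers, here is a small base R sketch. It assumes only the points column of spurs from the question (copied in so the block runs on its own); the names viaCumsum and viaWindows are hypothetical.

```r
# points column of spurs from the question (most recent game first)
points <- c(1, 3, 1, 1, 0, 0, 0, 3, 1, 3, 0, 1, 3, 3, 1, 3,
            1, 3, 0, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 0, 0)
gameLength <- 5

# Running total, equivalent to the cumpts column
cumpts <- cumsum(points)

# Lagged difference of the running total (with a leading 0):
# each element is the sum of one window of gameLength games
viaCumsum <- diff(c(0, cumpts), lag = gameLength)

# Direct window sums for comparison
starts <- seq_len(length(points) - gameLength + 1)
viaWindows <- sapply(starts, function(i) sum(points[i:(i + gameLength - 1)]))

all.equal(viaCumsum, viaWindows)
```

Both vectors have 28 elements and start 6, 5, 2, 4, 4, 7, which matches the output shown in the question.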

If your allData looks like the spurs data.frame:

library(plyr)
library(zoo)

# Rolling sum of points over gamelen consecutive games for one group
rollsum <- function(df, gamelen = gameLength) {
    out <- rollapply(df$points, gamelen, sum)
    return(out)
}

ddply(allData, .(team), rollsum)

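Thanks Justin. This seems to put me on the right track. I want to group by team and season, and that produces an error, "Error in list_to_dataframe(res, attr(.data, "split_labels")): Results do not have equal lengths", which is correct, since season lengths vary. The solution seems to be to use dlply and do some hacking to create the data frame. There is probably a better way, but it seems to work. – pssguy 2012-04-11 03:36:22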
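
One possible way around the unequal-length error, offered as a sketch and not as the approach pssguy ended up with: have the function return the whole group with a new column padded with NA, so that every group yields a data frame with as many rows as it received. The helper name addRunPoints and the column name runPts are made up for illustration. The toy data reuses the team, season, and points values of allData from the question, and gameLength is set to 2 here only because each team has just three rows in that sample.

```r
library(plyr)
library(zoo)

# team, season and points values copied from allData in the question
allData <- data.frame(
  team = c("Arsenal", "Tottenham H", "Tottenham H",
           "Arsenal", "Arsenal", "Tottenham H"),
  season = "2011/2012",
  points = c(3, 1, 3, 0, 3, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: append the rolling points total as a column.
# Rows are most recent first, so align = "left" sums each game with
# the gameLength - 1 games listed after it; fill = NA pads the tail.
addRunPoints <- function(df, gameLength = 5) {
  if (nrow(df) < gameLength) {
    # Too few games for even one window
    df$runPts <- rep(NA_real_, nrow(df))
  } else {
    df$runPts <- rollapply(df$points, gameLength, sum,
                           fill = NA, align = "left")
  }
  df
}

runs <- ddply(allData, .(team, season), addRunPoints, gameLength = 2)
runs
```

Because each piece returned to ddply is a data frame, the pieces are row-bound and can have different numbers of rows, so seasons of different lengths should no longer be a problem.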