2017-07-18

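Creating a variable based on lagged observations in R

I want to create a new variable: when a certain event occurs, I want to look back at all previous events within 1 unit of the time variable. I have some sample data below. I'm quite lost and don't know where to start.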

event<-c("Dribble","Pass","Dribble","Bad Shot","Shot Miss","Rebound","Pass","Pump Fake","Good Shot","Shot Miss") 
time<-c(1,2,3,4,5,6,6.5,6.9,6.92,6.95) 
player_id<-c(1,1,2,2,2,1,1,2,2,2) 
pass_to_shot<-c("","Pass to Shot","","","","","Pass to Shot","","","") 
test_data<-data.frame(player_id,event,time,pass_to_shot) 

player_id  event      time  pass_to_shot
1          Dribble    1     NA
1          Pass       2     Pass to Shot
2          Dribble    3     NA
2          Bad Shot   4     NA
2          Shot Miss  5     NA
1          Rebound    6     NA
1          Pass       6.5   Pass to Shot
2          Pump Fake  6.9   NA
2          Good Shot  6.92  NA
2          Shot Miss  6.95  NA

I want it to look like this:

player_id  event      time  pass_to_shot  chance_create
1          Dribble    1     NA
1          Pass       2     Pass to Shot
2          Dribble    3     NA
2          Bad Shot   4     NA
2          Shot Miss  5     NA
1          Rebound    6     NA
1          Pass       6.5   Pass to Shot  1
2          Pump Fake  6.9   NA
2          Good Shot  6.92  NA
2          Shot Miss  6.95  NA

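I really don't understand how to reference past observations in an R data set. Basically, if event == "Pass" and a "Good Shot" event occurs within the next 1 second (in units of time), then I want chance_create to equal 1. Any help would be great, thanks!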
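To make "referencing past observations" concrete: in base R, the previous or next value of a vector can be taken by shifting its index by one position and padding with NA, which is what dplyr's lag() and lead() do with their default n = 1. A minimal sketch on the question's data; the helper names prev_obs() and next_obs() are made up for illustration:

```r
# Sample data from the question
event <- c("Dribble", "Pass", "Dribble", "Bad Shot", "Shot Miss",
           "Rebound", "Pass", "Pump Fake", "Good Shot", "Shot Miss")
time <- c(1, 2, 3, 4, 5, 6, 6.5, 6.9, 6.92, 6.95)

# Hypothetical helpers: shift a vector by one position and pad with NA
prev_obs <- function(x) c(NA, head(x, -1))  # value from the previous row
next_obs <- function(x) c(tail(x, -1), NA)  # value from the next row

prev_event <- prev_obs(event)
next_event <- next_obs(event)
gap_to_next <- next_obs(time) - time  # time until the following event

data.frame(event, time, prev_event, next_event, gap_to_next)
```

On this data the row after the Pass at 6.5 is "Pump Fake", so a rule that only looks one row ahead never sees the Good Shot at 6.92.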

Answers


You could do this with dplyr:

library(dplyr)
test_data %>%
  mutate(time_diff = lead(time) - time,  # time until the next row's event (NA on the last row)
         # 1 for a Pass followed by any event less than 1 time unit later
         chance_create = ifelse(event == "Pass" & time_diff < 1, 1, 0)) %>%
  select(-time_diff)

Output:

   player_id     event time pass_to_shot chance_create
1          1   Dribble 1.00                          0
2          1      Pass 2.00 Pass to Shot             0
3          2   Dribble 3.00                          0
4          2  Bad Shot 4.00                          0
5          2 Shot Miss 5.00                          0
6          1   Rebound 6.00                          0
7          1      Pass 6.50 Pass to Shot             1
8          2 Pump Fake 6.90                          0
9          2 Good Shot 6.92                          0
10         2 Shot Miss 6.95                          0

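That said, I'm not 100% sure my code is robust, i.e., I don't know whether it will always give the desired result: it only looks at the time gap to the immediately following row and does not check that this row is a "Good Shot".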
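One way to avoid depending only on the next row is to check, for every Pass, whether any Good Shot falls inside the following time window. A sketch in base R, assuming "within the next 1 second" means a gap greater than 0 and at most 1 (the answers here use both < 1 and <= 1); flag_chance_created() and its max_gap argument are hypothetical names:

```r
# Sample data from the question (pass_to_shot omitted, it is not needed here)
test_data <- data.frame(
  player_id = c(1, 1, 2, 2, 2, 1, 1, 2, 2, 2),
  event = c("Dribble", "Pass", "Dribble", "Bad Shot", "Shot Miss",
            "Rebound", "Pass", "Pump Fake", "Good Shot", "Shot Miss"),
  time = c(1, 2, 3, 4, 5, 6, 6.5, 6.9, 6.92, 6.95),
  stringsAsFactors = FALSE
)

# Assumption: "within the next 1 time unit" means 0 < gap <= max_gap
flag_chance_created <- function(event, time, max_gap = 1) {
  shot_times <- time[event == "Good Shot"]
  vapply(seq_along(event), function(i) {
    if (event[i] != "Pass") return(0)
    gaps <- shot_times - time[i]
    # 1 if any Good Shot (not just the next row) lands inside the window
    as.numeric(any(gaps > 0 & gaps <= max_gap))
  }, numeric(1))
}

test_data$chance_create <- flag_chance_created(test_data$event, test_data$time)
test_data
```

Because it compares times directly, this does not require the rows to be sorted, and events between the Pass and the Good Shot (such as the "Pump Fake" here) do not matter.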


Here is another solution that may be a bit more robust, though it's hard to tell with the current data:

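library(dplyr)
test_data %>%
    # keep only the events that matter, in time order
    filter(event %in% c("Pass", "Good Shot")) %>%
    arrange(time, event) %>%
    # flag a Pass whose next relevant event is a Good Shot less than 1 time unit later
    mutate(chance_create = ifelse(event == "Pass" &
                                    lead(event) == "Good Shot" &
                                    lead(time) - time < 1, 1, NA)) %>%
    select(player_id, chance_create, time) %>%
    # attach the flag back to the full data; unmatched rows get NA
    left_join(test_data, ., by = c("time", "player_id"))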
Another answer:
library(dplyr)

# Flag each Pass whose next Pass/Good Shot row is a Good Shot at most 1 time unit later
z1 <- test_data %>% filter(event == "Pass" | event == "Good Shot") %>%
    mutate(time_diff = c(diff(time), NA),
           chance_create = ifelse(event == "Pass" & lead(event) == "Good Shot" & time_diff <= 1, 1, 0)) %>%
    select(-time_diff)

# Merge the flags back onto all rows; filtered-out rows get NA, which is then set to 0
output <- merge(test_data, z1, by = c("player_id", "event", "time", "pass_to_shot"), all.x = TRUE) %>%
    arrange(time)
output$chance_create[is.na(output$chance_create)] <- 0
output

   player_id     event time pass_to_shot chance_create
1          1   Dribble 1.00                          0
2          1      Pass 2.00 Pass to Shot             0
3          2   Dribble 3.00                          0
4          2  Bad Shot 4.00                          0
5          2 Shot Miss 5.00                          0
6          1   Rebound 6.00                          0
7          1      Pass 6.50 Pass to Shot             1
8          2 Pump Fake 6.90                          0
9          2 Good Shot 6.92                          0
10         2 Shot Miss 6.95                          0
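The filter-then-merge idea above can also be expressed in base R without any join, by comparing every Pass time against every Good Shot time with outer(). This is a sketch, assuming the same inclusive 1-unit window; max_gap is a hypothetical name, and the intermediate (Good Shots x Passes) matrix makes it best suited to small or moderate data:

```r
# Sample data from the question
event <- c("Dribble", "Pass", "Dribble", "Bad Shot", "Shot Miss",
           "Rebound", "Pass", "Pump Fake", "Good Shot", "Shot Miss")
time <- c(1, 2, 3, 4, 5, 6, 6.5, 6.9, 6.92, 6.95)

max_gap <- 1  # assumption: inclusive window of 1 time unit
is_pass <- event == "Pass"
is_shot <- event == "Good Shot"

# gaps[i, j] = time of the i-th Good Shot minus time of the j-th Pass
gaps <- outer(time[is_shot], time[is_pass], "-")

# A Pass (column) creates a chance if any Good Shot follows it within the window
hit <- colSums(gaps > 0 & gaps <= max_gap) > 0

chance_create <- numeric(length(event))
chance_create[is_pass] <- as.numeric(hit)
chance_create
```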