2017-10-11

Capping values with the mean when taking a lag with dplyr

I have the following data frame in R:

    name  date        month  year  hours
    SSI   01-01-2016  01     2016  2000
    SSI   02-01-2016  01     2016  1900
    SSI   03-01-2016  01     2016  2038
    SSI   04-01-2016  01     2016  2041
    SSII  01-01-2016  01     2016  2000
    SSII  02-01-2016  01     2016  2100
    SSII  03-01-2016  01     2016  2105
    SSII  04-01-2016  01     2016  2110

I want to compute the lag of hours for each name, grouped by month and year, which I can do with the following code:

df1 <- df %>%
    group_by(name, year, month) %>%
    mutate(running_hrs = hours - lag(hours)) %>%
    as.data.frame()

Wherever running_hrs is greater than 24 or less than 0, I want to cap the value with the mean for that month. This is what I am doing:

new_df <- df %>%
    group_by(name, year, month) %>%
    mutate(running_hrs = hours - lag(hours)) %>%
    mutate(running_hrs_new = ifelse(running_hrs > 24 | running_hrs < 0, mean(running_hrs), running_hrs)) %>%
    as.data.frame()

The desired output is:

    name  date        month  year  hours  running_hrs  running_hrs_new
    SSI   01-01-2016  01     2016  2000   NA
    SSI   02-01-2016  01     2016  1900   -100         (3/4)
    SSI   03-01-2016  01     2016  2038   138          (3/4)
    SSI   04-01-2016  01     2016  2041   3            3
    SSII  01-01-2016  01     2016  2000   NA
    SSII  02-01-2016  01     2016  2100   100          (10/4)
    SSII  03-01-2016  01     2016  2105   5            5
    SSII  04-01-2016  01     2016  2110   5            5

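The values should be replaced by the mean of the running hours that are less than 24 and greater than or equal to zero. I think we could use a conditional mean.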

Answer

Hope this helps!

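library(dplyr)
library(tidyr)

new_df <- df %>%
    group_by(name, year, month) %>%
    mutate(running_hrs = hours - lag(hours)) %>%
    # keep values strictly between 0 and 24; everything else contributes 0 to the mean
    mutate(valid_running_hrs = ifelse(running_hrs < 24 & running_hrs > 0, running_hrs, 0)) %>%
    # the first row of each group has no lag, so its NA is also counted as 0
    replace_na(list(valid_running_hrs = 0)) %>%
    group_by(name, year, month) %>%
    # replace out-of-range values with the group mean of valid_running_hrs
    mutate(running_hrs_new = ifelse(running_hrs > 24 | running_hrs < 0, mean(valid_running_hrs), running_hrs)) %>%
    as.data.frame()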
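
To make the result easy to check, here is a minimal reproducible sketch that rebuilds the sample data and applies the same idea in a single mutate. It rests on a few assumptions: the last SSII reading is taken as 2110, as in the question's expected-output table, and the helper columns `valid`, `mean_all_rows` and `mean_valid_only` are illustrative names that do not appear in the original code. `mean_all_rows` follows the (3/4) and (10/4) convention from the question (sum of valid values divided by every row in the group), while `mean_valid_only` shows the alternative of averaging over the valid rows only.

```r
library(dplyr)

# Sample data from the question; the last SSII value (2110) is an assumption
# taken from the question's expected-output table.
df <- data.frame(
  name  = rep(c("SSI", "SSII"), each = 4),
  date  = rep(c("01-01-2016", "02-01-2016", "03-01-2016", "04-01-2016"), times = 2),
  month = "01",
  year  = 2016,
  hours = c(2000, 1900, 2038, 2041, 2000, 2100, 2105, 2110),
  stringsAsFactors = FALSE
)

new_df <- df %>%
  group_by(name, year, month) %>%
  mutate(
    running_hrs = hours - lag(hours),
    # TRUE only for values in [0, 24); the NA in each group's first row is invalid
    valid = !is.na(running_hrs) & running_hrs >= 0 & running_hrs < 24,
    # Convention from the question: sum of valid values over all rows in the group
    mean_all_rows = sum(running_hrs[valid]) / n(),
    # Alternative: mean over the valid rows only (NaN if a group has none)
    mean_valid_only = mean(running_hrs[valid]),
    # Out-of-range values get the group mean; NA rows stay NA
    running_hrs_new = ifelse(running_hrs > 24 | running_hrs < 0,
                             mean_all_rows, running_hrs)
  ) %>%
  ungroup() %>%
  as.data.frame()

print(new_df[, c("name", "hours", "running_hrs", "running_hrs_new", "mean_valid_only")])
```

If you would rather cap with the mean of the valid rows only, swap `mean_all_rows` for `mean_valid_only` inside the `ifelse()` call.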