2017-03-02

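Cumsum reset and delay

I am looking for a way to calculate delinquency buckets. I have figured out the part that resets the cumsum, but I am stuck on how to "delay" the cumsum based on a trigger. See my example below, where the result I expect is in correct_bucket: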

df <- data.frame(id = c(1,1,1,1,2,2,3,3,3,3,4,4,4,4,5,5,5,5,5,5,5,5,5,5,6,6,6,6,7,7,7,7,7,8,8,8,8), 
      min_due = c(25,50,50,75,25,50,25,50,25,25,25,50,75,100,25,50,75,100,100,25,50,25,14.99,0,25,60,60,0,25,50,75,100,75,25,50,25,50), 
      payment = c(0,0,25,0,0,0,0,0,50,25,0,0,0,0,0,0,0,0,25,100,0,150,25,14.99,0,25,60,60,0,0,0,0,50,0,0,25,0), 
      past_due_amt = c(0,25,25,50,0,25,0,25,0,0,0,25,50,75,0,25,50,75,75,0,25,0,0,0,0,0,0,0,0,25,50,75,50,0,25,0,25), 
      correct_bucket = c(0,1,1,2,0,1,0,1,0,0,0,1,2,3,0,1,2,3,3,0,1,0,0,0,0,0,0,0,0,1,2,3,2,0,1,0,1)) 

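Explanation of correct_bucket: by id, it shows whether the min_due was met (or not) by a payment greater than or equal to the previous (lag 1) min_due. For example, id 1 has a min_due of 25 (row 1) and a payment of 0 (row 2), so correct_bucket = 1. As each example shows, the value of correct_bucket has to be computed iteratively, depending on whether a payment was made and on its amount.

Thoughts? Please ask any clarifying questions you need. I am close, and any extra help is welcome!

Thanks!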

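Sorry, I couldn't understand your question. Deleted the answer. – akrun

Bummer, sorry to have wasted your time. –

No problem. Someone else may be able to understand your question better than I did. – akrun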

Answer

df$original_order = 1:nrow(df) #In case you need later. OPTIONAL 

#Obtain the incremental min_due for each id 
df$b2 = unlist(lapply(split(df, df$id), function(a) c(0, diff(a$min_due)))) 

#Function to get your values from incremental min_due 
ff = function(x){
    x$b3 = 0
    #Note: the loop below assumes each id has at least two rows
    for (i in 2:NROW(x)){
        if (x$b2[i] > 0){
            x$b3[i] = x$b3[i-1] + 1
        }
        if (x$b2[i] == 0){
            x$b3[i] = x$b3[i-1]
        }
        if (x$b2[i] < 0){
            x$b3[i] = 0
        }
    }
    return(x)
}

#Split df by id and use the above function on each sub group 
#'b3' is the value you want 
do.call(rbind, lapply(split(df, df$id), function(a) ff(a))) 
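
As a quick sanity check on the approach above, the sketch below applies the same rise/hold/reset rule to a trimmed copy of the question's data and lists the ids where the result differs from correct_bucket. The helper name bucket_from_min_due and the use of ave() are illustrative choices, not part of the original answer. On this sample the rule agrees with correct_bucket for most ids but not all, so it is worth running a check like this before relying on b3.

```r
# Trimmed copy of the question's data: only the columns this check needs
df <- data.frame(
  id = c(1,1,1,1,2,2,3,3,3,3,4,4,4,4,5,5,5,5,5,5,5,5,5,5,6,6,6,6,7,7,7,7,7,8,8,8,8),
  min_due = c(25,50,50,75,25,50,25,50,25,25,25,50,75,100,25,50,75,100,100,25,50,25,14.99,0,25,60,60,0,25,50,75,100,75,25,50,25,50),
  correct_bucket = c(0,1,1,2,0,1,0,1,0,0,0,1,2,3,0,1,2,3,3,0,1,0,0,0,0,0,0,0,0,1,2,3,2,0,1,0,1))

# Hypothetical helper: same rule as ff, applied to one id's min_due vector.
# +1 when min_due rises, hold when it is unchanged, reset to 0 when it drops.
bucket_from_min_due <- function(min_due) {
  step <- c(0, diff(min_due))
  out <- numeric(length(min_due))
  for (i in seq_along(step)[-1]) {
    out[i] <- if (step[i] > 0) {
      out[i - 1] + 1
    } else if (step[i] == 0) {
      out[i - 1]
    } else {
      0
    }
  }
  out
}

# ave() applies the helper within each id and keeps the original row order
df$b3 <- ave(df$min_due, df$id, FUN = bucket_from_min_due)

# ids where the computed bucket differs from the expected one
mismatch_ids <- unique(df$id[df$b3 != df$correct_bucket])
mismatch_ids
```

Any id printed by the last line has at least one row where b3 and correct_bucket disagree; inspecting those rows with df[df$id %in% mismatch_ids, ] shows where the rule would need the payment information as well.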

New ff

ff = function(x){
    x$b3 = 0

    if(NROW(x) < 2){
        return(x)
    }

    for (i in 2:NROW(x)){
        if (x$b2[i] > 0){
            x$b3[i] = x$b3[i-1] + 1
        }
        if (x$b2[i] == 0){
            x$b3[i] = x$b3[i-1]
        }
        if (x$b2[i] < 0){
            x$b3[i] = 0
        }
    }
    return(x)
}
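
A minimal sketch of why the NROW(x) < 2 guard matters, assuming the failing data contains an id with only one row (the thread does not confirm this). In R, 2:NROW(x) counts down to c(2, 1) when x has a single row, so the loop reads x$b2[2], which is NA even though b2 itself contains no NA. The one_row data frame is hypothetical, and seq_len() is shown only as one alternative way to get an empty loop range.

```r
# Hypothetical group with a single row, as for an id with one record
one_row <- data.frame(id = 9, min_due = 25, b2 = 0)

# The colon operator counts down when the right side is smaller: 2 1
loop_range <- 2:NROW(one_row)
loop_range

# Indexing past the end of a vector returns NA, with no NA stored in b2
second_value <- one_row$b2[2]
second_value

# Using that NA as an if() condition raises an error; capture its message
res <- tryCatch(
  if (second_value > 0) "up" else "not up",
  error = function(e) conditionMessage(e))
res

# seq_len(n)[-1] is empty when n is 1, so a loop over it never runs
safe_range <- seq_len(NROW(one_row))[-1]
length(safe_range)
```

With this in mind, anyNA(df$b2) returning FALSE is consistent with the error: the NA comes from indexing beyond a one-row group, not from the column.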
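
That's what I was thinking! (I'm a SAS programmer by training), but I didn't know how to do it here. However, when I run it on my actual dataset I get this error: Error in if (x$b2[i] > 0) { : missing value where TRUE/FALSE needed. I don't have any NAs in b2, so I'm a bit confused. Any ideas? –

It's numeric; am I just being an idiot? summary(df$b2) shows no missing values, and the rest of my table has none either, per table(is.na(df)). –

anyNA(df$b2) [1] FALSE –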