2016-08-04

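Adding a "count" column to a data frame based on a condition

I have a data frame with several accounts and their win/lose records. I want to count how many times in a row each person has lost.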

df <- data.frame(account_number =c(1,1,1,1,1,1,1,2,2,2,2,2,3,3), 
       win_lose = c(-1,-1,-1,1,-1,-1,-1,-1,-1,1,1,1,1,-1)) 

> df
   account_number win_lose
1               1       -1
2               1       -1
3               1       -1
4               1        1
5               1       -1
6               1       -1
7               1       -1
8               2       -1
9               2       -1
10              2        1
11              2        1
12              2        1
13              3        1
14              3       -1

Each account represents one person. The final result should look like this:

   account_number win_lose losing_streak
1               1       -1             1
2               1       -1             2
3               1       -1             3
4               1        1             0
5               1       -1             1
6               1       -1             2
7               1       -1             3
8               2       -1             1
9               2       -1             2
10              2        1             0
11              2        1             0
12              2        1             0
13              3        1             0
14              3       -1             1
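To pin down the rule the expected column follows, here is a plain loop sketch (not part of the original question or answers, and slower than the vectorized approaches): the counter goes up by one on each loss, resets to 0 on a win, and restarts whenever account_number changes. It assumes each account's rows are contiguous and already in chronological order; the names `streak`, `same_account` and `prev` are illustrative only.

```r
df <- data.frame(account_number = c(1,1,1,1,1,1,1,2,2,2,2,2,3,3),
                 win_lose = c(-1,-1,-1,1,-1,-1,-1,-1,-1,1,1,1,1,-1))

# Illustrative reference implementation: walk the rows in order
streak <- integer(nrow(df))
for (i in seq_len(nrow(df))) {
  # carry the previous count only while we stay within the same account
  same_account <- i > 1 && df$account_number[i] == df$account_number[i - 1]
  prev <- if (same_account) streak[i - 1] else 0L
  # a loss (-1) extends the streak; a win (1) resets it to 0
  streak[i] <- if (df$win_lose[i] < 0) prev + 1L else 0L
}
streak
# [1] 1 2 3 0 1 2 3 1 2 0 0 0 0 1
```

A loop like this is handy as a cross-check for the vectorized answers on small inputs.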

Answer


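One option is to use rleid from data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)) and group by 'account_number' and the rleid of 'win_lose'. Within each group, take the row sequence (seq_len(.N)) and multiply it by 'win_lose < 0'. The logical is coerced to numeric, so FALSE (a win) becomes 0 and zeroes out the sequence, while TRUE (a loss) becomes 1 and leaves the sequence unchanged.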

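library(data.table)
# Group by account and by run of identical win_lose values (rleid);
# number the rows of each run 1..N, then zero the count for winning runs
setDT(df)[, losing_streak := seq_len(.N) * (win_lose < 0),
         by = .(account_number, rleid(win_lose))]
df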
#    account_number win_lose losing_streak
# 1:              1       -1             1
# 2:              1       -1             2
# 3:              1       -1             3
# 4:              1        1             0
# 5:              1       -1             1
# 6:              1       -1             2
# 7:              1       -1             3
# 8:              2       -1             1
# 9:              2       -1             2
#10:              2        1             0
#11:              2        1             0
#12:              2        1             0
#13:              3        1             0
#14:              3       -1             1
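Note that rleid(win_lose) is computed over the whole column, not per account, so a run of equal values can span two accounts (rows 5 to 9 and rows 10 to 13 in this data). That is why account_number is also part of `by`. The sketch below works on a separate copy named `dt` (an illustrative name) and stores the run ids in a hypothetical `run_id` column so the grouping can be inspected.

```r
library(data.table)

dt <- data.table(account_number = c(1,1,1,1,1,1,1,2,2,2,2,2,3,3),
                 win_lose = c(-1,-1,-1,1,-1,-1,-1,-1,-1,1,1,1,1,-1))

# Run ids over the whole column: consecutive equal win_lose values share an id
dt[, run_id := rleid(win_lose)]
dt$run_id
# [1] 1 1 1 2 3 3 3 3 3 4 4 4 4 5

# Run 3 spans accounts 1 and 2, and run 4 spans accounts 2 and 3,
# so grouping by account_number as well splits those runs at account borders
dt[, losing_streak := seq_len(.N) * (win_lose < 0), by = .(account_number, run_id)]
dt$losing_streak
# [1] 1 2 3 0 1 2 3 1 2 0 0 0 0 1
```

Grouping by run_id alone would number rows 5 to 9 as 1 to 5, so row 8 (the first loss of account 2) would get 4 instead of 1.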

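A base R option is to use ave (to apply a function by group) together with rle, multiplying the sequence of each run by that run's logical value.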

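# ave() applies FUN within each account_number group; rle() finds runs of
# losses (TRUE) and wins (FALSE), sequence() numbers each run 1..n, and
# multiplying by the run's logical value zeroes out the winning runs
with(df, ave(win_lose, account_number, FUN =
    function(x) with(rle(x == -1), sequence(lengths) * rep(values, lengths))))
#[1] 1 2 3 0 1 2 3 1 2 0 0 0 0 1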
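To see what the function passed to ave does inside one group, here is a step-by-step sketch on account 1's values alone (the variable names `x`, `r` and `streak` are illustrative):

```r
# win_lose values for account 1 (copied from the example data)
x <- c(-1, -1, -1, 1, -1, -1, -1)

r <- rle(x == -1)    # runs of losses (TRUE) and wins (FALSE)
r$lengths            # [1] 3 1 3
r$values             # [1]  TRUE FALSE  TRUE

sequence(r$lengths)  # [1] 1 2 3 1 1 2 3, restarting at 1 for each run

# rep() expands each run's TRUE/FALSE back to row level; multiplying
# keeps the counter on losing runs and zeroes it on winning runs
streak <- sequence(r$lengths) * rep(r$values, r$lengths)
streak               # [1] 1 2 3 0 1 2 3
```

ave() then repeats this for every account and puts the pieces back in the original row order.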