2017-02-17

Find the first occurrence in a pandas DataFrame if a condition is met

I have the following data:

      timestamp         bucket  forward
0     02/01/2012 08:00  1       2309.6
1156  02/01/2012 08:00  2       2305.9
2320  02/01/2012 08:00  3       2306
3481  02/01/2012 08:00  4       2240.9
4643  02/01/2012 08:00  5       2235.3
5807  02/01/2012 08:00  6       2224.1
6969  02/01/2012 08:00  7       2167.1
1     02/01/2012 09:00  1       2327.3
1157  02/01/2012 09:00  2       2323.4
2321  02/01/2012 09:00  3       2323.5
3482  02/01/2012 09:00  4       2258.4
4644  02/01/2012 09:00  5       2252.8
5808  02/01/2012 09:00  6       2241.4
6970  02/01/2012 09:00  7       2183.2
2     02/01/2012 10:00  1       2342.3

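If the bucket is greater than the previous bucket, I need to take the forward value of the first row (bucket 1) with the same timestamp, i.e.: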

      timestamp         bucket  forward  result
0     02/01/2012 08:00  1       2309.6   2309.6
1156  02/01/2012 08:00  2       2305.9   2309.6
2320  02/01/2012 08:00  3       2306     2309.6
3481  02/01/2012 08:00  4       2240.9   2309.6
4643  02/01/2012 08:00  5       2235.3   2309.6
5807  02/01/2012 08:00  6       2224.1   2309.6
6969  02/01/2012 08:00  7       2167.1   2309.6
1     02/01/2012 09:00  1       2327.3   2327.3
1157  02/01/2012 09:00  2       2323.4   2327.3
2321  02/01/2012 09:00  3       2323.5   2327.3
3482  02/01/2012 09:00  4       2258.4   2327.3
4644  02/01/2012 09:00  5       2252.8   2327.3
5808  02/01/2012 09:00  6       2241.4   2327.3
6970  02/01/2012 09:00  7       2183.2   2327.3
2     02/01/2012 10:00  1       2342.3   2342.3
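Since every row's result is the forward value of the lowest bucket at that timestamp, the requirement can also be read as a per-timestamp lookup. This is an alternative, not the method used in either answer: a minimal sketch, assuming the index labels are unique and the lowest bucket for each timestamp is the row to copy. Sorting by bucket before grouping means it does not rely on the rows already being in order.

```python
import pandas as pd

# Sample data copied from the question (index labels kept as shown).
df = pd.DataFrame(
    {
        "timestamp": ["02/01/2012 08:00"] * 7
        + ["02/01/2012 09:00"] * 7
        + ["02/01/2012 10:00"],
        "bucket": list(range(1, 8)) * 2 + [1],
        "forward": [2309.6, 2305.9, 2306.0, 2240.9, 2235.3, 2224.1, 2167.1,
                    2327.3, 2323.4, 2323.5, 2258.4, 2252.8, 2241.4, 2183.2,
                    2342.3],
    },
    index=[0, 1156, 2320, 3481, 4643, 5807, 6969,
           1, 1157, 2321, 3482, 4644, 5808, 6970, 2],
)

# Sort so the lowest bucket comes first within each timestamp, then
# broadcast that row's forward value to every row of the same timestamp.
# transform returns a Series indexed like the sorted frame; assigning it
# to df aligns on the (unique) index labels, so each value lands on its own row.
df["result"] = (
    df.sort_values("bucket")
    .groupby("timestamp")["forward"]
    .transform("first")
)
print(df)
```

Note that transform("first") skips NaN, so if the bucket-1 row had a missing forward value, the next bucket's value would be used instead.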

So far, I have:

df['result'] = np.where(df['bucket'].diff()>0, df['forward'].shift(1), df['forward']) 

I'm not sure how to incorporate the "first occurrence" part for the bucket. Any pointers would be appreciated.
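The attempt only looks back one row: shift(1) returns the previous row's forward value, not the first row of the block, so it is right for bucket 2 but wrong from bucket 3 onward. A minimal demonstration on a small hypothetical frame (the values 10.0, 11.0, ... are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two blocks, each with buckets 1..3.
df = pd.DataFrame({
    "bucket": [1, 2, 3, 1, 2, 3],
    "forward": [10.0, 11.0, 12.0, 20.0, 21.0, 22.0],
})

# The attempt from the question: where the bucket increased, use the
# previous row's forward value (shift(1) looks back exactly one row).
df["attempt"] = np.where(df["bucket"].diff() > 0,
                         df["forward"].shift(1),
                         df["forward"])

# What the question asks for: the forward value of each block's first row.
df["wanted"] = [10.0, 10.0, 10.0, 20.0, 20.0, 20.0]

print(df)
# Row 2 (bucket 3) gets 11.0 instead of 10.0, and row 5 gets 21.0 instead of 20.0.
```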

Answers


Here's one way.

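Compare each bucket with the previous one and copy forward only on the rows where the bucket did not increase (the first row of each block), then fill the remaining NaN values with ffill.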

In [1024]: df['result'] = df.loc[~(df.bucket > df.bucket.shift(1)), 'forward'] 

In [1025]: df
Out[1025]:
               timestamp  bucket  forward  result
0     '02/01/2012 08:00'       1   2309.6  2309.6
1156  '02/01/2012 08:00'       2   2305.9     NaN
2320  '02/01/2012 08:00'       3   2306.0     NaN
3481  '02/01/2012 08:00'       4   2240.9     NaN
4643  '02/01/2012 08:00'       5   2235.3     NaN
5807  '02/01/2012 08:00'       6   2224.1     NaN
6969  '02/01/2012 08:00'       7   2167.1     NaN
1     '02/01/2012 09:00'       1   2327.3  2327.3
1157  '02/01/2012 09:00'       2   2323.4     NaN
2321  '02/01/2012 09:00'       3   2323.5     NaN
3482  '02/01/2012 09:00'       4   2258.4     NaN
4644  '02/01/2012 09:00'       5   2252.8     NaN
5808  '02/01/2012 09:00'       6   2241.4     NaN
6970  '02/01/2012 09:00'       7   2183.2     NaN
2     '02/01/2012 10:00'       1   2342.3  2342.3

Forward-fill the NaNs:

In [1026]: df.result = df.result.ffill() 

In [1027]: df
Out[1027]:
               timestamp  bucket  forward  result
0     '02/01/2012 08:00'       1   2309.6  2309.6
1156  '02/01/2012 08:00'       2   2305.9  2309.6
2320  '02/01/2012 08:00'       3   2306.0  2309.6
3481  '02/01/2012 08:00'       4   2240.9  2309.6
4643  '02/01/2012 08:00'       5   2235.3  2309.6
5807  '02/01/2012 08:00'       6   2224.1  2309.6
6969  '02/01/2012 08:00'       7   2167.1  2309.6
1     '02/01/2012 09:00'       1   2327.3  2327.3
1157  '02/01/2012 09:00'       2   2323.4  2327.3
2321  '02/01/2012 09:00'       3   2323.5  2327.3
3482  '02/01/2012 09:00'       4   2258.4  2327.3
4644  '02/01/2012 09:00'       5   2252.8  2327.3
5808  '02/01/2012 09:00'       6   2241.4  2327.3
6970  '02/01/2012 09:00'       7   2183.2  2327.3
2     '02/01/2012 10:00'       1   2342.3  2342.3
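The two steps of this answer, put together as one self-contained script on the sample data from the question (a sketch; the intermediate name starts is ours, added for readability):

```python
import pandas as pd

# Sample data copied from the question (index labels kept as shown).
df = pd.DataFrame(
    {
        "bucket": list(range(1, 8)) * 2 + [1],
        "forward": [2309.6, 2305.9, 2306.0, 2240.9, 2235.3, 2224.1, 2167.1,
                    2327.3, 2323.4, 2323.5, 2258.4, 2252.8, 2241.4, 2183.2,
                    2342.3],
    },
    index=[0, 1156, 2320, 3481, 4643, 5807, 6969,
           1, 1157, 2321, 3482, 4644, 5808, 6970, 2],
)

# A row starts a new block when its bucket is NOT greater than the
# previous row's bucket; the first row compares against NaN, which is
# never "greater", so it also counts as a start.
starts = ~(df["bucket"] > df["bucket"].shift(1))

# Keep forward only on block-start rows; assignment aligns on the index,
# so every other row becomes NaN.
df["result"] = df.loc[starts, "forward"]

# Carry each block's first forward value down to the rows that follow it.
df["result"] = df["result"].ffill()
print(df)
```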

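You can build a group variable from the bucket column with diff and cumsum, then take the first forward value of each group with transform: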

df['result'] = df.groupby(by=(df.bucket.diff() < 0).cumsum())['forward'].transform('first')
df 
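A self-contained version of this groupby/transform approach on the sample data from the question (the intermediate name group_id is ours, added for readability):

```python
import pandas as pd

# Sample data copied from the question (index labels kept as shown).
df = pd.DataFrame(
    {
        "bucket": list(range(1, 8)) * 2 + [1],
        "forward": [2309.6, 2305.9, 2306.0, 2240.9, 2235.3, 2224.1, 2167.1,
                    2327.3, 2323.4, 2323.5, 2258.4, 2252.8, 2241.4, 2183.2,
                    2342.3],
    },
    index=[0, 1156, 2320, 3481, 4643, 5807, 6969,
           1, 1157, 2321, 3482, 4644, 5808, 6970, 2],
)

# A drop in bucket (diff < 0) marks the start of a new block; cumsum
# turns those True marks into consecutive group ids 0, 1, 2, ...
group_id = (df["bucket"].diff() < 0).cumsum()

# Broadcast the first forward value of each group to all of its rows.
df["result"] = df.groupby(group_id)["forward"].transform("first")
print(df)
```

One behavioural difference from the shift-based answer: if the same bucket value appears twice in a row (diff == 0), the shift-based mask starts a new block there, while diff() < 0 keeps both rows in the same group.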

