2016-10-17
0

Pandas: subset using a sliced boolean index

Code to make the test data:

import pandas as pd 
import numpy as np 

testdf = {'date': range(10), 
     'event': ['A', 'A', np.nan, 'B', 'B', 'A', 'B', np.nan, 'A', 'B'], 
     'id': [1] * 7 + [2] * 3} 
testdf = pd.DataFrame(testdf) 

print(testdf) 

gives

   date event  id
0     0     A   1
1     1     A   1
2     2   NaN   1
3     3     B   1
4     4     B   1
5     5     A   1
6     6     B   1
7     7   NaN   2
8     8     A   2
9     9     B   2

Subset testdf:

df_sub = testdf.loc[testdf.event == 'A',:] 
print(df_sub) 
   date event  id
0     0     A   1
1     1     A   1
5     5     A   1
8     8     A   2

(Note: the index is not reset, so df_sub keeps the original row labels.)

Create conditional boolean indices:

bool_sliced_idx1 = df_sub.date < 4 
bool_sliced_idx2 = (df_sub.date > 4) & (df_sub.date < 6) 

I would like to insert conditional values into the original DataFrame using this new index, like

testdf['new_column'] = np.nan 
testdf.loc[bool_sliced_idx1, 'new_column'] = 'new_conditional_value' 

This obviously (now) gives the error:

pandas.core.indexing.IndexingError: Unalignable boolean Series key provided 

bool_sliced_idx1 looks like

>>> print(bool_sliced_idx1) 
0  True 
1  True 
5 False 
8 False 
Name: date, dtype: bool 

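I tried testdf.ix[(bool_sliced_idx1==True).index,:], but that does not work, because the index still contains every label of df_sub rather than only the labels where the mask is True: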

>>> (bool_sliced_idx1==True).index 
Int64Index([0, 1, 5, 8], dtype='int64') 

Answers

3

IIUC, you can combine all of your conditions at once instead of trying to chain them. For example, df_sub.date < 4 is really just (testdf.event == 'A') & (testdf.date < 4). So you could do:

# Create the conditions. 
cond1 = (testdf.event == 'A') & (testdf.date < 4) 
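# pandas >= 1.3 spells the exclusive bounds as inclusive='neither'; older versions used inclusive=False. 
cond2 = (testdf.event == 'A') & (testdf.date.between(4, 6, inclusive='neither')) 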

# Make the assignments. 
testdf.loc[cond1, 'new_col'] = 'foo' 
testdf.loc[cond2, 'new_col'] = 'bar' 

This will give you:

   date event  id new_col
0     0     A   1     foo
1     1     A   1     foo
2     2   NaN   1     NaN
3     3     B   1     NaN
4     4     B   1     NaN
5     5     A   1     bar
6     6     B   1     NaN
7     7   NaN   2     NaN
8     8     A   2     NaN
9     9     B   2     NaN
+0

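Yes, this is correct: if you don't combine the conditions and instead build them from df_sub, your boolean index won't have the same length as testdf, so you get the error! – Manuel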
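
If you already have the mask built from df_sub and would rather not rewrite the conditions, one option (a minimal sketch, not part of the answer above) is to expand the mask to the full index of testdf with Series.reindex, filling the rows that are missing from df_sub with False. The name full_mask is only illustrative.

```python
import numpy as np
import pandas as pd

testdf = pd.DataFrame({'date': range(10),
                       'event': ['A', 'A', np.nan, 'B', 'B', 'A', 'B', np.nan, 'A', 'B'],
                       'id': [1] * 7 + [2] * 3})
df_sub = testdf.loc[testdf.event == 'A', :]
bool_sliced_idx1 = df_sub.date < 4  # only 4 rows, labels 0, 1, 5, 8

# Expand the mask to all 10 rows of testdf (full_mask is a hypothetical name);
# labels that are absent from df_sub are filled with False.
full_mask = bool_sliced_idx1.reindex(testdf.index, fill_value=False)

# The mask now aligns with testdf, so .loc accepts it.
testdf.loc[full_mask, 'new_column'] = 'new_conditional_value'
print(testdf)
```

Rows 0 and 1 receive the value and every other row is left as missing.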

0

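This works:

# Positions (within df_sub) where the mask is True. 
idx = np.where(bool_sliced_idx1)[0] 
## or 
# np.ravel(np.where(bool_sliced_idx1)) 

# Map those positions back to testdf's index labels. 
idx_original = df_sub.index[idx] 
# idx_original holds labels, so select with .loc (.iloc only coincides for a default 0..n-1 index). 
testdf.loc[idx_original,:]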
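
The same idea can probably be written without np.where by filtering the mask with itself and taking the index of what remains. This is a sketch under the question's setup, shown here with bool_sliced_idx2 and used for the assignment the question asks about; labels is an illustrative name.

```python
import numpy as np
import pandas as pd

testdf = pd.DataFrame({'date': range(10),
                       'event': ['A', 'A', np.nan, 'B', 'B', 'A', 'B', np.nan, 'A', 'B'],
                       'id': [1] * 7 + [2] * 3})
df_sub = testdf.loc[testdf.event == 'A', :]
bool_sliced_idx2 = (df_sub.date > 4) & (df_sub.date < 6)

# Keep only the True entries of the mask; their index holds the matching row labels.
labels = bool_sliced_idx2[bool_sliced_idx2].index

# Label-based assignment into the original DataFrame.
testdf.loc[labels, 'new_column'] = 'new_conditional_value'
print(testdf)
```

Because the labels come from df_sub's index, which was never reset, they point at the right rows of testdf.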