Pandas: subsetting with a sliced boolean index

Code to create the test data:

import pandas as pd 
import numpy as np 

testdf = {'date': range(10), 
     'event': ['A', 'A', np.nan, 'B', 'B', 'A', 'B', np.nan, 'A', 'B'], 
     'id': [1] * 7 + [2] * 3} 
testdf = pd.DataFrame(testdf) 

print(testdf)

which gives

   date event  id
0     0     A   1
1     1     A   1
2     2   NaN   1
3     3     B   1
4     4     B   1
5     5     A   1
6     6     B   1
7     7   NaN   2
8     8     A   2
9     9     B   2

Subsetting testdf:

df_sub = testdf.loc[testdf.event == 'A',:] 
print(df_sub) 
   date event  id
0     0     A   1
1     1     A   1
5     5     A   1
8     8     A   2

(Note: the index is not reset.)

Create conditional boolean indexes:

bool_sliced_idx1 = df_sub.date < 4 
bool_sliced_idx2 = (df_sub.date > 4) & (df_sub.date < 6)

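I want to use these new indexes to insert conditional values into the original DataFrame, e.g.

testdf['new_column'] = np.nan
testdf.loc[bool_sliced_idx1, 'new_column'] = 'new_conditional_value'

This (obviously, in hindsight) raises an error: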

pandas.core.indexing.IndexingError: Unalignable boolean Series key provided

bool_sliced_idx1 looks like

>>> print(bool_sliced_idx1) 
0     True
1     True
5    False
8    False
Name: date, dtype: bool

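I tried testdf.ix[(bool_sliced_idx1==True).index,:], but that does not work, because the comparison still returns a Series with every label of df_sub, so .index includes the False rows too: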

>>> (bool_sliced_idx1==True).index 
Int64Index([0, 1, 5, 8], dtype='int64')
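
To make the point above concrete: taking .index of a boolean Series returns all of its labels, not only those where the value is True. A minimal sketch of one way to keep just the True labels, by filtering the Series with itself (all_labels, true_labels and selected are names introduced here for illustration):

```python
import numpy as np
import pandas as pd

testdf = pd.DataFrame({
    'date': range(10),
    'event': ['A', 'A', np.nan, 'B', 'B', 'A', 'B', np.nan, 'A', 'B'],
    'id': [1] * 7 + [2] * 3,
})
df_sub = testdf.loc[testdf.event == 'A', :]
bool_sliced_idx1 = df_sub.date < 4

# .index holds every label of the Series, regardless of True/False.
all_labels = bool_sliced_idx1.index

# Indexing the Series with itself keeps only the True entries.
true_labels = bool_sliced_idx1[bool_sliced_idx1].index

# The surviving labels can be used with .loc on the original frame.
selected = testdf.loc[true_labels, :]
print(list(all_labels))   # [0, 1, 5, 8]
print(list(true_labels))  # [0, 1]
```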

Source

2016-10-17 muon

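IIUC, you can combine all of your conditions at once instead of trying to chain them. For example, df_sub.date < 4 selects the same rows as (testdf.event == 'A') & (testdf.date < 4) evaluated on the full DataFrame. So you can do: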

# Create the conditions on the full DataFrame.
cond1 = (testdf.event == 'A') & (testdf.date < 4)
# pandas >= 1.3 takes inclusive='neither'; older versions used inclusive=False.
cond2 = (testdf.event == 'A') & (testdf.date.between(4, 6, inclusive='neither'))

# Make the assignments.
testdf.loc[cond1, 'new_col'] = 'foo'
testdf.loc[cond2, 'new_col'] = 'bar'

This gives you:

   date event  id new_col
0     0     A   1     foo
1     1     A   1     foo
2     2   NaN   1     NaN
3     3     B   1     NaN
4     4     B   1     NaN
5     5     A   1     bar
6     6     B   1     NaN
7     7   NaN   2     NaN
8     8     A   2     NaN
9     9     B   2     NaN
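
If the sliced mask already exists and you would rather reuse it than rebuild the condition, one option is to reindex it against the original frame. This is a minimal sketch, assuming rows that are missing from df_sub should simply count as False; full_mask is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

testdf = pd.DataFrame({
    'date': range(10),
    'event': ['A', 'A', np.nan, 'B', 'B', 'A', 'B', np.nan, 'A', 'B'],
    'id': [1] * 7 + [2] * 3,
})
df_sub = testdf.loc[testdf.event == 'A', :]
bool_sliced_idx1 = df_sub.date < 4  # labels 0, 1, 5, 8 only

# Expand the mask to the full index; labels absent from df_sub become False.
full_mask = bool_sliced_idx1.reindex(testdf.index, fill_value=False)

# The mask now aligns with testdf, so .loc accepts it.
# Assigning to a missing column creates it, with NaN in unselected rows.
testdf.loc[full_mask, 'new_column'] = 'new_conditional_value'
print(testdf)
```

The reindexed mask marks the same rows as (testdf.event == 'A') & (testdf.date < 4), so for this data the two approaches should agree.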

Source

2016-10-17 22:15:46 root

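Yes, this is correct: if you don't combine them and instead build the conditions from df_sub, your boolean index won't have the same length as testdf, so you get the error! – Manuel

This works: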
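
The mismatch described in this comment can be checked directly. A small sketch comparing a mask built from df_sub with one built from testdf (sub_mask and full_mask are illustrative names, not from the original posts):

```python
import numpy as np
import pandas as pd

testdf = pd.DataFrame({
    'date': range(10),
    'event': ['A', 'A', np.nan, 'B', 'B', 'A', 'B', np.nan, 'A', 'B'],
    'id': [1] * 7 + [2] * 3,
})
df_sub = testdf.loc[testdf.event == 'A', :]

sub_mask = df_sub.date < 4                                # built on the subset
full_mask = (testdf.event == 'A') & (testdf.date < 4)     # built on the full frame

print(len(sub_mask), len(full_mask))         # 4 10
print(sub_mask.index.equals(testdf.index))   # False
print(full_mask.index.equals(testdf.index))  # True
```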

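# Positions (within df_sub) where the sliced mask is True.
idx = np.where(bool_sliced_idx1==True)[0]
## or
# np.ravel(np.where(bool_sliced_idx1==True))

# Map those positions back to the original index labels.
idx_original = df_sub.index[idx]
# Labels call for .loc; .iloc only matches here because testdf has the default 0..n-1 index.
testdf.loc[idx_original,:]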
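
One caveat with this approach: df_sub.index[idx] yields labels, and labels coincide with row positions only when the frame has the default 0..n-1 index. The sketch below uses a hypothetical frame with labels 10..14 to show the difference; df, sub, mask, labels and positions are all names made up for the illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose labels (10..14) differ from positions (0..4).
df = pd.DataFrame(
    {'date': range(5), 'event': ['A', 'B', 'A', 'A', 'B']},
    index=range(10, 15),
)
sub = df.loc[df.event == 'A', :]   # labels 10, 12, 13
mask = sub.date < 3                # True, True, False

# Positions inside `sub`, then the matching labels of the original frame.
labels = sub.index[np.flatnonzero(mask.to_numpy())]

# Label-based lookup works for any index.
by_label = df.loc[labels, :]

# For .iloc, convert labels to positions first;
# df.iloc[labels] would be out of bounds on this 5-row frame.
positions = df.index.get_indexer(labels)
by_position = df.iloc[positions, :]

print(list(labels))     # [10, 12]
print(list(positions))  # [0, 2]
```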

Source

2016-10-17 22:14:07 muon
