You can compare the DataFrame with its row-wise max values using eq:
print (tmp[tmp.eq(tmp.max(axis=1), axis=0)])
mask = (tmp.eq(tmp.max(axis=1), axis=0))
print (mask)
       A      B
0  False   True
1  False   True
2   True  False
3  False   True
4  False   True
5  False   True
6   True   True
7   True  False
8  False   True
9   True   True
df = (tmp[mask])
print (df)
      A     B
0   NaN   3.0
1   NaN   4.0
2   3.0   NaN
3   NaN  33.0
4   NaN  10.0
5   NaN   9.0
6   7.0   7.0
7   8.0   NaN
8   NaN  10.0
9  10.0  10.0
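For a version that runs on its own, here is a minimal sketch using the same 10 rows as the timing code further down, without the 10000x repeat. The name row_max is only introduced here for readability.

```python
import pandas as pd

# Same 10 sample rows as in the timing code of this answer.
tmp = pd.DataFrame({
    'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'B': [3, 4, 1, 33, 10, 9, 7, 3, 10, 10],
})

# Largest value of each row, as a Series indexed like tmp.
row_max = tmp.max(axis=1)

# axis=0 aligns row_max on the row index, so every column
# is compared element by element against the row maximum.
mask = tmp.eq(row_max, axis=0)

# Boolean indexing with a DataFrame keeps matching cells
# and puts NaN everywhere else.
df = tmp[mask]
print(df)
```

Without axis=0, eq would try to align the Series on the column labels instead of the row index, so the comparison would not be done per row.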
Then you can add NaN where the values in both columns are equal (the tie is kept in A and masked in B):
mask = (tmp.eq(tmp.max(axis=1), axis=0))
mask['B'] = mask.B & (tmp.A != tmp.B)
print (mask)
       A      B
0  False   True
1  False   True
2   True  False
3  False   True
4  False   True
5  False   True
6   True  False
7   True  False
8  False   True
9   True  False
df = (tmp[mask])
print (df)
      A     B
0   NaN   3.0
1   NaN   4.0
2   3.0   NaN
3   NaN  33.0
4   NaN  10.0
5   NaN   9.0
6   7.0   NaN
7   8.0   NaN
8   NaN  10.0
9  10.0   NaN
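The mask['B'] line is written for exactly two columns. If there are more, one possible generalization is to keep only the leftmost maximum in each row by counting matches from left to right. This is only a sketch, and it assumes that the leftmost column should win on ties, as in the two-column case above.

```python
import pandas as pd

# Same 10 sample rows as in the timing code of this answer.
tmp = pd.DataFrame({
    'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'B': [3, 4, 1, 33, 10, 9, 7, 3, 10, 10],
})

# True wherever a cell equals its row maximum (ties give several True per row).
mask = tmp.eq(tmp.max(axis=1), axis=0)

# Running count of matches along each row; the first match has count 1.
match_count = mask.astype(int).cumsum(axis=1)

# Keep a cell only if it is a match and it is the first match in its row.
first_only = mask & (match_count == 1)

df = tmp[first_only]
print(df)
```

On this sample it gives the same mask as the two-column version, with the tie rows 6 and 9 kept in A only.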
Timings (len(df)=10):
In [234]: %timeit (tmp[tmp.eq(tmp.max(axis=1), axis=0)])
1000 loops, best of 3: 974 µs per loop
In [235]: %timeit (gh(tmp))
The slowest run took 4.32 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 1.64 ms per loop
(len(df)=100k):
In [244]: %timeit (tmp[tmp.eq(tmp.max(axis=1), axis=0)])
100 loops, best of 3: 7.42 ms per loop
In [245]: %timeit (gh(t1))
1 loop, best of 3: 8.81 s per loop
Code for timings:
import pandas as pd

tmp = pd.DataFrame({
    'A': pd.Series([1,2,3,4,5,6,7,8,9,10], index=range(0,10)),
    'B': pd.Series([3,4,1,33,10,9,7,3,10,10], index=range(0,10))
})
# repeat the 10 rows 10000 times to get 100k rows
tmp = pd.concat([tmp]*10000).reset_index(drop=True)
t1 = tmp.copy()

print (tmp[tmp.eq(tmp.max(axis=1), axis=0)])

def top(row):
    data = row.tolist()
    # keep the row maximum, replace everything else with None
    return [d if d == max(data) else None for d in data]

def gh(tmp1):
    return tmp1.apply(top, axis=1)

print (gh(t1))
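One caveat: on recent pandas versions, apply with a function that returns a list gives, as far as I know, a Series of lists rather than a DataFrame. Below is a sketch of a variant that returns a DataFrame again so it can be compared with the eq result. It assumes the result_type='expand' option of DataFrame.apply is available, and the name gh_expand is made up for this example.

```python
import pandas as pd

# Small 10-row sample, so the slow row-wise version finishes quickly.
tmp = pd.DataFrame({
    'A': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'B': [3, 4, 1, 33, 10, 9, 7, 3, 10, 10],
})

def top(row):
    data = row.tolist()
    biggest = max(data)  # compute the row maximum once
    return [d if d == biggest else None for d in data]

def gh_expand(frame):
    # Hypothetical variant of gh: expand each returned list into columns.
    out = frame.apply(top, axis=1, result_type='expand')
    # 'expand' numbers the columns 0, 1, ..., so restore the original labels.
    out.columns = frame.columns
    return out

slow = gh_expand(tmp)
fast = tmp[tmp.eq(tmp.max(axis=1), axis=0)]

# Both approaches should blank out the same cells.
print(slow.isna().equals(fast.isna()))
```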
I had exactly the same in mind: 'tmp[tmp.eq(tmp.max(axis=1), axis=0)]' :) –
Thank you guys! Much appreciated! :) – Ruslan