2017-02-21 79 views

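Hi, I'm working on my data and have a problem with a conditional statement in pandas.

I want to set values in a column based on a conditional statement.

Here is my code.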

# -*- coding: utf-8 -*- 
import pandas as pd 
import numpy as np 
import os 

join_file = r'D:\handling data\complete data\조인\after_join.csv' 
pwd = os.getcwd() 
os.chdir(os.path.dirname(join_file)) 
join_data = pd.read_csv(os.path.basename(join_file), sep=',', encoding='utf-8') 

print(join_data.head()) 


join_data['cluster_z'] = 4 # both in a downtrend
join_data['cluster_z'][((join_data['cluster_x'] == 3 | join_data['cluster_x'] == 2 | join_data['cluster_x'] == 4) 
        & (join_data['cluster_y'] == 3 | join_data['cluster_y'] == 1))] = 1 # all in an uptrend

join_data['cluster_z'][((join_data['cluster_x'] == 1 | join_data['cluster_x'] == 5) 
        & (join_data['cluster_y'] == 3 | join_data['cluster_y'] == 1))] = 2 # overall downtrend, per-store uptrend

join_data['cluster_z'][((join_data['cluster_x'] == 3 | join_data['cluster_x'] == 2 | join_data['cluster_x'] == 4) 
        & (join_data['cluster_y'] == 2 | join_data['cluster_y'] == 4))] = 3 # overall uptrend, per-store downtrend

print(join_data.head()) 

When I run the code up to the second print(join_data.head()), I get an error (the original post showed it as a screenshot).

How can I fix this error? Thanks in advance.

Answer


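It looks like you left out a lot of parentheses between the conditions. In Python, | binds more tightly than ==, so each comparison needs its own parentheses before the results are combined with | and &. It is also better to use loc for the assignment.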

Original:

join_data['cluster_z'] 
[((join_data['cluster_x'] == 3 | 
    join_data['cluster_x'] == 2 | 
    join_data['cluster_x'] == 4) & 
    (join_data['cluster_y'] == 3 | 
    join_data['cluster_y'] == 1))] = 1 

Change to:

join_data.loc[ 
((join_data['cluster_x'] == 3) | 
(join_data['cluster_x'] == 2) | 
(join_data['cluster_x'] == 4)) & 
((join_data['cluster_y'] == 3) | 
(join_data['cluster_y'] == 1)), 'cluster_z'] = 1 
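
To see why the parentheses matter, here is a minimal sketch on made-up data (the small DataFrame `df` is an assumption for illustration, not the asker's CSV). Without parentheses Python evaluates `3 | df['cluster_x']` first and then treats the rest as a chained comparison, which asks for the truth value of a whole Series and raises a ValueError.

```python
import pandas as pd

# Hypothetical toy data, not the asker's file.
df = pd.DataFrame({'cluster_x': [3, 2, 5, 3]})

# Parsed as: df['cluster_x'] == (3 | df['cluster_x']) == 2
# The chained comparison needs bool(Series), which pandas refuses to guess.
try:
    bad = df['cluster_x'] == 3 | df['cluster_x'] == 2
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', err)

# Each comparison in its own parentheses: `|` now combines two boolean Series
# element by element.
good = (df['cluster_x'] == 3) | (df['cluster_x'] == 2)
print(good.tolist())  # [True, True, False, True]
```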

Or better, use isin:

join_data.loc[ 
(join_data['cluster_x'].isin([3,2,4])) & 
(join_data['cluster_y'].isin([3,1])), 'cluster_z'] = 1 
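
As a quick sanity check, the sketch below compares the hand-written chain of == and | with isin on a small made-up frame (the data here is assumed for illustration only). Both should give the same boolean mask.

```python
import pandas as pd

# Hypothetical sample data for the comparison.
df = pd.DataFrame({'cluster_x': [3, 2, 5, 3]})

# Three parenthesized comparisons joined with `|` ...
by_hand = (df['cluster_x'] == 3) | (df['cluster_x'] == 2) | (df['cluster_x'] == 4)

# ... and the same membership test written with isin.
with_isin = df['cluster_x'].isin([3, 2, 4])

same = bool((by_hand == with_isin).all())
print(same)  # True
```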

All together:

join_data = pd.DataFrame({'cluster_x':[3,2,5,3], 
         'cluster_y':[3,0,1,2]}) 

print (join_data) 
   cluster_x  cluster_y
0          3          3
1          2          0
2          5          1
3          3          2

join_data['cluster_z'] = 4 

join_data.loc[ 
(join_data['cluster_x'].isin([3,2,4])) & 
(join_data['cluster_y'].isin([3,1])), 'cluster_z'] = 1 

join_data.loc[ 
(join_data['cluster_x'].isin([1,5])) & 
(join_data['cluster_y'].isin([3,1])), 'cluster_z'] = 2 

join_data.loc[ 
(join_data['cluster_x'].isin([3,2,4])) & 
(join_data['cluster_y'].isin([2,4])), 'cluster_z'] = 3 

print (join_data) 
   cluster_x  cluster_y  cluster_z
0          3          3          1
1          2          0          4
2          5          1          2
3          3          2          3

Or, more readable:

mask1 = join_data['cluster_x'].isin([3,2,4]) 
mask2 = join_data['cluster_y'].isin([3,1]) 
mask3 = join_data['cluster_x'].isin([1,5]) 
mask4 = join_data['cluster_y'].isin([2,4]) 

join_data['cluster_z'] = 4 
join_data.loc[mask1 & mask2 , 'cluster_z'] = 1 
join_data.loc[mask3 & mask2 , 'cluster_z'] = 2 
join_data.loc[mask1 & mask4 , 'cluster_z'] = 3 

print (join_data) 
   cluster_x  cluster_y  cluster_z
0          3          3          1
1          2          0          4
2          5          1          2
3          3          2          3
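
One thing worth checking with sequential loc assignments is whether any row matches more than one rule, because then the last assignment would win. The sketch below counts matches per row on the same sample data; the `rules` dict and the `hits` name are just illustrative helpers, not part of the code above.

```python
import pandas as pd

join_data = pd.DataFrame({'cluster_x': [3, 2, 5, 3],
                          'cluster_y': [3, 0, 1, 2]})

mask1 = join_data['cluster_x'].isin([3, 2, 4])
mask2 = join_data['cluster_y'].isin([3, 1])
mask3 = join_data['cluster_x'].isin([1, 5])
mask4 = join_data['cluster_y'].isin([2, 4])

# Map each target value to its combined condition (illustrative helper).
rules = {1: mask1 & mask2, 2: mask3 & mask2, 3: mask1 & mask4}

# Number of rules that match each row; anything above 1 would mean an overlap.
hits = pd.concat(rules, axis=1).sum(axis=1)
print(hits.tolist())             # [1, 0, 1, 1]

has_overlap = bool((hits > 1).any())
print(has_overlap)               # False
```

Here no row matches two rules, so the order of the three assignments does not change the result.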

A solution with multiple nested numpy.where calls:

mask1 = join_data['cluster_x'].isin([3,2,4]) 
mask2 = join_data['cluster_y'].isin([3,1]) 
mask3 = join_data['cluster_x'].isin([1,5]) 
mask4 = join_data['cluster_y'].isin([2,4]) 

join_data['cluster_z'] = np.where(mask1 & mask2, 1, 
         np.where(mask3 & mask2, 2, 
         np.where(mask1 & mask4, 3, 4)))   

print (join_data) 
   cluster_x  cluster_y  cluster_z
0          3          3          1
1          2          0          4
2          5          1          2
3          3          2          3
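
If the nesting gets hard to read, numpy.select may be worth a look as an alternative (it is not part of the solutions above). It takes a list of conditions and a list of values, uses the first condition that matches for each row, and falls back to default otherwise. A minimal sketch on the same sample data:

```python
import numpy as np
import pandas as pd

join_data = pd.DataFrame({'cluster_x': [3, 2, 5, 3],
                          'cluster_y': [3, 0, 1, 2]})

mask1 = join_data['cluster_x'].isin([3, 2, 4])
mask2 = join_data['cluster_y'].isin([3, 1])
mask3 = join_data['cluster_x'].isin([1, 5])
mask4 = join_data['cluster_y'].isin([2, 4])

# Conditions are tried in order; the value at the same position is used
# for the first condition that is True, otherwise the default applies.
conditions = [mask1 & mask2, mask3 & mask2, mask1 & mask4]
choices = [1, 2, 3]

join_data['cluster_z'] = np.select(conditions, choices, default=4)
print(join_data['cluster_z'].tolist())  # [1, 4, 2, 3]
```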

Thank you, you're great! There are so many ways to handle it, haha. How do you know so many of them? Thanks again, and have a nice day!