2017-05-28 172 views

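Select specific rows in a DataFrame based on 2 columns in Python pandas

I have loaded data from Excel into a pandas DataFrame. I now want to select only the rows whose ASSESSMENT ID is the maximum ASSESSMENT ID for each APPID, across all UI SEQ NUMBER values of that APPID.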

APPID APPNAME ASSESSMENT ID UI SEQ NUMBER QUESTION ANSWER TEXT . 
1 appname 2493 11 Question No . 
1 appname 13808 11 Question Ctry of domicile . 
1 appname 13808 11 Question Name . 
1 appname 35316 11 Question Ctry of domicile .  
1 appname 35316 11 Question Name . 
1 appname 35316 11 Question Nationality .  
1 appname 2493 12 Question Corp name . 
1 appname 2493 12 Question Cr Br Scr . 
1 appname 2493 12 Question Inc And Assests . 
1 appname 2493 12 Question Int, Ext Reg Reports . 
1 appname 13808 12 Question Corp name . 
1 appname 35316 12 Question Corp name . 
1 appname 2493 13 Question No . 
1 appname 13808 13 Question No . 
1 appname 35316 13 Question No . 
1 appname 2493 14 Question No . 
1 appname 13808 14 Question firms Pos . 
1 appname 35316 14 Question firms Pos . 

The result should be:

APPID APPNAME ASSESSMENT ID UI SEQ NUMBER QUESTION ANSWER TEXT . 
1 appname 35316 11 Question Ctry of domicile . 
1 appname 35316 11 Question Name . 
1 appname 35316 11 Question Nationality . 
1 appname 35316 12 Question Corp name . 
1 appname 35316 13 Question No . 
1 appname 35316 14 Question firms Pos . 

Please [don't post images of code (or links to them)](http://meta.stackoverflow.com/questions/285551/why-may-i-not-upload-images-of-code-on-so-when-asking-a-question) – jezrael

Apologies for posting an image, but I had no other way to post the Excel data here with proper formatting. – vivek

Hmm, if you copy and paste it and add 4 spaces before each line, doesn't that work? – jezrael

Answers

1

I think you need boolean indexing, with the mask created by groupby and apply:

df1 = df[df.groupby(['APPID', 'UI SEQ NUMBER'])['ASSESSMENT ID'].apply(lambda x:x==x.max())] 
print (df1) 
    APPID APPNAME ASSESSMENT ID UI SEQ NUMBER QUESTION  ANSWER TEXT. 
3  1 appname   35316    11 Question Ctry of domicile. 
4  1 appname   35316    11 Question    Name. 
5  1 appname   35316    11 Question  Nationality. 
11  1 appname   35316    12 Question   Corp name. 
14  1 appname   35316    13 Question    No. 
17  1 appname   35316    14 Question   firms Pos. 
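
On recent pandas versions, apply on a grouped column may prepend the group keys to the result's index, in which case the mask no longer aligns with df. A minimal sketch of the same mask built with transform, which always returns a result indexed like the input, is below. The sample frame is reconstructed from the question's table (the QUESTION and ANSWER TEXT columns are left out for brevity), and the name group_max is only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question's table, in the same row order
# (text columns omitted; they do not affect the selection).
assessment_ids = [2493, 13808, 13808, 35316, 35316, 35316,   # UI SEQ NUMBER 11
                  2493, 2493, 2493, 2493, 13808, 35316,      # UI SEQ NUMBER 12
                  2493, 13808, 35316,                        # UI SEQ NUMBER 13
                  2493, 13808, 35316]                        # UI SEQ NUMBER 14
ui_seq = [11] * 6 + [12] * 6 + [13] * 3 + [14] * 3

df = pd.DataFrame({
    "APPID": 1,
    "APPNAME": "appname",
    "ASSESSMENT ID": assessment_ids,
    "UI SEQ NUMBER": ui_seq,
})

# transform("max") broadcasts each group's maximum back onto every row of that
# group, so the result shares df's index and can be compared row by row.
group_max = df.groupby(["APPID", "UI SEQ NUMBER"])["ASSESSMENT ID"].transform("max")

# Keep every row whose ASSESSMENT ID equals its group's maximum (ties included).
df1 = df[df["ASSESSMENT ID"] == group_max]
print(df1)
```

With this sample the selected rows carry the same index labels as in the output above.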

Or, if you do not need all the duplicate values, use idxmax:

df1 = df.loc[df.groupby(['APPID', 'UI SEQ NUMBER'])['ASSESSMENT ID'].idxmax()] 
print (df1) 
    APPID APPNAME ASSESSMENT ID UI SEQ NUMBER QUESTION  ANSWER TEXT. 
3  1 appname   35316    11 Question Ctry of domicile. 
11  1 appname   35316    12 Question   Corp name. 
14  1 appname   35316    13 Question    No. 
17  1 appname   35316    14 Question   firms Pos. 
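
The difference between the two approaches only shows up when a group contains ties, and the grouping keys decide what "maximum" means. The sketch below uses a small made-up frame (the APPID 2 rows are hypothetical and not from the question) to contrast idxmax per (APPID, UI SEQ NUMBER) with a mask computed per APPID alone, which is one possible reading of the question's wording.

```python
import pandas as pd

# Hypothetical data: APPID 2 is invented purely to show the behaviour.
df = pd.DataFrame({
    "APPID":         [1,    1,     1,     1,     2,   2],
    "UI SEQ NUMBER": [11,   11,    11,    12,    11,  12],
    "ASSESSMENT ID": [2493, 35316, 35316, 35316, 500, 400],
})

# idxmax returns the index label of the FIRST maximum in each group,
# so tied rows (labels 1 and 2 here) collapse to a single row.
first_only = df.loc[df.groupby(["APPID", "UI SEQ NUMBER"])["ASSESSMENT ID"].idxmax()]

# Grouping by APPID alone keeps only rows from the highest ASSESSMENT ID of the
# whole application, regardless of UI SEQ NUMBER, and keeps ties.
app_max = df.groupby("APPID")["ASSESSMENT ID"].transform("max")
latest = df[df["ASSESSMENT ID"] == app_max]

print(first_only)
print(latest)
```

For the data in the question both groupings give the same rows, because every UI SEQ NUMBER has at least one row with ASSESSMENT ID 35316. They differ once some UI SEQ NUMBER lacks the application's latest assessment, as row 5 does in this made-up frame.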

Perfect, jezrael. That solved it. I was doing the following: df[df.groupby(['APPID','UI SEQ NUMBER'])['ASSESSMENT ID'].max() – vivek

Then it is better to use `df1 = df.loc[df.groupby(['APPID','UI SEQ NUMBER'])['ASSESSMENT ID'].idxmax()]` – jezrael
