2016-10-03 92 views
1

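Python 3 Pandas: filter/extract by multiple column values, including <> 0

I am using the publicly available CSV files from USASPENDING.gov. I can extract the Navy data, but I don't know the correct syntax for adding a second filter that excludes all records where Dollarsobligated = 0.

The code is: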

import pandas as pd 

df = pd.read_csv("2016_DOD_Contracts_Full_20160915.csv") 
df.columns = [c.replace(' ','_') for c in df.columns] 
new_df = df[(df.mod_agency == '1700: DEPT OF THE NAVY') & (df.dollarsobligated <> 0)] 

# Export result to CSV 
new_df.to_csv('example15.csv') 

I get an error saying that <> is invalid syntax. I could not find any examples of "not equal to 0" on the web.

+1

In Python 2, '<>' is equivalent to '!='. [In Python 3, '<>' was removed](https://docs.python.org/3.0/whatsnew/3.0.html#removed-syntax). – unutbu

+0

Good to know, thanks unutbu :) –

Answers

2

I think you need to replace <> with != in the boolean indexing, because in Python 3, <> was removed (thanks, unutbu).

You can also use str.replace to rename the columns:

df.columns = df.columns.str.replace(' ','_') 
new_df = df[(df.mod_agency == '1700: DEPT OF THE NAVY') & (df.Dollarsobligated != 0)] 

Sample:

df = pd.DataFrame({'mod agency':['1700: DEPT OF THE NAVY', 
           '1700: DEPT OF THE NAVY', 
           '1800: DEPT OF THE NAVY'], 
        'Dollarsobligated':[1,0,0], 
        'C':[7,8,9]}) 

print (df) 
   C  Dollarsobligated              mod agency
0  7                 1  1700: DEPT OF THE NAVY
1  8                 0  1700: DEPT OF THE NAVY
2  9                 0  1800: DEPT OF THE NAVY

df.columns = df.columns.str.replace(' ','_') 
new_df = df[(df.mod_agency == '1700: DEPT OF THE NAVY') & (df.Dollarsobligated != 0)] 

print (new_df) 
   C  Dollarsobligated              mod_agency
0  7                 1  1700: DEPT OF THE NAVY
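
As a hedged sketch of a few equivalent ways to write the same two-condition filter: the data below is the sample frame from this answer, while the variable names (`navy`, `by_mask`, `by_methods`, `by_query`, `renamed`) are made up for illustration. The parentheses in the first form matter because `&` binds more tightly than `==` and `!=` in Python.

```python
import pandas as pd

# Sample data copied from the answer above
df = pd.DataFrame({
    'mod agency': ['1700: DEPT OF THE NAVY',
                   '1700: DEPT OF THE NAVY',
                   '1800: DEPT OF THE NAVY'],
    'Dollarsobligated': [1, 0, 0],
    'C': [7, 8, 9],
})

navy = '1700: DEPT OF THE NAVY'  # illustrative variable name

# Bracket access works even when a column name contains a space,
# so renaming the columns is optional.
by_mask = df[(df['mod agency'] == navy) & (df['Dollarsobligated'] != 0)]

# Series.eq / Series.ne are the method forms of == and !=;
# no extra parentheses are needed around each condition.
by_methods = df[df['mod agency'].eq(navy) & df['Dollarsobligated'].ne(0)]

# DataFrame.query takes the condition as a string; @navy refers to the
# local variable. The columns are renamed first so the names are valid
# identifiers inside the query string.
renamed = df.rename(columns=lambda c: c.replace(' ', '_'))
by_query = renamed.query('mod_agency == @navy and Dollarsobligated != 0')

print(by_mask)
```

All three forms should select the same single row here; which one to use is mostly a matter of readability.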
+0

Checked it out, thanks jezrael :) –

+0

Glad I could help! – jezrael

+0

If my answer was helpful, please don't forget to [accept](http://meta.stackexchange.com/a/5235/295067) it. Thanks. – jezrael

1

You have to use "!=" instead of "<>".
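
A small sketch to check this on your own interpreter, assuming a standard Python 3 without the `barry_as_FLUFL` future import: `ast.parse` only parses the text, so the names `x` and `parses` are purely illustrative and nothing is evaluated.

```python
import ast


def parses(source):
    """Return True if `source` is syntactically valid in this interpreter."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# On Python 3 the first check succeeds and the second one fails.
print(parses("x != 0"))
print(parses("x <> 0"))
```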
