2017-10-13 117 views

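I have a CSV file that I read with pd.read_csv(file), and I want to keep only the rows whose values are greater than zero. How can I filter a DataFrame for values greater than zero?

The DataFrame has some empty cells, some negative values, and some numbers in exponential notation, for example -1.72E+10.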

Time    A  B  C  D  E  F   G 
9/8/2017 8:40 1.29 0.27 1.78 0.23 0.33 0.05 -13.72 
9/8/2017 9:00 1.28 0.26 1.78 0.22 0.35 0.02 -13.59 
9/8/2017 9:20 1.43       
9/8/2017 9:40 1.44 0.29 1.93 0.25 0.28 0.01 -13.92 
9/8/2017 10:00 1.36 0.27 1.84 0.23 0.31 0.02 -13.77 
9/8/2017 10:20 1.38 0.27 1.89 0.23 0.31 0.01 -13.83 
9/8/2017 10:40  -1.72E+10 -1.72E+10 -1.72E+10 -1.72E+10 -1.72E+10 -1.72E+10 
9/8/2017 11:00 1.4 0.28 1.88 0.24 0.28 0.02 -13.92 
9/8/2017 11:20 1.43 0.28 1.92 0.24 0.29 0.02 -13.83 

Whenever I run the code below, it does not filter out this data.

df = df[df > 0] 

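The columns are of type str, not numpy.float64.

Can someone tell me what the problem is?

I want to filter the whole DataFrame so that only rows whose values are greater than 0 remain.

Answer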

I think you need any to check whether at least one value in a row is greater than 0 (True):

df = df[(df > 0).any(axis=1)] 

Or all to check whether all values in a row are True:

df = df[(df > 0).all(axis=1)] 
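
The difference between the plain mask, any and all can be sketched on a small made-up frame (not the asker's file). The num_cols list is a hypothetical helper: on Python 3, comparing a string column such as Time with 0 typically raises a TypeError, so this sketch restricts the comparison to the numeric columns. It also shows why df[df > 0] alone drops no rows: it only replaces failing cells with NaN.

```python
import numpy as np
import pandas as pd

# Made-up sample data for illustration only.
df = pd.DataFrame({
    "Time": ["8:40", "9:00", "9:20", "9:40"],
    "A": [1.29, 1.28, 1.43, -1.72e10],
    "B": [0.27, -0.26, np.nan, -1.72e10],
})

num_cols = ["A", "B"]            # hypothetical: compare numeric columns only
mask = df[num_cols] > 0          # NaN > 0 evaluates to False

# Indexing with a boolean frame keeps the shape; failing cells become NaN.
masked = df[num_cols][mask]

# Reducing the mask along axis=1 gives one boolean per row, which drops rows.
any_rows = df[mask.any(axis=1)]  # at least one positive value in the row
all_rows = df[mask.all(axis=1)]  # every numeric value in the row is positive
```

Here any_rows keeps the first three rows and all_rows keeps only the first, because a missing value counts as not greater than 0.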

# the last row and the first numeric column were modified so that they contain no negative values
print (df) 
      Time    A    B    C    D \ 
0 9/8/2017 8:40 1.290000e+00 2.700000e-01 1.780000e+00 2.300000e-01 
1 9/8/2017 9:00 1.280000e+00 2.600000e-01 1.780000e+00 2.200000e-01 
2 9/8/2017 9:20 1.430000e+00   NaN   NaN   NaN 
3 9/8/2017 9:40 1.440000e+00 2.900000e-01 1.930000e+00 2.500000e-01 
4 9/8/2017 10:00 1.360000e+00 2.700000e-01 1.840000e+00 2.300000e-01 
5 9/8/2017 10:20 1.380000e+00 2.700000e-01 1.890000e+00 2.300000e-01 
6 9/8/2017 10:40 1.720000e+10 -1.720000e+10 -1.720000e+10 -1.720000e+10 
7 9/8/2017 11:00 1.400000e+00 2.800000e-01 1.880000e+00 2.400000e-01 
8 9/8/2017 11:20 1.430000e+00 2.800000e-01 1.920000e+00 2.400000e-01 

       E    F  G 
0 3.300000e-01 5.000000e-02 -13.72 
1 3.500000e-01 2.000000e-02 -13.59 
2   NaN   NaN NaN 
3 2.800000e-01 1.000000e-02 -13.92 
4 3.100000e-01 2.000000e-02 -13.77 
5 3.100000e-01 1.000000e-02 -13.83 
6 -1.720000e+10 -1.720000e+10 NaN 
7 2.800000e-01 2.000000e-02 -13.92 
8 2.900000e-01 2.000000e-02 13.83 


df1 = df[(df > 0).all(axis=1)] 
print (df1) 
      Time  A  B  C  D  E  F  G 
8 9/8/2017 11:20 1.43 0.28 1.92 0.24 0.29 0.02 13.83 

df1 = df.loc[:, (df > 0).all()] 
print (df1) 
      Time    A 
0 9/8/2017 8:40 1.290000e+00 
1 9/8/2017 9:00 1.280000e+00 
2 9/8/2017 9:20 1.430000e+00 
3 9/8/2017 9:40 1.440000e+00 
4 9/8/2017 10:00 1.360000e+00 
5 9/8/2017 10:20 1.380000e+00 
6 9/8/2017 10:40 1.720000e+10 
7 9/8/2017 11:00 1.400000e+00 
8 9/8/2017 11:20 1.430000e+00 

EDIT1:

To convert all columns except Time to float:

cols = df.columns.difference(['Time']) 
df[cols] = df[cols].astype(float) 
print (df.dtypes) 
Time  object 
A  float64 
B  float64 
C  float64 
D  float64 
E  float64 
F  float64 
G  float64 
dtype: object 
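
If some cells hold text that cannot be parsed as a number (a stray space, for example), astype(float) would typically raise a ValueError. A possible alternative, shown here only as a sketch on made-up data, is pd.to_numeric with errors='coerce', which turns unparseable cells into NaN instead of failing.

```python
import pandas as pd

# Made-up string data resembling the question's columns.
df = pd.DataFrame({
    "Time": ["9/8/2017 8:40", "9/8/2017 9:20", "9/8/2017 10:40"],
    "A": ["1.29", "1.43", " "],
    "G": ["-13.72", "", "-1.72E+10"],
})

# All columns except Time, as in the answer above.
cols = df.columns.difference(["Time"])

# Convert column by column; cells that cannot be parsed become NaN.
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")
```

After this conversion the numeric columns are float64, and comparisons such as df[cols] > 0 treat the NaN cells as False.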

df1 = df.loc[:, (df > 0).all()] 
print (df1) 
      Time    A 
0 9/8/2017 8:40 1.290000e+00 
1 9/8/2017 9:00 1.280000e+00 
2 9/8/2017 9:20 1.430000e+00 
3 9/8/2017 9:40 1.440000e+00 
4 9/8/2017 10:00 1.360000e+00 
5 9/8/2017 10:20 1.380000e+00 
6 9/8/2017 10:40 1.720000e+10 
7 9/8/2017 11:00 1.400000e+00 
8 9/8/2017 11:20 1.430000e+00 

But this does not filter the DataFrame. I still get negative values. – Dheeraj

I think 'all' should work. – jezrael

I want to filter the columns individually. – Dheeraj
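
One possible reading of this last comment is that each column should be filtered on its own rather than requiring a whole row to be positive. The thread does not say what result is wanted, so the following is only a sketch on made-up numeric data; the names rows_a and positive are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up numeric data for illustration only.
df = pd.DataFrame({
    "A": [1.29, -1.72e10, 1.43],
    "G": [-13.72, np.nan, 13.83],
})

# Rows where one chosen column is positive, regardless of the other columns.
rows_a = df[df["A"] > 0]

# Each column filtered separately: one Series of positive values per column.
positive = {col: df.loc[df[col] > 0, col] for col in df.columns}
```

Because the columns can keep different numbers of values, the per-column results are stored in a dict of Series instead of a single rectangular DataFrame.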