Drop rows with missing values in pandas

I have a pandas DataFrame in which one column has some missing values.

The DataFrame has a few hundred rows, but in column 4, five of the values are '?'.

I want to delete the rows where the value in this column is '?'.

I have been trying something like

df = df[np.isfinite(df[:,4])]


2016-09-24 Jamgreen

Are they actually '?' (strings)? And do you want to drop the row if any column contains one? –

Does the 'DataFrame.dropna()' method do what you want? –

'df[df.iloc[:, 4].astype(str) != "?"]'. That is, if by "column 4" you mean index 4; otherwise you may want to use index 3 for the 4th column. – Abdou
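On the 'dropna()' question: dropna only treats NaN/None as missing, so a '?' placeholder is kept as an ordinary string. A minimal sketch, assuming a small hypothetical two-column frame, is to turn the placeholder into NaN first with 'replace' and then call dropna on that column:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the '?' strings stand in for missing values
df = pd.DataFrame({
    'col0': [0, 1, 2, 3, 4],
    'col4': [0, 1, 2, '?', '?'],
})

# dropna() alone keeps every row, because '?' is a string, not NaN
unchanged = df.dropna()

# Replace the placeholder with NaN first, then drop rows that are NaN in col4
cleaned = df.replace('?', np.nan).dropna(subset=['col4'])
print(cleaned)
```

Newer pandas versions may emit a FutureWarning about downcasting in 'replace' here; it does not change which rows are dropped.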

To remove the rows whose 4th column equals '?', you can select the rows where it is not equal to '?'.

import pandas as pd

# Test data: col4 mixes integers with '?' strings
df = pd.DataFrame({
    'col0': [0, 1, 2, 3, 4],
    'col1': [0, 1, 2, 3, 4],
    'col2': [0, 1, 2, 3, 4],
    'col3': [0, 1, 2, 3, 4],
    'col4': [0, 1, 2, '?', '?']})

df.loc[df.iloc[:, 4] != '?'] 

   col0  col1  col2  col3  col4
0     0     0     0     0     0
1     1     1     1     1     1
2     2     2     2     2     2
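One detail worth knowing about this filter: the rows that survive hold integers, but the column keeps the object dtype it had when the '?' strings were present. The sketch below rebuilds the same test data and shows an explicit conversion afterwards; the 'astype' step is an addition, not part of the answer above.

```python
import pandas as pd

# Same shape as the test data above: col4 mixes ints with '?' strings
df = pd.DataFrame({f'col{i}': [0, 1, 2, 3, 4] for i in range(4)})
df['col4'] = [0, 1, 2, '?', '?']

mask = df.iloc[:, 4] != '?'     # element-wise comparison -> boolean Series
filtered = df.loc[mask]

# The surviving values are ints, but the column is still object dtype
print(filtered['col4'].dtype)   # object

# Convert explicitly once the placeholder rows are gone
filtered = filtered.astype({'col4': 'int64'})
print(filtered['col4'].dtype)   # int64
```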

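If you want to remove the rows whose 4th column contains '?', it is a bit trickier: you have to escape the ? character (it is a regex metacharacter), pass na=False so that non-string values give False instead of NaN in the boolean mask, and finally negate the mask with ~.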

df.loc[~df.iloc[:, 4].str.contains(r'\?', na=False)]

   col0  col1  col2  col3  col4
0     0     0     0     0     0
1     1     1     1     1     1
2     2     2     2     2     2
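Why na=False matters, and an alternative to escaping: on an object column, '.str.contains' returns NaN for the non-string elements (the integers), and a mask containing NaN cannot be negated and used for indexing. Passing regex=False treats '?' literally, so no escape is needed. A small sketch on a hypothetical Series:

```python
import pandas as pd

# Hypothetical column mixing ints and strings, one of which merely contains '?'
s = pd.Series([0, 1, 2, '?', 'n/a?'], dtype=object)

# Without na=..., the int elements come back as NaN
raw = s.str.contains('?', regex=False)

# na=False turns those NaNs into False, so the result works as a boolean mask
has_q = s.str.contains('?', regex=False, na=False)
kept = s[~has_q]
print(kept.tolist())   # [0, 1, 2]

# Escaping with a raw-string regex gives the same mask
escaped = s.str.contains(r'\?', na=False)
```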

EDIT

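If the column should contain only numbers, you can also use the following approach: convert it to numeric with errors='coerce', so that values that cannot be converted become NaN, then simply use dropna to remove those rows.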

# Replace the 5th column (index 4) with its numeric version; '?' becomes NaN
col = df.columns[4]
df[col] = pd.to_numeric(df[col], errors='coerce')
# Or if you want to apply the transformation to the entire DataFrame
# df = df.apply(pd.to_numeric, errors='coerce')
df.dropna(inplace=True)

   col0  col1  col2  col3  col4
0     0     0     0     0   0.0
1     1     1     1     1   1.0
2     2     2     2     2   2.0
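The 0.0, 1.0, 2.0 in that output come from the coercion itself: NaN is a float, so pd.to_numeric returns a float64 column whenever any value fails to convert. A short sketch on a hypothetical Series, with an optional cast back to integers once the NaN rows are gone:

```python
import pandas as pd

s = pd.Series([0, 1, 2, '?', '?'], dtype=object)

# '?' cannot be parsed, so errors='coerce' turns it into NaN -> float64 result
nums = pd.to_numeric(s, errors='coerce')
print(nums.dtype)   # float64

# After dropping the NaNs, the remaining values can be cast back to int64
clean = nums.dropna().astype('int64')
print(clean.tolist())   # [0, 1, 2]
```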


2016-09-24 21:16:00 Romain

After this, won't all the numbers in column 4 still be treated as strings, since the column had string values when it was loaded? – Jamgreen

@Jamgreen Yes, I just added an edit that uses this approach. – Romain
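If the data happens to come from a CSV file (the question does not say how it is loaded, so this is an assumption), the '?' marker can be handled at load time: read_csv's na_values parameter makes pandas parse '?' as NaN, so the column is read as float64 instead of object. The CSV text below is hypothetical sample data.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for the real file; '?' marks missing values
csv_text = """col0,col1,col2,col3,col4
0,0,0,0,0
1,1,1,1,1
2,2,2,2,2
3,3,3,3,?
4,4,4,4,?
"""

# na_values=['?'] makes read_csv store '?' as NaN, so col4 is numeric (float64)
df = pd.read_csv(io.StringIO(csv_text), na_values=['?'])
print(df['col4'].dtype)   # float64

# Drop the incomplete rows, then cast col4 back to integers
df = df.dropna(subset=['col4']).astype({'col4': 'int64'})
```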
