Replace NaN values of a pandas.DataFrame with values from a list

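In a Python script using the pandas library, I have a dataset of, say, 100 rows with a feature "X" that contains 36 NaN values, and a list of size 36.

I want to replace all 36 missing values of column "X" with the 36 values from my list.

This may be a silly question, but I went through all the documentation and could not find a way to do it.

Here is an example: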

INPUT

Data: X  Y 
     1  8 
     2  3 
     NaN 2 
     NaN 7 
     1  2 
     NaN 2

FILLER

List: [8, 6, 3]

OUTPUT

Data: X  Y 
     1  8 
     2  3 
     8  2 
     6  7 
     1  2 
     3  2

Source

2017-02-10 Mean-Street

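Can you provide the input and the expected output? – Shijo

Sure, I have edited my post to add it. –

Are all the NaN values in the same column? How do you want to replace the NaN values with your list? Do you do it sequentially, i.e. replace the first NaN with the first value in the list, and so on? –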

Starting with the dataframe df:

print(df) 

    X Y 
0 1.0 8 
1 2.0 3 
2 NaN 2 
3 NaN 7 
4 1.0 2 
5 NaN 2

Define the values to fill with (note: your filler list must have the same number of elements as there are NaN values in your dataframe):

filler = [8, 6, 3]

You can filter the column down to the rows that contain NaN values and overwrite the selected rows with your filler:

~~df.X[df.X.isnull()] = filler~~

df.loc[df.X.isnull(), 'X'] = filler

This gives:

print(df) 

    X Y 
0 1.0 8 
1 2.0 3 
2 8.0 2 
3 6.0 7 
4 1.0 2 
5 3.0 2
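For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same `.loc` approach. The length check is my own addition (not part of the answer above) and simply makes the "same number of elements" note explicit.

```python
import numpy as np
import pandas as pd

# Same data as in the question
df = pd.DataFrame({"X": [1, 2, np.nan, np.nan, 1, np.nan],
                   "Y": [8, 3, 2, 7, 2, 2]})
filler = [8, 6, 3]

# Boolean mask of the rows where X is missing
mask = df["X"].isna()

# Added safeguard (an assumption, not in the original answer):
# fail early if the list does not match the number of NaN values
if mask.sum() != len(filler):
    raise ValueError("filler must have one value per NaN in column X")

# Values are assigned to the masked rows in order
df.loc[mask, "X"] = filler
print(df)
```

The fillers go into the NaN positions from top to bottom, so the first NaN gets `filler[0]`, the second gets `filler[1]`, and so on.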

Source

2017-02-10 20:17:21 bunji

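It works great, thanks, but I get the warning "SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame". That is odd, because I can see that it did modify `df`... –

According to the [documentation](http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy) linked in the warning, you need to change `df.X[df.X.isnull()]` to `df.loc[df.X.isnull(), 'X']` –

@MadPhysicist is correct if you want to avoid the warning. Personally, I tend to use the original syntax because it looks more intuitive (to me), and I just ignore the warning because it does what I want. But if the `.loc` approach looks fine to you, then you should use that. – bunji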
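As a hedged illustration of the comments above: one common situation where this warning comes up is when the frame being modified was itself derived from another DataFrame by filtering. Whether that was the asker's case is an assumption on my part, and the name `subset` is made up for the sketch. Taking an explicit copy and then writing through `.loc` keeps it unambiguous which object is being changed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"X": [1, 2, np.nan], "Y": [8, 3, 2]})

# 'subset' is a hypothetical derived frame; .copy() makes it independent of df
subset = df[df["Y"] < 8].copy()

# Single .loc call (row mask + column label) instead of chained indexing
subset.loc[subset["X"].isna(), "X"] = [8]

print(subset)  # the NaN in the copy has been filled
print(df)      # the original frame still has its NaN
```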

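This may not be efficient, but it still works :) First find all the indices of the NaNs, then replace them in a loop. This assumes the list is always longer than the number of NaNs.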

import pandas as pd 
import numpy as np 

df = pd.DataFrame({'A': [np.nan, 1, 2], 'B': [10, np.nan, np.nan], 'C': [[20, 21, 22], [23, 24, 25], np.nan]}) 
lst = [12, 35, 78] 
print(df)  # before the replacement 

index = df['B'].index[df['B'].apply(np.isnan)]  # row labels where 'B' is NaN 
for item in index: 
    # set_value is gone from current pandas; .at sets a single cell by label 
    df.at[item, 'B'] = lst[item]  # the list is indexed by row label: row n gets lst[n] 

print(df)

    A  B    C 
0 NaN 10.0 [20, 21, 22] 
1 1.0 NaN [23, 24, 25] 
2 2.0 NaN   NaN

Output after the replacement:

 A  B    C 
0 NaN 10.0 [20, 21, 22] 
1 1.0 35.0 [23, 24, 25] 
2 2.0 78.0   NaN

Source

2017-02-10 20:08:36 Shijo

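Here it will replace the 10 in the first row, which I don't want: I only want to change the NaN values. –

It won't, it only replaces the NaNs – Shijo

OK, if the index corresponds to the missing rows, you're right, sorry –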
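To make the point of these comments concrete, here is a small sketch (my own, using a bare Series rather than the full frame) contrasting the two behaviours: indexing the list by row label, as the answer does, versus handing out list values in order, as the question asks. The names `by_label` and `in_order` are just illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, np.nan, np.nan], name="B")
lst = [12, 35, 78]
nan_labels = s.index[s.isna()]  # labels of the missing rows: 1 and 2

# Variant 1: list indexed by row label (what the answer above does)
by_label = s.copy()
for label in nan_labels:
    by_label.at[label] = lst[label]

# Variant 2: k-th NaN receives the k-th list value (what the question asks for)
in_order = s.copy()
for value, label in zip(lst, nan_labels):
    in_order.at[label] = value

print(by_label.tolist())  # [10.0, 35.0, 78.0]
print(in_order.tolist())  # [10.0, 12.0, 35.0]
```

In both variants the existing 10 is left untouched; they differ only in which list entries end up in the NaN rows.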

You have to use a counter as an index marker into your custom list in order to replace your NaNs with its values:

import numpy as np 
import pandas as pd 

your_df = pd.DataFrame({'your_column': [0,1,2,np.nan,4,6,np.nan,np.nan,7,8,np.nan,9]}) # a df with 4 NaN's 
print(your_df) 

your_custom_list = [1,3,6,8] # custom list with 4 fillers 

# work on a copy: the array from .values may be a read-only view when copy-on-write is enabled 
your_column_vals = your_df['your_column'].values.copy() 

i_custom = 0 # current position in your custom list 
for i in range(len(your_column_vals)): 
    if np.isnan(your_column_vals[i]): 
        your_column_vals[i] = your_custom_list[i_custom] 
        i_custom += 1 # move on to the next filler 

your_df['your_column'] = your_column_vals 

print(your_df)

Output:

your_column 
0   0.0 
1   1.0 
2   2.0 
3   NaN 
4   4.0 
5   6.0 
6   NaN 
7   NaN 
8   7.0 
9   8.0 
10   NaN 
11   9.0 
    your_column 
0   0.0 
1   1.0 
2   2.0 
3   1.0 
4   4.0 
5   6.0 
6   3.0 
7   6.0 
8   7.0 
9   8.0 
10   8.0 
11   9.0
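The same idea can also be written with an actual Python iterator instead of a manual counter. This is only a sketch of an alternative, not the code from the answer above; the name `fillers` is made up for the example.

```python
import numpy as np
import pandas as pd

your_df = pd.DataFrame({"your_column": [0, 1, 2, np.nan, 4, 6, np.nan, np.nan, 7, 8, np.nan, 9]})
your_custom_list = [1, 3, 6, 8]

# iter() gives an object that hands out the fillers one at a time, in order
fillers = iter(your_custom_list)

# Keep existing values; pull the next filler whenever a NaN is encountered
your_df["your_column"] = [
    next(fillers) if pd.isna(v) else v
    for v in your_df["your_column"]
]

print(your_df)
```

If the list runs out before the NaNs do, `next(fillers)` raises `StopIteration`, so a list that is too short fails with an error instead of leaving some NaNs unfilled.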

Source

2017-02-10 20:12:02
