How to add new columns from a list of values to a pandas groupby object

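I want to write a script that takes the series of values in one column, splits each value at the commas into separate strings, and creates a new column for each resulting string (filled with NaN for now). Since the DataFrame is grouped by Column1, I want to do this for each group.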

My input DataFrame looks like this:

df1:
  Column1    Column2
0     L17  a,b,c,d,e
1      L7      a,b,c
2      L6      a,b,f
3      L6      h,d,e
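To experiment with the question, the sample input can be rebuilt as a real DataFrame. This is a minimal sketch that assumes every Column2 entry is a plain comma-separated string exactly as listed above; the name df1 follows the question.

```python
import pandas as pd

# Rebuild the sample input from the question (values copied from the listing above)
df1 = pd.DataFrame({
    'Column1': ['L17', 'L7', 'L6', 'L6'],
    'Column2': ['a,b,c,d,e', 'a,b,c', 'a,b,f', 'h,d,e'],
})
print(df1)
```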

What I ultimately want to have is:

  Column1    Column2    a    b    c    d    e    f    h
0     L17  a,b,c,d,e  nan  nan  nan  nan  nan  nan  nan
1      L7      a,b,c  nan  nan  nan  nan  nan  nan  nan
2      L6      a,b,f  nan  nan  nan  nan  nan  nan  nan
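If the goal is the table above (original rows kept, one all-NaN column per distinct value), a groupby may not be needed at all. The sketch below is an alternative approach, not the asker's method: it assumes pandas 0.25 or later for Series.explode, collects every distinct value in Column2, and lets DataFrame.reindex add the missing columns filled with NaN.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Column1': ['L17', 'L7', 'L6', 'L6'],
    'Column2': ['a,b,c,d,e', 'a,b,c', 'a,b,f', 'h,d,e'],
})

# Every distinct value that appears anywhere in Column2, sorted alphabetically
new_cols = sorted(df1['Column2'].str.split(',').explode().unique())

# reindex keeps the existing columns and adds the new ones, filled with NaN
result = df1.reindex(columns=list(df1.columns) + new_cols)
print(result)
```

Because reindex only adds labels that are missing, the original Column1 and Column2 values are left untouched.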

My code currently looks like this:

def NewCols(x):
    for item, frame in group['Column2'].iteritems():
        Genes = frame.split(',')
        for value in Genes:
            string = value
            x[string] = np.nan
            return x

df1.groupby('Column1').apply(NewCols)

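The idea behind my code is that it loops through Column2 of each group, splits the string contained in frame at the commas, and builds a list for that group. So far the code works fine. Then I added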

for value in Genes: 
    string = value 
    x[string] = np.nan 
    return x

intending to add a new column for each value contained in the list Genes. However, my output looks like this:

  Column1    Column2    d
0     L17  a,b,c,d,e  nan
1      L7      a,b,c  nan
2      L6      a,b,f  nan
3      L6      h,d,e  nan

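I'm very surprised by this. Can someone explain why only one column is appended (it isn't even named after the first value in the first list of the first group), and suggest how I could improve my code?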


2015-10-15 sequence_hard
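The splitting step the question describes as working (one list of values per Column1 group) can be sketched without apply. This is an illustration under the assumption that df1 holds the sample data above; the name genes_per_group is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Column1': ['L17', 'L7', 'L6', 'L6'],
    'Column2': ['a,b,c,d,e', 'a,b,c', 'a,b,f', 'h,d,e'],
})

# Iterating a SeriesGroupBy yields (group name, Series of that group's values).
# Each Column2 string is split at the commas and the pieces are gathered
# into one flat list per group.
genes_per_group = {
    name: [value for value_string in column for value in value_string.split(',')]
    for name, column in df1.groupby('Column1')['Column2']
}
print(genes_per_group)
```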

I think you're just returning too early in your function, before both loops have finished. If you dedent the return twice, like this:

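import numpy as np

def NewCols(x):
    # iterate over the group passed in (x); .items() replaces .iteritems(), removed in pandas 2.0
    for item, frame in x['Column2'].items():
        Genes = frame.split(',')
        for value in Genes:
            string = value
            x[string] = np.nan
    return x  # dedented twice: runs only after both loops have finished

df1.groupby('Column1').apply(NewCols)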

it should work fine!
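The underlying issue is plain Python control flow rather than pandas: a return placed inside a loop body ends the function during the first iteration. The two hypothetical functions below contrast the two indentations on a simple list.

```python
def collect_early(values):
    added = []
    for value in values:
        added.append(value)
        return added  # inside the loop: leaves the function after the first item


def collect_all(values):
    added = []
    for value in values:
        added.append(value)
    return added  # after the loop: runs once every item has been processed


print(collect_early(['a', 'b', 'c']))
print(collect_all(['a', 'b', 'c']))
```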


2015-10-15 13:58:49 Mathiou

Oh man... thanks! :D –

You're welcome :) – Mathiou
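A runnable version of the fixed approach, assuming df1 holds the sample data from the question. Recent pandas releases emit a deprecation warning when groupby(...).apply passes the grouping column to the function, so this sketch swaps in an explicit loop over the groups followed by pd.concat; the helper name add_nan_columns is hypothetical. Columns that a group never created come out as NaN after concatenation, and sort_index restores the original row order.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Column1': ['L17', 'L7', 'L6', 'L6'],
    'Column2': ['a,b,c,d,e', 'a,b,c', 'a,b,f', 'h,d,e'],
})


def add_nan_columns(group):
    """Return a copy of one group with an all-NaN column per value in its Column2."""
    group = group.copy()  # avoid modifying a slice of df1
    for value_string in group['Column2']:
        for value in value_string.split(','):
            group[value] = np.nan
    return group  # after both loops, so every value gets its column


# Process each Column1 group separately, then stack the pieces back together
pieces = [add_nan_columns(group) for _, group in df1.groupby('Column1')]
result = pd.concat(pieces).sort_index()
print(result)
```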

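import numpy as np
import pandas as pd
cols = sorted(list(set(df1['Column2'].apply(lambda x: x.split(',')).sum())))  # every distinct value
df = df1.groupby('Column1').agg(lambda x: ','.join(x)).reset_index()  # one row per group, strings joined
pd.concat([df, pd.DataFrame({c: np.nan for c in cols}, index=df.index)], axis=1)  # append the NaN columns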

  Column1      Column2    a    b    c    d    e    f    h
0     L17    a,b,c,d,e  NaN  NaN  NaN  NaN  NaN  NaN  NaN
1      L6  a,b,f,h,d,e  NaN  NaN  NaN  NaN  NaN  NaN  NaN
2      L7        a,b,c  NaN  NaN  NaN  NaN  NaN  NaN  NaN


2015-10-15 16:42:42
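A variant of the answer above, under the assumption that pandas 0.25 or later is available. Summing a Series of lists concatenates them one at a time, copying the growing list at each step, so this sketch collects the distinct values with Series.explode instead and passes ','.join straight to agg on Column2. The rest follows the answer: one row per group plus one all-NaN column per distinct value.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Column1': ['L17', 'L7', 'L6', 'L6'],
    'Column2': ['a,b,c,d,e', 'a,b,c', 'a,b,f', 'h,d,e'],
})

# Distinct values across all rows, without adding lists together
cols = sorted(df1['Column2'].str.split(',').explode().unique())

# One row per Column1 group, joining that group's strings with commas
df = df1.groupby('Column1')['Column2'].agg(','.join).reset_index()

# Append one all-NaN column per distinct value, aligned on df's index
result = pd.concat([df, pd.DataFrame({c: np.nan for c in cols}, index=df.index)], axis=1)
print(result)
```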
