Pandas: looping over the values of a column

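I have a pandas DataFrame in which the values of the column I want to use are lists. I want to combine the elements of each list two at a time and write the pairs to another DataFrame.
For example, I have a DataFrame df with columns col_a and col_b. The values of col_b are lists. I want to loop over the values of df.col_b and output a list for each pair.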

import pandas as pd 

df = pd.DataFrame({
    'col_a': ['ast1', 'ast2', 'ast3'],
    'col_b': [['text1', 'text2', 'text3'],
              ['mext1', 'mext2', 'mext3'],
              ['cext1', 'cext2']],
})
df 

  col_a                  col_b
0  ast1  [text1, text2, text3]
1  ast2  [mext1, mext2, mext3]
2  ast3         [cext1, cext2]

This is the output I want:

  col_a         col_b_1
0  ast1  [text1, text2]
1  ast1  [text1, text3]
2  ast1  [text2, text3]
3  ast2  [mext1, mext2]
4  ast2  [mext1, mext3]
5  ast2  [mext2, mext3]
6  ast3  [cext1, cext2]

Source

2016-12-02 running man

Assuming every row has a unique value in col_a, you can use combinations from itertools to generate all two-element combinations of each list:

from itertools import combinations

# Each group holds exactly one row (col_a is unique), so x.iloc[0] is that row's list.
(df.groupby('col_a')['col_b']
   .apply(lambda x: pd.Series(list(combinations(x.iloc[0], 2))))
   .reset_index(level=0))

#   col_a           col_b
# 0  ast1  (text1, text2)
# 1  ast1  (text1, text3)
# 2  ast1  (text2, text3)
# 0  ast2  (mext1, mext2)
# 1  ast2  (mext1, mext3)
# 2  ast2  (mext2, mext3)
# 0  ast3  (cext1, cext2)
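
If col_a is not guaranteed to be unique, the groupby approach above only looks at the first list in each group. The following is a minimal sketch of one alternative that works row by row. It assumes pandas 0.25 or later (for DataFrame.explode), and the names pairs and result are only illustrative. It also converts each pair to a list and renumbers the index, to match the output requested in the question.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({
    'col_a': ['ast1', 'ast2', 'ast3'],
    'col_b': [['text1', 'text2', 'text3'],
              ['mext1', 'mext2', 'mext3'],
              ['cext1', 'cext2']],
})

# For every row, build the list of all two-element combinations (as lists).
pairs = df['col_b'].apply(lambda items: [list(p) for p in combinations(items, 2)])

# explode() gives each pair its own row and repeats col_a alongside it.
result = (df[['col_a']]
          .assign(col_b_1=pairs)
          .explode('col_b_1')
          .reset_index(drop=True))
print(result)
```

Note that a row whose list has fewer than two elements produces an empty list of pairs, which explode turns into a single row with a missing value; drop those rows if they are not wanted.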

Source

2016-12-02 03:14:28 Psidom

You can use itertools to flatten the lists:

import itertools

# One row of pairs per original row, then stack() into a single long Series.
series = df["col_b"].apply(
    lambda x: pd.Series(list(itertools.combinations(x, 2)))).stack()

The series must have a name before it can be merged with the "parent" DataFrame:

series.name = "col_b_1"

Now merge the two objects and select the columns you want:

result = df.merge(pd.DataFrame(series).reset_index(), 
    left_index=True, 
    right_on="level_0")[["col_a","col_b_1"]]

The result is a column of tuples; if that is not what you want, convert them with .apply(list).

#   col_a         col_b_1
# 0  ast1  (text1, text2)
# 1  ast1  (text1, text3)
# 2  ast1  (text2, text3)
# 3  ast2  (mext1, mext2)
# 4  ast2  (mext1, mext3)
# 5  ast2  (mext2, mext3)
# 6  ast3  (cext1, cext2)
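
The tuple-to-list conversion mentioned above could look like the following sketch. The small result frame here is a hand-built stand-in for the merged output, so that the snippet runs on its own.

```python
import pandas as pd

# Stand-in for the merged result: a column whose values are tuples.
result = pd.DataFrame({
    'col_a': ['ast1', 'ast1', 'ast3'],
    'col_b_1': [('text1', 'text2'), ('text1', 'text3'), ('cext1', 'cext2')],
})

# Series.apply calls list() on each tuple, giving a column of lists.
result['col_b_1'] = result['col_b_1'].apply(list)
print(result)
```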

Source

2016-12-02 03:26:40 DyZ
