Generating an edge list from a pandas DataFrame

Suppose I have a pandas DataFrame like this:

     Fruit_1 Fruit_2 Fruit_3
0      Apple  Orange   Peach
1      Apple   Lemon    Lime
2  Starfruit   Apple  Orange

In reproducible form:

import pandas as pd

df = pd.DataFrame([['Apple', 'Orange', 'Peach'],
                   ['Apple', 'Lemon', 'Lime'],
                   ['Starfruit', 'Apple', 'Orange']],
                  columns=['Fruit_1', 'Fruit_2', 'Fruit_3'])

I want to generate an edge list that contains:

Apple, Orange 
Apple, Peach 
Orange, Peach 
Apple, Lemon 
Apple, Lime 
Lemon, Lime 
Starfruit, Apple 
Starfruit, Orange 
Apple, Orange

How can I do this in Python?

Source

2017-02-25 Katie Truong

I don't know pandas, but you can use itertools.combinations on each row:

itertools.combinations(row, 2)

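This creates an iterator, which you can simply convert to a list of pairs.

After collecting these per-row lists into a list, you can join them with a flattening list comprehension: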

[pair for row in collected_rows for pair in row]
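Put together, the steps above might look like the following minimal sketch. It assumes the `df` from the question; `collected_rows` is the name used in the comprehension above, and `edges` is only an illustrative name.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame([['Apple', 'Orange', 'Peach'],
                   ['Apple', 'Lemon', 'Lime'],
                   ['Starfruit', 'Apple', 'Orange']],
                  columns=['Fruit_1', 'Fruit_2', 'Fruit_3'])

# One list of pairs per row; df.values yields each row as an array of cells.
collected_rows = [list(combinations(row, 2)) for row in df.values]

# Flatten the per-row lists into a single edge list.
edges = [pair for row in collected_rows for pair in row]

for a, b in edges:
    print(f"{a}, {b}")
```

Each row of three fruits contributes three pairs, so the nine edges come out in the same order as the list in the question.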

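Alternatively, use the NumPy way, which is typically much faster. Here data is assumed to be the 2-D array of values (for example df.values). This gives the pairs grouped by row:

data[:, np.c_[np.tril_indices(data.shape[1], -1)]]

If you want a flat list of pairs instead:

data[:, np.c_[np.triu_indices(data.shape[1], 1)]].reshape(-1,2)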

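Note that triu_indices lists the vertices in order, while tril_indices lists them the other way round. They are normally used to get the indices of the upper or lower triangle of a matrix.

Source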
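A minimal runnable sketch of the NumPy variant, assuming `data` is simply `df.values` from the question (the names `idx` and `edges` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['Apple', 'Orange', 'Peach'],
                   ['Apple', 'Lemon', 'Lime'],
                   ['Starfruit', 'Apple', 'Orange']],
                  columns=['Fruit_1', 'Fruit_2', 'Fruit_3'])

# Assumption: data is the 2-D object array behind the DataFrame.
data = df.values

# Column index pairs (i, j) with i < j, stacked as rows of a (pairs, 2) array.
idx = np.c_[np.triu_indices(data.shape[1], 1)]

# Fancy indexing gives shape (rows, pairs, 2); reshape flattens it to (rows * pairs, 2).
edges = data[:, idx].reshape(-1, 2)
print(edges)
```

Swapping in `np.tril_indices(data.shape[1], -1)` should select the same pairs with the two elements of each pair reversed, which only matters if the edges are directed.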

2017-02-25 10:22:57

Nice solution! – MaxU

This works! Thank you! –

Here is a pandas solution:

In [118]: from itertools import combinations 

In [119]: df.apply(lambda x: list(combinations(x, 2)), 1).stack().reset_index(level=[0,1], drop=True).apply(', '.join) 
Out[119]: 
0      Apple, Orange
1       Apple, Peach
2      Orange, Peach
3       Apple, Lemon
4        Apple, Lime
5        Lemon, Lime
6   Starfruit, Apple
7  Starfruit, Orange
8      Apple, Orange
dtype: object
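The one-liner above relies on how `DataFrame.apply` shapes list results, which may behave differently in newer pandas releases. The following is a sketch that avoids that dependency, not the answer's original method; it assumes a pandas version that provides `Series.explode`, and the names `pairs` and `edges` are illustrative.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame([['Apple', 'Orange', 'Peach'],
                   ['Apple', 'Lemon', 'Lime'],
                   ['Starfruit', 'Apple', 'Orange']],
                  columns=['Fruit_1', 'Fruit_2', 'Fruit_3'])

# Build the list of pairs for each row, giving a Series whose cells are lists of tuples.
pairs = pd.Series([list(combinations(row, 2)) for row in df.values])

# explode() puts each pair on its own row; then join each pair into one string.
edges = pairs.explode().reset_index(drop=True).apply(', '.join)
print(edges)
```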

Source

2017-02-25 10:58:40 MaxU

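Works perfectly! Thank you! –

@KatieTruong, glad I could help. Please consider [accepting](http://meta.stackexchange.com/a/5235) the most helpful answer - this also indicates that your question has been answered – MaxU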
