2016-03-04 95 views

Cross join/merge dataframe1 with itself to create dataframe2 based on combinations of a column

This is similar to the question: cross join/merge to create dataframe of combinations (order doesn't matter)

import itertools
import pandas as pd

df = pd.DataFrame({'zone2': ['IL', 'IL-1', 'IL-3', 'IL'],
        'city': ['Chicago', 'St.Louis', 'Monmouth', 'DesMoines'],
        'zone1': ['Mid', 'Mid', 'Mid', 'Mid']})

I want to create a second dataframe whose rows are all combinations of the values in the city column.

This is how I did it, but there must be a more efficient way to do it in fewer steps.

# all unordered pairs of cities
df2 = pd.DataFrame(list(itertools.combinations(list(df['city']), 2)))
df2.columns = ['city_1', 'city_2']
# attach the zone columns for city_1 first, then for city_2
df2 = df2.merge(df, left_on='city_1', right_on='city').merge(df, left_on='city_2', right_on='city', suffixes=('_x', '_y'))
# drop the duplicated join keys
df2.drop(['city_x', 'city_y'], axis=1, inplace=True)
>>> df2 

     city_1     city_2 zone1_x zone2_x zone1_y zone2_y
0   Chicago   St.Louis     Mid      IL     Mid    IL-1
1   Chicago   Monmouth     Mid      IL     Mid    IL-3
2  St.Louis   Monmouth     Mid    IL-1     Mid    IL-3
3   Chicago  DesMoines     Mid      IL     Mid      IL
4  St.Louis  DesMoines     Mid    IL-1     Mid      IL
5  Monmouth  DesMoines     Mid    IL-3     Mid      IL

Answers

from itertools import combinations 

>>> pd.DataFrame(
     (pair[0] + pair[1] 
     for pair in (df.loc[df.city == a].values.tolist() + 
         df.loc[df.city == b].values.tolist() 
     for a, b in combinations(df.city.unique(), 2))), 
     columns=df.columns.tolist()+[c+"_2" for c in df]) 
       city zone1 zone2     city_2 zone1_2 zone2_2
0   Chicago   Mid    IL   St.Louis     Mid    IL-1
1   Chicago   Mid    IL   Monmouth     Mid    IL-3
2   Chicago   Mid    IL  DesMoines     Mid      IL
3  St.Louis   Mid  IL-1   Monmouth     Mid    IL-3
4  St.Louis   Mid  IL-1  DesMoines     Mid      IL
5  Monmouth   Mid  IL-3  DesMoines     Mid      IL

You can also try a variant like this:

# index labels of every unordered pair of rows
pairs = ((a, b) for a, b in combinations(df.index, 2))

>>> pd.DataFrame({
     'city_1': df.loc[p[0], 'city'],
     'city_2': df.loc[p[1], 'city'],
     'zone1_1': df.loc[p[0], 'zone1'],
     'zone1_2': df.loc[p[1], 'zone1'],
     'zone2_1': df.loc[p[0], 'zone2'],
     'zone2_2': df.loc[p[1], 'zone2']} for p in pairs)

     city_1     city_2 zone1_1 zone1_2 zone2_1 zone2_2
0   Chicago   St.Louis     Mid     Mid      IL    IL-1
1   Chicago   Monmouth     Mid     Mid      IL    IL-3
2   Chicago  DesMoines     Mid     Mid      IL      IL
3  St.Louis   Monmouth     Mid     Mid    IL-1    IL-3
4  St.Louis  DesMoines     Mid     Mid    IL-1      IL
5  Monmouth  DesMoines     Mid     Mid    IL-3      IL
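
If looking up each cell separately feels heavy, the same pairing can probably be done by selecting whole rows by position and gluing the two halves together side by side. This is only a sketch, not part of the answer above; the names `i`, `j`, `first`, `second` and `result` are made up for illustration, and it assumes the `_1`/`_2` suffix naming is acceptable.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({'zone2': ['IL', 'IL-1', 'IL-3', 'IL'],
                   'city': ['Chicago', 'St.Louis', 'Monmouth', 'DesMoines'],
                   'zone1': ['Mid', 'Mid', 'Mid', 'Mid']})

# Row positions of every unordered pair, split into "first" and "second" members.
i, j = zip(*combinations(range(len(df)), 2))

# Pull whole rows by position and suffix the column names so they stay distinct.
first = df.iloc[list(i)].reset_index(drop=True).add_suffix('_1')
second = df.iloc[list(j)].reset_index(drop=True).add_suffix('_2')

# Place the two halves side by side; both have a fresh 0..n-1 index, so rows line up.
result = pd.concat([first, second], axis=1)
```

Because it works on row positions rather than city names, this also copes with duplicate city values, at the cost of keeping the original column order within each half.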

Thanks, Alexander. Wow, this is rather complicated. I'm surprised there is no built-in way to get the frame I want. – codingknob
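
On the point about a built-in: pandas 1.2 and later accept `how='cross'` in `merge`, so one possible route is a cross join of the frame with itself followed by a filter that keeps each unordered pair once. The sketch below assumes such a pandas version is installed; the helper columns `index_1`/`index_2` and the names `left`, `right` and `pairs` are illustrative and do not come from the thread.

```python
import pandas as pd

df = pd.DataFrame({'zone2': ['IL', 'IL-1', 'IL-3', 'IL'],
                   'city': ['Chicago', 'St.Louis', 'Monmouth', 'DesMoines'],
                   'zone1': ['Mid', 'Mid', 'Mid', 'Mid']})

# Expose the row index as a column so pairs can be filtered after the join.
left = df.reset_index().add_suffix('_1')
right = df.reset_index().add_suffix('_2')

# how='cross' (pandas >= 1.2) pairs every row of left with every row of right.
pairs = left.merge(right, how='cross')

# Keep each unordered pair once; this also drops rows paired with themselves.
pairs = pairs[pairs['index_1'] < pairs['index_2']]
pairs = pairs.drop(columns=['index_1', 'index_2']).reset_index(drop=True)
```

Note that the cross join materialises all n × n row pairs before filtering, so for large frames the `itertools.combinations` approaches above may use less memory.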