Iterating through multiple DataFrames in pandas

I have two DataFrames: 1) one containing a list of suppliers and their latitude/longitude coordinates:

sup_essential = pd.DataFrame({'supplier': ['A','B','C'], 
           'coords': [(51.1235,-0.3453),(52.1245,-0.3423),(53.1235,-1.4553)]})

2) one containing a list of stores and their latitude/longitude coordinates:

stores_essential = pd.DataFrame({'storekey': [1,2,3], 
           'coords': [(54.1235,-0.6553),(49.1245,-1.3423),(50.1235,-1.8553)]})

I want to create an output table containing store, store_coordinates, supplier, supplier_coordinates and the distance for every combination of store and supplier.

This is what I currently have:

test=[] 
for row in sup_essential.iterrows(): 
    for row in stores_essential.iterrows(): 
     r = sup_essential['supplier'],stores_essential['storeKey'] 
     test.append(r)

But this just gives me repeats of all the values.
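The loop above appears to repeat values because it appends whole columns (`sup_essential['supplier']`) instead of the current row's value, and both loops reuse the name `row`. A minimal sketch of a corrected nested loop follows, assuming the column is named `storekey` as in the sample DataFrame; the names `pairs`, `result`, `sup_row` and `store_row` are illustrative only.

```python
import pandas as pd

sup_essential = pd.DataFrame({
    'supplier': ['A', 'B', 'C'],
    'coords': [(51.1235, -0.3453), (52.1245, -0.3423), (53.1235, -1.4553)],
})
stores_essential = pd.DataFrame({
    'storekey': [1, 2, 3],
    'coords': [(54.1235, -0.6553), (49.1245, -1.3423), (50.1235, -1.8553)],
})

pairs = []
# itertuples yields one namedtuple per row; distinct loop variables keep
# the current supplier row and the current store row separate.
for sup_row in sup_essential.itertuples(index=False):
    for store_row in stores_essential.itertuples(index=False):
        pairs.append((store_row.storekey, store_row.coords,
                      sup_row.supplier, sup_row.coords))

result = pd.DataFrame(
    pairs,
    columns=['store', 'store_coordinates', 'supplier', 'supplier_coordinates'],
)
print(result)
```

This only builds the store/supplier combinations; the distance column still has to be computed for each pair. For anything beyond a few rows, a vectorised cross join is likely to be much faster than a Python loop.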

Source

2017-04-16 PaddyD15

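Please provide a small (3-7 rows) reproducible data set in text/CSV format, together with the desired data set. Please read [how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – MaxU

@MaxU The data itself is confidential, and since it consists of coordinates it would be very easy to identify. However, the headers are: for stores: storeKey (INT) \t locationLongitude \t locationLatitude \t coords (latitude, longitude); for suppliers: supplier (VARCHAR) \t latitude \t longitude \t coords (latitude, longitude) – PaddyD15

You don't need to post real data. Just [post](http://stackoverflow.com/posts/43435657/edit) a sample (fake) data set in your question – MaxU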

Source DFs:

In [105]: sup
Out[105]:
               coords supplier
0  (51.1235, -0.3453)        A
1  (52.1245, -0.3423)        B
2  (53.1235, -1.4553)        C

In [106]: stores
Out[106]:
               coords  storekey
0  (54.1235, -0.6553)         1
1  (49.1245, -1.3423)         2
2  (50.1235, -1.8553)         3

Solution:

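import numpy as np
import pandas as pd
# In scikit-learn versions before 1.0 this class is imported from sklearn.neighbors
from sklearn.metrics import DistanceMetric

dist = DistanceMetric.get_metric('haversine')

# Cross join: the constant key x pairs every supplier with every store
m = pd.merge(sup.assign(x=0), stores.assign(x=0), on='x', suffixes=('1', '2')).drop(columns='x')

# Split the (lat, lon) tuples into separate numeric columns
d1 = sup[['coords']].assign(lat=sup.coords.str[0], lon=sup.coords.str[1]).drop(columns='coords')
d2 = stores[['coords']].assign(lat=stores.coords.str[0], lon=stores.coords.str[1]).drop(columns='coords')

# Haversine expects [lat, lon] in radians; 6367 is the Earth radius in km
m['dist_km'] = np.ravel(dist.pairwise(np.radians(d1), np.radians(d2)) * 6367)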
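As an alternative, here is a minimal sketch that avoids the scikit-learn dependency, assuming pandas 1.2 or later (for `how='cross'`). The helper `haversine_km` is a hypothetical name introduced here for illustration, not part of any library; it implements the standard haversine formula with the same 6367 km radius used above.

```python
import numpy as np
import pandas as pd

sup = pd.DataFrame({
    'supplier': ['A', 'B', 'C'],
    'coords': [(51.1235, -0.3453), (52.1245, -0.3423), (53.1235, -1.4553)],
})
stores = pd.DataFrame({
    'storekey': [1, 2, 3],
    'coords': [(54.1235, -0.6553), (49.1245, -1.3423), (50.1235, -1.8553)],
})

# how='cross' pairs every supplier row with every store row;
# the shared column name 'coords' receives the suffixes 1 and 2.
m = sup.merge(stores, how='cross', suffixes=('1', '2'))


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6367.0):
    """Great-circle distance in km between points given in degrees (illustrative helper)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))


# Unpack the (lat, lon) tuples into plain float arrays, one pair per row of m
lat1, lon1 = np.array(m['coords1'].tolist()).T
lat2, lon2 = np.array(m['coords2'].tolist()).T
m['dist_km'] = haversine_km(lat1, lon1, lat2, lon2)
print(m)
```

Because the distance is computed row by row on the merged frame, this version does not rely on the flattened pairwise matrix lining up with the merge order.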

Result:

In [135]: m
Out[135]:
              coords1 supplier             coords2  storekey     dist_km
0  (51.1235, -0.3453)        A  (54.1235, -0.6553)         1  334.029670
1  (51.1235, -0.3453)        A  (49.1245, -1.3423)         2  233.213416
2  (51.1235, -0.3453)        A  (50.1235, -1.8553)         3  153.880680
3  (52.1245, -0.3423)        B  (54.1235, -0.6553)         1  223.116901
4  (52.1245, -0.3423)        B  (49.1245, -1.3423)         2  340.738587
5  (52.1245, -0.3423)        B  (50.1235, -1.8553)         3  246.116984
6  (53.1235, -1.4553)        C  (54.1235, -0.6553)         1  122.997130
7  (53.1235, -1.4553)        C  (49.1245, -1.3423)         2  444.459052
8  (53.1235, -1.4553)        C  (50.1235, -1.8553)         3  334.514028
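The long table above is the supplier-by-store distance matrix flattened row by row, which is why `np.ravel` lines up with the cross-join order. A small sketch, using the distances printed above and an illustrative variable name `matrix`, shows how the result could be reshaped back into that matrix:

```python
import numpy as np
import pandas as pd

# Values copied from the result table above
m = pd.DataFrame({
    'supplier': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'storekey': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'dist_km': [334.029670, 233.213416, 153.880680,
                223.116901, 340.738587, 246.116984,
                122.997130, 444.459052, 334.514028],
})

# One row per supplier, one column per store
matrix = m.pivot(index='supplier', columns='storekey', values='dist_km')
print(matrix)

# Flattening the matrix row by row reproduces the dist_km column order
same_order = np.array_equal(matrix.to_numpy().ravel(), m['dist_km'].to_numpy())
print(same_order)
```

The wide layout can be handy for quickly looking up a single supplier/store pair, while the long layout is easier to filter and sort.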

Source

2017-04-16 10:17:22 MaxU
