2017-04-16 136 views

Iterating through multiple pandas DataFrames

I have two DataFrames:

1) One containing a list of suppliers and their latitude/longitude coordinates

sup_essential = pd.DataFrame({'supplier': ['A','B','C'], 
           'coords': [(51.1235,-0.3453),(52.1245,-0.3423),(53.1235,-1.4553)]}) 

2) One containing a list of stores and their latitude/longitude coordinates

stores_essential = pd.DataFrame({'storekey': [1,2,3], 
           'coords': [(54.1235,-0.6553),(49.1245,-1.3423),(50.1235,-1.8553)]}) 

I want to create an output table containing store, store_coordinates, supplier, supplier_coordinates and the distance, for every combination of store and supplier.

What I currently have:

test = []
for row in sup_essential.iterrows():
    for row in stores_essential.iterrows():
        r = sup_essential['supplier'], stores_essential['storekey']
        test.append(r)

But this just gives me all the values repeated over and over.

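Please provide a small (3-7 rows) reproducible data set in text/CSV format, together with the desired output data set. Please read [how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – MaxU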

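@MaxU The data itself is confidential, and since it consists of coordinates it would be easy to identify. However, the headers are: For stores: storeKey (INT) \t locationLongitude \t locationLatitude \t coords (latitude, longitude). For suppliers: supplier (VARCHAR) \t latitude \t longitude \t coords (latitude, longitude) – PaddyD15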

You don't need to provide the real data. Just [post](http://stackoverflow.com/posts/43435657/edit) a sample (fake) data set in your question – MaxU

Answer

Source DataFrames:

In [105]: sup
Out[105]:
               coords supplier
0  (51.1235, -0.3453)        A
1  (52.1245, -0.3423)        B
2  (53.1235, -1.4553)        C

In [106]: stores
Out[106]:
               coords  storekey
0  (54.1235, -0.6553)         1
1  (49.1245, -1.3423)         2
2  (50.1235, -1.8553)         3

Solution:

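import numpy as np
import pandas as pd
from sklearn.neighbors import DistanceMetric

dist = DistanceMetric.get_metric('haversine')

# cross join: pair every supplier with every store via a constant helper key
m = pd.merge(sup.assign(x=0), stores.assign(x=0), on='x', suffixes=['1','2']).drop('x', axis=1)

# split the (lat, lon) tuples into separate numeric columns
d1 = sup[['coords']].assign(lat=sup.coords.str[0], lon=sup.coords.str[1]).drop('coords', axis=1)
d2 = stores[['coords']].assign(lat=stores.coords.str[0], lon=stores.coords.str[1]).drop('coords', axis=1)

# haversine expects radians and returns unit-sphere distances; scale by an Earth radius of 6367 km
m['dist_km'] = np.ravel(dist.pairwise(np.radians(d1), np.radians(d2)) * 6367)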
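
For readers who would rather not depend on scikit-learn, the same cross join can be sketched with pandas and NumPy alone. This is only a sketch under assumptions: it assumes pandas 1.2 or later (for `how='cross'`), the helper name `haversine_km` and the `_sup`/`_store` suffixes are made up for illustration, and it reuses the Earth radius of 6367 km from the solution so the numbers stay comparable.

```python
import numpy as np
import pandas as pd

sup = pd.DataFrame({'supplier': ['A', 'B', 'C'],
                    'coords': [(51.1235, -0.3453), (52.1245, -0.3423), (53.1235, -1.4553)]})
stores = pd.DataFrame({'storekey': [1, 2, 3],
                       'coords': [(54.1235, -0.6553), (49.1245, -1.3423), (50.1235, -1.8553)]})


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6367.0):
    """Great-circle distance in km between points given in degrees.

    Hypothetical helper, not part of pandas or NumPy.
    """
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))


# Cross join: every supplier row paired with every store row (assumes pandas >= 1.2).
pairs = sup.merge(stores, how='cross', suffixes=('_sup', '_store'))

# Split the (lat, lon) tuples into plain float arrays, one pair of arrays per side.
sup_lat, sup_lon = np.array(pairs['coords_sup'].tolist()).T
store_lat, store_lon = np.array(pairs['coords_store'].tolist()).T

# Vectorised distance for all supplier/store pairs at once.
pairs['dist_km'] = haversine_km(sup_lat, sup_lon, store_lat, store_lon)
```

With the sample data, `pairs` should hold nine rows, one per supplier/store combination, in the same supplier-major order as the merge in the solution.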

Result:

In [135]: m
Out[135]:
              coords1 supplier             coords2  storekey     dist_km
0  (51.1235, -0.3453)        A  (54.1235, -0.6553)         1  334.029670
1  (51.1235, -0.3453)        A  (49.1245, -1.3423)         2  233.213416
2  (51.1235, -0.3453)        A  (50.1235, -1.8553)         3  153.880680
3  (52.1245, -0.3423)        B  (54.1235, -0.6553)         1  223.116901
4  (52.1245, -0.3423)        B  (49.1245, -1.3423)         2  340.738587
5  (52.1245, -0.3423)        B  (50.1235, -1.8553)         3  246.116984
6  (53.1235, -1.4553)        C  (54.1235, -0.6553)         1  122.997130
7  (53.1235, -1.4553)        C  (49.1245, -1.3423)         2  444.459052
8  (53.1235, -1.4553)        C  (50.1235, -1.8553)         3  334.514028
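
As for why the nested loop in the question only produces repeated values: on every pass it appends the whole `supplier` and store-key columns rather than the values of the rows currently being visited, and both loops reuse the name `row`. If an explicit loop is preferred, a row-by-row version might look like the sketch below. It assumes the sample frames from the question, uses `itertuples` plus the standard library's `math` module, and `haversine_km` is a hypothetical helper with the same 6367 km radius as the solution. For large tables the merge-based solution is likely to be much faster.

```python
import math

import pandas as pd

sup_essential = pd.DataFrame({'supplier': ['A', 'B', 'C'],
                              'coords': [(51.1235, -0.3453), (52.1245, -0.3423), (53.1235, -1.4553)]})
stores_essential = pd.DataFrame({'storekey': [1, 2, 3],
                                 'coords': [(54.1235, -0.6553), (49.1245, -1.3423), (50.1235, -1.8553)]})


def haversine_km(p, q, radius_km=6367.0):
    """Great-circle distance in km between two (lat, lon) tuples in degrees (hypothetical helper)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


rows = []
# Use distinct loop variables and read values from the current rows, not from the whole columns.
for s in sup_essential.itertuples(index=False):
    for t in stores_essential.itertuples(index=False):
        rows.append({'storekey': t.storekey,
                     'store_coords': t.coords,
                     'supplier': s.supplier,
                     'supplier_coords': s.coords,
                     'dist_km': haversine_km(s.coords, t.coords)})

result = pd.DataFrame(rows)
```

Here `result` has one row per supplier/store pair, with the columns in the order the question asked for.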