2016-11-06 52 views
4

Pandas: latitude/longitude to distance between successive rows

I have the following pandas DataFrame in Python 2.7:

Ser_Numb  LAT  LONG 
     1 74.166061 30.512811 
     2 72.249672 33.427724 
     3 67.499828 37.937264 
     4 84.253715 69.328767 
     5 72.104828 33.823462 
     6 63.989462 51.918173 
     7 80.209112 33.530778 
     8 68.954132 35.981256 
     9 83.378214 40.619652 
     10 68.778571 6.607066 

I want to calculate the distance between successive rows of the DataFrame. The output should look like this:

Ser_Numb   LAT  LONG Distance 
     1 74.166061 30.512811   0 
     2 72.249672 33.427724   d_between_Ser_Numb2 and Ser_Numb1 
     3 67.499828 37.937264   d_between_Ser_Numb3 and Ser_Numb2 
     4 84.253715 69.328767   d_between_Ser_Numb4 and Ser_Numb3 
     5 72.104828 33.823462   d_between_Ser_Numb5 and Ser_Numb4 
     6 63.989462 51.918173   d_between_Ser_Numb6 and Ser_Numb5 
     7 80.209112 33.530778 . 
     8 68.954132 35.981256 . 
     9 83.378214 40.619652 . 
     10 68.778571 6.607066 . 

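Attempt

This post looks somewhat similar, but it calculates the distance from a fixed point. I need the distance between consecutive points.

I tried to adapt it as follows: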

df['LAT_rad'], df['LON_rad'] = np.radians(df['LAT']), np.radians(df['LONG']) 
df['dLON'] = df['LON_rad'] - np.radians(df['LON_rad'].shift(1)) 
df['dLAT'] = df['LAT_rad'] - np.radians(df['LAT_rad'].shift(1)) 
df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin(df['dLAT']/2)**2 + math.cos(df['LAT_rad'].astype(float).shift(-1)) * np.cos(df['LAT_rad']) * np.sin(df['dLON']/2)**2)) 

However, I get the following error:

Traceback (most recent call last): 
    File "C:\Python27\test.py", line 115, in <module> 
    df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(np.sin(df['dLAT']/2)**2 + math.cos(df['LAT_rad'].astype(float).shift(-1)) * np.cos(df['LAT_rad']) * np.sin(df['dLON']/2)**2)) 
    File "C:\Python27\lib\site-packages\pandas\core\series.py", line 78, in wrapper 
    "{0}".format(str(converter))) 
TypeError: cannot convert the series to <type 'float'> 
[Finished in 2.3s with exit code 1] 

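This error was fixed thanks to MaxU's comment. With that revision, however, the output of the calculation does not make sense: the distances are close to 8000 km.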

Ser_Numb  LAT  LONG LAT_rad LON_rad  dLON  dLAT  distance 
0   1 74.166061 30.512811 1.294442 0.532549  NaN  NaN   NaN 
1   2 72.249672 33.427724 1.260995 0.583424 0.574129 1.238402 8010.487211 
2   3 67.499828 37.937264 1.178094 0.662130 0.651947 1.156086 7415.364469 
3   4 84.253715 69.328767 1.470505 1.210015 1.198459 1.449943 9357.184623 
4   5 72.104828 33.823462 1.258467 0.590331 0.569212 1.232802 7992.087820 
5   6 63.989462 51.918173 1.116827 0.906143 0.895840 1.094862 7169.812123 
6   7 80.209112 33.530778 1.399913 0.585222 0.569407 1.380421 8851.558260 
7   8 68.954132 35.981256 1.203477 0.627991 0.617777 1.179044 7559.609520 
8   9 83.378214 40.619652 1.455224 0.708947 0.697986 1.434220 9194.371978 
9  10 68.778571 6.607066 1.200413 0.115315 0.102942 1.175014   NaN 

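For reference:

  • online calculator: if I use Latitude1 = 74.166061, Longitude1 = 30.512811, Latitude2 = 72.249672, Longitude2 = 33.427724, I get 233 km
  • the haversine function found here: print haversine(30.512811, 74.166061, 33.427724, 72.249672) gives 232.55 km

The answer should be 233 km, but my approach gives ~8000 km. I think something is wrong with how I am trying to step between consecutive rows.

Question: Is there a way to do this in pandas? Or do I need to loop through the DataFrame one row at a time?

Additional information:

To create the DataFrame above, select it and copy it to the clipboard. Then: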

import pandas as pd 
df = pd.read_clipboard() 
print df 
+2

Try replacing 'math.cos' -> 'np.cos' – MaxU
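Replacing 'math.cos' with 'np.cos' removes the TypeError, but the ~8000 km values in the question seem to come from two other details of the attempt: 'np.radians' is applied a second time to the shifted columns, which are already in radians, and the cosine term uses 'shift(-1)' (the next row) while the differences use 'shift(1)' (the previous row). The following is a minimal sketch of the attempt with those two points changed. It uses only the first three rows of the question's data and keeps the question's Earth radius of 6367 km; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# First three rows of the question's data.
df = pd.DataFrame({
    'Ser_Numb': [1, 2, 3],
    'LAT': [74.166061, 72.249672, 67.499828],
    'LONG': [30.512811, 33.427724, 37.937264],
})

# Convert degrees to radians exactly once.
lat = np.radians(df['LAT'])
lon = np.radians(df['LONG'])

# Differences to the previous row; the shifted values are already radians.
dlat = lat - lat.shift(1)
dlon = lon - lon.shift(1)

# Haversine formula, pairing each row with the previous one throughout.
a = np.sin(dlat / 2) ** 2 + np.cos(lat.shift(1)) * np.cos(lat) * np.sin(dlon / 2) ** 2
df['distance'] = 6367 * 2 * np.arcsin(np.sqrt(a))
```

With this change the second row should come out near the 232.55 km reported by the reference haversine function, and the first row stays NaN because it has no predecessor.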

Answers

15

You can use this great solution (c) @ballsatballsdotballs (don't forget to upvote it ;-) or this slightly optimized version:

import numpy as np

def haversine_np(lon1, lat1, lon2, lat2): 
    """ 
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees) 

    All args must be of equal length.  

    """ 
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2]) 

    dlon = lon2 - lon1 
    dlat = lat2 - lat1 

    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2 

    c = 2 * np.arcsin(np.sqrt(a)) 
    km = 6367 * c 
    return km 

df['dist'] = \ 
    haversine_np(df.LONG.shift(), df.LAT.shift(), 
       df.loc[1:, 'LONG'], df.loc[1:, 'LAT']) 

Result:

In [566]: df 
Out[566]: 
    Ser_Numb  LAT  LONG   dist 
0   1 74.166061 30.512811   NaN 
1   2 72.249672 33.427724 232.549785 
2   3 67.499828 37.937264 554.905446 
3   4 84.253715 69.328767 1981.896491 
4   5 72.104828 33.823462 1513.397997 
5   6 63.989462 51.918173 1164.481327 
6   7 80.209112 33.530778 1887.256899 
7   8 68.954132 35.981256 1252.531365 
8   9 83.378214 40.619652 1606.340727 
9  10 68.778571 6.607066 1793.921854 

UPDATE: this should help to understand the logic:

In [573]: pd.concat([df['LAT'].shift(), df.loc[1:, 'LAT']], axis=1, ignore_index=True) 
Out[573]: 
      0   1 
0  NaN  NaN 
1 74.166061 72.249672 
2 72.249672 67.499828 
3 67.499828 84.253715 
4 84.253715 72.104828 
5 72.104828 63.989462 
6 63.989462 80.209112 
7 80.209112 68.954132 
8 68.954132 83.378214 
9 83.378214 68.778571 
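The pairing above works because pandas aligns Series on their index before doing arithmetic. As a sketch that is not part of the original answer, the example below checks that passing the unsliced columns gives the same distances as the 'df.loc[1:, ...]' slices, and then uses 'fillna(0)' to get the 0 in the first row that the question asked for. The function body is a condensed restatement of 'haversine_np', and the names 'sliced' and 'full' are illustrative.

```python
import numpy as np
import pandas as pd

def haversine_np(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = [np.radians(x) for x in (lon1, lat1, lon2, lat2)]
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 6367 * 2 * np.arcsin(np.sqrt(a))

df = pd.DataFrame({
    'Ser_Numb': range(1, 11),
    'LAT': [74.166061, 72.249672, 67.499828, 84.253715, 72.104828,
            63.989462, 80.209112, 68.954132, 83.378214, 68.778571],
    'LONG': [30.512811, 33.427724, 37.937264, 69.328767, 33.823462,
             51.918173, 33.530778, 35.981256, 40.619652, 6.607066],
})

# Previous row's coordinates, aligned on the original index.
prev_lon, prev_lat = df['LONG'].shift(), df['LAT'].shift()

# Variant from the answer: current row taken from the slice starting at label 1.
sliced = haversine_np(prev_lon, prev_lat, df.loc[1:, 'LONG'], df.loc[1:, 'LAT'])

# Variant with the full columns: row 0 is NaN anyway because its predecessor is NaN.
full = haversine_np(prev_lon, prev_lat, df['LONG'], df['LAT'])

# Replace the leading NaN with 0 to match the layout requested in the question.
df['dist'] = full.fillna(0)
```

Both variants leave the first row without a distance, so the slice is a matter of preference rather than a requirement.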
+0

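I still get the error TypeError: cannot convert the series to <type 'float'> [Finished in 2.3s with exit code 1]. I also seem to be having trouble with the logic. (A) Why do you use '.shift()' without any argument? (B) Is there a reason to start from the second row with 'df.ix[1:, 'LONG']'? Why not use 'df.ix[:, 'LONG']' and correct for the offset with 'shift(#)'? –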

+1

@WR, 'shift()' == 'shift(1)' ('1' is the default value). Check the UPDATE: it shows the pairs of arguments that will be passed to the function... – MaxU
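The default mentioned in this comment can be checked with a short sketch; the Series 's' is an illustrative stand-in for the 'LAT' column.

```python
import pandas as pd

# First three latitudes from the question.
s = pd.Series([74.166061, 72.249672, 67.499828])

# shift() with no argument moves values down by one row, the same as shift(1).
print(s.shift().equals(s.shift(1)))
```

In both cases the first element becomes NaN and every other element holds the previous row's value.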

+0

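Thank you. OK, the code works and I no longer get the 'TypeError'. Also, thanks for the **UPDATE** in your answer. That helped. The problem I had was understanding how the shifted values are combined with the original values. Thanks for the explanation. –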
