2017-09-17
0

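Pairwise distance calculation with nested data frames

I am looking for a way to compute the separation distances between points in a pairwise fashion and store the results for each individual point in an accompanying nested data frame.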

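For example, I have this data frame (from the maps package) that contains information about US cities, including their physical locations. I have discarded the rest of the information and nested the coordinates in a nested data frame. I intend to use distHaversine() from the geosphere package to calculate these distances.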

library(tidyverse) 

# tidyr >= 1.0 syntax: keep only the name and coordinates, then nest the coordinates 
df <- maps::us.cities %>% 
    slice(1:20) %>% 
    select(name, long, lat) %>% 
    nest(coords = c(long, lat)) 

        name   coords 
        <chr>   <list> 
1   Abilene TX <tibble [1 x 2]> 
2    Akron OH <tibble [1 x 2]> 
3   Alameda CA <tibble [1 x 2]> 
4   Albany GA <tibble [1 x 2]> 
5   Albany NY <tibble [1 x 2]> 
...(With 15 more rows) 

I have looked into using the map family of functions coupled with mutate, but I am struggling. The desired result has the following form:

    name   coords   sep_dist 
        <chr>   <list>   <list> 
1   Abilene TX <tibble [1 x 2]> <tibble [19 x 2]> 
2    Akron OH <tibble [1 x 2]> <tibble [19 x 2]> 
3   Alameda CA <tibble [1 x 2]> <tibble [19 x 2]> 
4   Albany GA <tibble [1 x 2]> <tibble [19 x 2]> 
5   Albany NY <tibble [1 x 2]> <tibble [19 x 2]> 
...(With 15 more rows) 

With the sep_dist tibbles looking like this:

   location distance 
        <chr>  <dbl> 
1    Akron OH  1003 
2   Alameda CA  428 
3   Albany GA  3218 
4   Albany NY  3627 
5   Albany OR  97 
...(With 14 more rows)      -distances completely made up 

Here location is the point being compared to the named city (in this case Abilene).

Answers

1

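We can expand a "grid" with all combinations of location names and coordinates, then remove the combinations in which both location names are the same. After that, use map2_dbl to apply the distHaversine function to each pair.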

library(tidyverse) 
library(geosphere) 

df2 <- df %>% 
    # Create the grid 
    mutate(name1 = name) %>% 
    select(starts_with("name")) %>% 
    complete(name, name1) %>% 
    filter(name != name1) %>% 
    left_join(df, by = "name") %>% 
    left_join(df, by = c("name1" = "name")) %>% 
    # Grid completed. Calculate the distance with distHaversine 
    mutate(distance = map2_dbl(coords.x, coords.y, distHaversine)) 

df2 
# A tibble: 380 x 5 
     name   name1   coords.x   coords.y distance 
     <chr>   <chr>   <list>   <list>  <dbl> 
1 Abilene TX  Akron OH <tibble [1 x 2]> <tibble [1 x 2]> 1881904.4 
2 Abilene TX  Alameda CA <tibble [1 x 2]> <tibble [1 x 2]> 2128576.9 
3 Abilene TX  Albany GA <tibble [1 x 2]> <tibble [1 x 2]> 1470577.2 
4 Abilene TX  Albany NY <tibble [1 x 2]> <tibble [1 x 2]> 2542025.1 
5 Abilene TX  Albany OR <tibble [1 x 2]> <tibble [1 x 2]> 2429367.3 
6 Abilene TX Albuquerque NM <tibble [1 x 2]> <tibble [1 x 2]> 702287.5 
7 Abilene TX Alexandria LA <tibble [1 x 2]> <tibble [1 x 2]> 700093.2 
8 Abilene TX Alexandria VA <tibble [1 x 2]> <tibble [1 x 2]> 2161594.6 
9 Abilene TX Alhambra CA <tibble [1 x 2]> <tibble [1 x 2]> 1718967.5 
10 Abilene TX Aliso Viejo CA <tibble [1 x 2]> <tibble [1 x 2]> 1681868.8 
# ... with 370 more rows 
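
The distance column above is in meters, because distHaversine uses a default sphere radius of 6378137 m and expects each point as longitude first, then latitude. The following is a minimal sketch of checking the units and converting to kilometers; the two points are illustrative values chosen for easy arithmetic, not taken from the city data.

```r
library(geosphere)

# Illustrative points (not from the source data): c(longitude, latitude).
p1 <- c(0, 0)
p2 <- c(0, 1)   # one degree of latitude north of p1

# distHaversine returns meters for the default radius r = 6378137.
d_m <- distHaversine(p1, p2)

# Convert to kilometers if that unit is preferred.
d_km <- d_m / 1000
```

The same division could be applied to the distance column of df2 inside a mutate call.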

To create the final output, we can group_by name and nest all the other required columns.

df3 <- df2 %>% 
    select(-starts_with("coord")) %>% 
    group_by(name) %>% 
    nest() 

df3 
# A tibble: 20 x 2 
        name    data 
        <chr>   <list> 
1   Abilene TX <tibble [19 x 2]> 
2    Akron OH <tibble [19 x 2]> 
3   Alameda CA <tibble [19 x 2]> 
4   Albany GA <tibble [19 x 2]> 
5   Albany NY <tibble [19 x 2]> 
6   Albany OR <tibble [19 x 2]> 
7  Albuquerque NM <tibble [19 x 2]> 
8  Alexandria LA <tibble [19 x 2]> 
9  Alexandria VA <tibble [19 x 2]> 
10   Alhambra CA <tibble [19 x 2]> 
11  Aliso Viejo CA <tibble [19 x 2]> 
12    Allen TX <tibble [19 x 2]> 
13   Allentown PA <tibble [19 x 2]> 
14    Aloha OR <tibble [19 x 2]> 
15   Altadena CA <tibble [19 x 2]> 
16 Altamonte Springs FL <tibble [19 x 2]> 
17   Altoona PA <tibble [19 x 2]> 
18   Amarillo TX <tibble [19 x 2]> 
19    Ames IA <tibble [19 x 2]> 
20   Anaheim CA <tibble [19 x 2]> 

Each data frame in data now looks like this.

df3$data[[1]] 
# A tibble: 19 x 2 
        name1 distance 
        <chr>  <dbl> 
1    Akron OH 1881904.4 
2   Alameda CA 2128576.9 
3   Albany GA 1470577.2 
4   Albany NY 2542025.1 
5   Albany OR 2429367.3 
6  Albuquerque NM 702287.5 
7  Alexandria LA 700093.2 
8  Alexandria VA 2161594.6 
9   Alhambra CA 1718967.5 
10  Aliso Viejo CA 1681868.8 
11    Allen TX 296560.4 
12   Allentown PA 2342363.5 
13    Aloha OR 2457938.8 
14   Altadena CA 1719207.6 
15 Altamonte Springs FL 1805480.9 
16   Altoona PA 2102993.0 
17   Amarillo TX 361520.0 
18    Ames IA 1194234.7 
19   Anaheim CA 1694698.9 
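
To get the exact shape asked for in the question (name, coords, sep_dist), the nested distances can be joined back onto the original nested data frame. The sketch below is one possible way to do this, assuming tidyr >= 1.0 (for expand_grid and the named nest syntax); the object names pairs, sep and result are hypothetical, and location is the column name used in the question.

```r
library(tidyverse)
library(geosphere)

# Same starting data as in the question (tidyr >= 1.0 nest() syntax assumed).
df <- maps::us.cities %>%
    slice(1:20) %>%
    select(name, long, lat) %>%
    nest(coords = c(long, lat))

# All ordered pairs of distinct cities, with both coordinate tibbles attached.
pairs <- expand_grid(name = df$name, location = df$name) %>%
    filter(name != location) %>%
    left_join(df, by = "name") %>%
    left_join(df, by = c("location" = "name"), suffix = c("", "_other")) %>%
    mutate(distance = map2_dbl(coords, coords_other, distHaversine))

# One nested tibble of (location, distance) per city.
sep <- pairs %>%
    select(name, location, distance) %>%
    nest(sep_dist = c(location, distance))

# Attach the nested distances to the original name/coords table.
result <- left_join(df, sep, by = "name")
```

Each element of result$sep_dist should then hold the 19 other cities and their distances in meters.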
1

geosphere provides distm, which computes a matrix of distances between two sets of points.

Reproducible data

set.seed(1) 
df <- data.frame(name=letters[1:4], 
       lon=runif(4)*10, 
       lat=runif(4)*10) 

distm

library(geosphere) 
ans <- as.data.frame(distm(df[,2:3], df[,2:3], fun=distHaversine)) 

     # a  b  c  d 
# 1  0.0 784506.1 894320.6 877440.5 
# 2 784506.1  0.0 226504.3 647666.7 
# 3 894320.6 226504.3  0.0 486290.8 
# 4 877440.5 647666.7 486290.8  0.0 

Tidy the all-to-all distances into the desired format
colnames(ans) <- df$name 
library(dplyr) 
library(tidyr) 
desired <- ans %>% 
      gather(pos1, distance) %>% 
      mutate(pos2 = rep(df$name, nrow(df))) %>% 
      filter(pos1!=pos2) %>% 
      select(pos1, pos2, distance) 

    # pos1 pos2 distance 
# 1  a b 784506.1 
# 2  a c 894320.6 
# 3  a d 877440.5 
# 4  b a 784506.1 
# 5  b c 226504.3 
# 6  b d 647666.7 
# 7  c a 894320.6 
# 8  c b 226504.3 
# 9  c d 486290.8 
# 10 d a 877440.5 
# 11 d b 647666.7 
# 12 d c 486290.8 
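
As a variation on the tidying step, the row and column names can be carried by the matrix itself, which avoids rebuilding pos2 with rep. This is a sketch under the assumption that tidyr >= 1.0 is available (pivot_longer supersedes gather); it is an alternative formulation, not the code from the answer above.

```r
library(geosphere)
library(dplyr)
library(tidyr)
library(tibble)

# Same reproducible data as above.
set.seed(1)
df <- data.frame(name = letters[1:4],
                 lon = runif(4) * 10,
                 lat = runif(4) * 10)

# With a single set of points, distm computes all-to-all distances.
d <- distm(df[, c("lon", "lat")], fun = distHaversine)
dimnames(d) <- list(as.character(df$name), as.character(df$name))

# Long format: one row per ordered pair, self-pairs removed.
desired <- as.data.frame(d) %>%
    rownames_to_column("pos1") %>%
    pivot_longer(-pos1, names_to = "pos2", values_to = "distance") %>%
    filter(pos1 != pos2)
```

The result has one row per ordered pair of distinct points, so four points give twelve rows.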
+0

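Thank you for providing a nice alternative to the approach I had envisioned. I accepted the first answer because it fits the stated result and method more closely, but I appreciate seeing another way of doing things. – Jamesm131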

+0

I agree that you should accept the answer that fits the post. Glad to help. – CPak