2017-04-06

Using dplyr functions in R to find the minimum distance

I have a data frame with two numeric variables, lat and long, like this:

> head(pontos_sub) 
    id  lat  long 
1 0 -22,91223 -43,18810 
2 1 -22,91219 -43,18804 
3 2 -22,91225 -43,18816 
4 3 -22,89973 -43,20855 
5 4 -22,89970 -43,20860 
6 5 -22,89980 -43,20860 

Next, I round both coordinates to three decimal places:

pontos_sub$long_r <- round(pontos_sub$long, 3) 
pontos_sub$lat_r <- round(pontos_sub$lat, 3) 

> head(pontos_sub) 
    id  lat  long long_r lat_r 
1 0 -22,91223 -43,18810 -43,188 -22,912 
2 1 -22,91219 -43,18804 -43,188 -22,912 
3 2 -22,91225 -43,18816 -43,188 -22,912 
4 3 -22,89973 -43,20855 -43,209 -22,900 
5 4 -22,89970 -43,20860 -43,209 -22,900 
6 5 -22,89980 -43,20860 -43,209 -22,900 

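Now I want to use dplyr to find, for each unique (long_r, lat_r) group, the lat/long point in that group that has the minimum distance, as measured by the distVincentyEllipsoid function.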

> newdata <- pontos_sub %>% 
       group_by(long_r,lat_r) %>% 
       summarise(min_long = special_fun(arg), 
         min_lat = special_fun(arg)) 

The result should look like this:

> head(newdata) 
    long_r lat_r min_long min_lat 
1 -43,188 -22,912 xxxxxx xxxxxxx 
4 -43,209 -22,900 xxxxxx xxxxxxx 

Finally, is this a fast way to do it, or is there a faster approach?

Answers

You could do it like this:

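library(dplyr) 
library(geosphere) # provides distVincentyEllipsoid 

pontos_sub %>% 
    # distance in meters from each original point to its rounded coordinates 
    mutate(dist = distVincentyEllipsoid(cbind(long, lat), cbind(long_r, lat_r))) %>% 
    group_by(long_r, lat_r) %>% 
    # sort by distance, then keep the closest point in each group 
    arrange(dist) %>% 
    slice(1) %>% 
    rename(min_long = long, min_lat = lat) %>% 
    select(long_r, lat_r, min_long, min_lat) 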

# Source: local data frame [2 x 4] 
# Groups: long_r, lat_r [2] 
# 
# long_r lat_r min_long min_lat 
#  <dbl> <dbl>  <dbl>  <dbl> 
# 1 -43.209 -22.900 -43.20860 -22.89980 
# 2 -43.188 -22.912 -43.18804 -22.91219 
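
The same selection can probably be written more compactly with slice_min. This is only a sketch and not part of the original answer: it assumes dplyr 1.0.0 or later (where slice_min is available), builds the sample data with data.frame instead of read.table, and the result name nearest is made up for illustration.

```r
library(dplyr)     # assumption: dplyr >= 1.0.0, needed for slice_min()
library(geosphere) # provides distVincentyEllipsoid()

# Same six sample points as in the question, written with "." decimals
pontos_sub <- data.frame(
  id   = 0:5,
  lat  = c(-22.91223, -22.91219, -22.91225, -22.89973, -22.89970, -22.89980),
  long = c(-43.18810, -43.18804, -43.18816, -43.20855, -43.20860, -43.20860)
)
pontos_sub$long_r <- round(pontos_sub$long, 3)
pontos_sub$lat_r  <- round(pontos_sub$lat, 3)

# "nearest" is a hypothetical name for the result
nearest <- pontos_sub %>%
  # one vectorized call: distance in meters from each point to its rounded position
  mutate(dist = distVincentyEllipsoid(cbind(long, lat), cbind(long_r, lat_r))) %>%
  group_by(long_r, lat_r) %>%
  # keep exactly one row per group, the one with the smallest distance
  slice_min(dist, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(long_r, lat_r, min_long = long, min_lat = lat)

print(nearest)
```

Setting with_ties = FALSE guarantees a single row per group even when two points are equally close, which mirrors what slice(1) does after arrange(dist).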

Data:

pontos_sub <- read.table(text=" 
    id  lat  long 
1 0 -22,91223 -43,18810 
2 1 -22,91219 -43,18804 
3 2 -22,91225 -43,18816 
4 3 -22,89973 -43,20855 
5 4 -22,89970 -43,20860 
6 5 -22,89980 -43,20860     
       ", dec = ",") 

pontos_sub$long_r <- round(pontos_sub$long, 3) 
pontos_sub$lat_r <- round(pontos_sub$lat, 3)