2011-12-14 119 views
1

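Find nearest neighbors between 2 sets of dated points

I have 2 sets of points, set1 and set2. Both sets of points have data associated with each point. Points in set1 are "ephemeral" and only exist on a given date. Points in set2 are "permanent": they are constructed on a given date and then exist forever after that date.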

set.seed(1) 
dates <- seq(as.Date('2011-01-01'),as.Date('2011-12-31'),by='days') 

set1 <- data.frame(lat=40+runif(10000), 
lon=-70+runif(10000),date=sample(dates,10000,replace=TRUE)) 

set2 <- data.frame(lat=40+runif(100), 
lon=-70+runif(100),date=sample(dates,100,replace=TRUE)) 

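Here is my problem: for each point in set1 (ephemeral), find the distance to the closest point in set2 (permanent) that was constructed BEFORE the set1 event occurred. For example, the first point in set1 occurred on 2011-03-18: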

> set1[1,]
       lat       lon       date
1 40.26551 -69.93529 2011-03-18

So I want to find the closest point in set2 that was constructed on or before 2011-03-18:

> head(set2[set2$date<=as.Date('2011-03-18'),])
        lat       lon       date
1  40.41531 -69.25765 2011-02-18
7  40.24690 -69.29812 2011-02-19
13 40.10250 -69.52515 2011-02-12
14 40.53675 -69.28134 2011-02-27
17 40.66236 -69.07396 2011-02-17
20 40.67351 -69.88217 2011-01-04

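An additional wrinkle is that these are latitude/longitude points, so I have to calculate distances along the surface of the earth. The R package fields provides a convenient function, rdist.earth, to do this: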

require(fields) 
distMatrix <- rdist.earth(set1[,c('lon','lat')], 
set2[,c('lon','lat')], miles = TRUE) 

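My question is: how can I set a distance in this matrix to Inf if the point in set2 (a column of the distance matrix) was constructed after the point in set1 (a row of the distance matrix) occurred?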

Answers

3

Here is what I would do:

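# TRUE where the set2 point was built on or after the set1 event; "<=" also
# masks same-day pairs, so use "<" if a same-day set2 point should count
earlierMatrix <- outer(set1$date, set2$date, "<=") 
distMatrix2 <- distMatrix + ifelse(earlierMatrix, Inf, 0) 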
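To see which cells such a mask touches, here is a minimal sketch on made-up data. The names `event_dates`, `built_dates`, and `toy_dist` are hypothetical and the distances are invented for illustration; only base R is used. It also shows how the choice between `<` and `<=` changes the treatment of a permanent point built on the same day as the event.

```r
# Toy data: 3 ephemeral events (rows) and 2 permanent points (columns)
event_dates <- as.Date(c("2011-03-01", "2011-03-10", "2011-03-20"))
built_dates <- as.Date(c("2011-03-10", "2011-03-15"))

# Invented distances, filled column by column (rows = events)
toy_dist <- matrix(c(1, 2, 3,
                     4, 5, 6), nrow = 3)

# TRUE where the permanent point was built strictly after the event
built_after <- outer(event_dates, built_dates, "<")
# TRUE where it was built on or after the event (same day included)
built_on_or_after <- outer(event_dates, built_dates, "<=")

# Mask the unavailable pairs with Inf
masked_strict <- toy_dist
masked_strict[built_after] <- Inf

masked_same_day <- toy_dist
masked_same_day[built_on_or_after] <- Inf

masked_strict
masked_same_day
```

The second event falls on the same day the first permanent point was built, so that cell keeps its distance under `<` and becomes Inf under `<=`.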

Very elegant. Thanks! One small error: you reversed the `ifelse` statement. If set1$date ... (Zach, 2011-12-14 18:29:08)

0

Here is my answer. It is not particularly efficient, but I think it is correct. It also lets you easily swap in a different distance calculator:

#Calculate distances 
require(fields) 
distMatrix <- lapply(1:nrow(set1),function(x) { 

    #Find distances to all points 
    distances <- rdist.earth(set1[x,c('lon','lat')], set2[,c('lon','lat')], miles = TRUE) 

    #Set distance to Inf if the set1 point occurred BEFORE the set2 date 
    distances <- ifelse(set1[x,'date']<set2[,'date'], Inf, distances) 

    return(distances) 
}) 
distMatrix <- do.call(rbind,distMatrix) 

#Find distance to closest object 
set1$dist <- apply(distMatrix,1,min) 

#Find id of closest object 
objectID <- lapply(1:nrow(set1),function(x) { 
    if (set1[x,'dist']<Inf) { 
        IDs <- which(set1[x,'dist']==distMatrix[x,]) 
    } else { 
        IDs <- NA 
    } 
    #Randomly break ties (if there are any); index explicitly, because 
    #sample(IDs,1) would draw from 1:IDs when IDs is a single number 
    return(IDs[sample.int(length(IDs),1)]) 
}) 
set1$objectID <- do.call(rbind,objectID) 

Here is the head of the resulting dataset:

> head(set1)
       lat       lon       date      dist objectID
1 40.26551 -69.93529 2011-03-18  3.215514       13
2 40.37212 -69.32339 2011-02-11 10.320910       46
3 40.57285 -69.26463 2011-02-23  3.954132        4
4 40.90821 -69.88870 2011-04-24  4.132536       49
5 40.20168 -69.95335 2011-02-24  4.284692       45
6 40.89839 -69.86909 2011-07-12  3.385769       57
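The point about swapping in a different distance calculator could look roughly like the sketch below. It is an illustration under stated assumptions, not the code that produced the output above: `haversine_miles` and `nearest_built` are hypothetical helpers, the haversine formula with an approximate Earth radius of 3958.8 miles stands in for `rdist.earth`, and ties go to the first match rather than being broken at random.

```r
# Hypothetical helper: great-circle distance in miles from one point to many,
# using the haversine formula (3958.8 miles is an approximate Earth radius)
haversine_miles <- function(lon1, lat1, lon2, lat2, radius = 3958.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical wrapper: for each row of `events`, the distance to and the row
# index of the nearest row of `sites` built on or before the event date
nearest_built <- function(events, sites, dist_fun = haversine_miles) {
  res <- lapply(seq_len(nrow(events)), function(i) {
    d <- dist_fun(events$lon[i], events$lat[i], sites$lon, sites$lat)
    d[sites$date > events$date[i]] <- Inf  # not built yet on the event date
    if (all(is.infinite(d))) {
      data.frame(dist = Inf, objectID = NA_integer_)
    } else {
      data.frame(dist = min(d), objectID = which.min(d))  # first match on ties
    }
  })
  do.call(rbind, res)
}

# Tiny made-up example: the first event predates both sites
events <- data.frame(lat = c(40.0, 40.0), lon = c(-70.0, -70.0),
                     date = as.Date(c("2011-01-01", "2011-06-01")))
sites  <- data.frame(lat = c(40.1, 40.5), lon = c(-70.0, -70.0),
                     date = as.Date(c("2011-03-01", "2011-02-01")))
result <- nearest_built(events, sites)
result
```

Any function with the same four leading arguments can be passed as `dist_fun`, which keeps the date filtering separate from the choice of distance measure.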