2016-11-06 81 views
3

我有两个xy坐标数据集。第一个有xy坐标加上一个带有我的因子水平的标签列。我叫data.frameqq,它看起来像这样:按R中的因子计算多边形内的点数

structure(list(x = c(5109, 5128, 5137, 5185, 5258, 5324, 5387, 
5343, 5331, 5347, 5300, 5180, 4109, 4082, 4091, 4139, 4212, 4279, 
4291, 4297, 4285, 4301, 4254, 4181), y = c(1692, 1881, 2070, 
2119, 2144, 2065, 1987, 1813, 1705, 1649, 1631, 1654, 1847, 2015, 
2204, 2253, 2278, 2282, 2166, 1947, 1839, 1783, 1765, 1783), 
    tag = c("MPN_right", "MPN_right", "MPN_right", "MPN_right", 
    "MPN_right", "MPN_right", "MPN_right", "MPN_right", "MPN_right", 
    "MPN_right", "MPN_right", "MPN_right", "MPN_left", "MPN_left", 
    "MPN_left", "MPN_left", "MPN_left", "MPN_left", "MPN_left", 
    "MPN_left", "MPN_left", "MPN_left", "MPN_left", "MPN_left" 
    )), .Names = c("x", "y", "tag"), row.names = c(NA, -24L), class = "data.frame") 

我生成的随机数据使用qqxy用大sd意味着另一个。

set.seed(123) 
my_points=data.frame(x=rnorm(n =1000,mean=mean(qq$x),sd=1000), 
y=rnorm(n=1000,mean=mean(qq$y),sd=1000)) 

如果我使用in.out功能从mgcv包,我得到有些我想要的东西。

这种方法的主要问题是我的'多边形'未关闭,也不会被解释为2个多边形。该包建议在两者之间使用一个NA行,但我宁愿使用我的标记列,因为我将尝试在我的标记因子中使用多于两个级别,即超过2个多边形)。我的最终目标是创建一个每个点内的点数的表格。

+0

我建议将您的数据帧转换为适当的*空间对象。然后,您将可以轻松地使用专门的空间操作。 – lbusett

+0

我不确定你真正想要什么,但是我在库(recexcavAAR)中实现了一个简单的“Point-in-Polygon”函数:'' )' – nevrome

+0

这与in.out的工作方式相同,我如何以平面方式制作多边形? –

回答

1

lapply()sapply()帮助您使用功能par级别。

## a bit edited to make output clear 

library(dplyr); library(mgcv) 

TAG <- unique(qq$tag) 

IN.OUT <- lapply(TAG, function(x) as.matrix(qq[qq$tag==x, 1:2])) %>% # make a matrix par level 
    sapply(function(x) in.out(x, as.matrix(my_points)))  # use in.out() with each matrix 

colnames(IN.OUT) <- TAG 

head(IN.OUT, n = 3) 

#  MPN_right MPN_left 
# [1,]  FALSE FALSE 
# [2,]  FALSE FALSE 
# [3,]  FALSE FALSE 

apply(IN.OUT, 2, table) 

#  MPN_right MPN_left 
# FALSE  983  990 
# TRUE   17  10 
+0

我终于用了类似的东西,有一些帮助函数和更复杂的东西,但是谢谢你的代码,它和我写的非常相似 –

2

你看这个:

mysppoint <- SpatialPoints(coords = my_points) # create spatial points 
qq$tag <- as.factor(qq$tag) 
polys = list() 

# create one polygon for each factor level 
for (lev in levels(qq$tag)){ 
    first_x <- qq$x[qq$tag == lev][1] 
    first_y <- qq$y[qq$tag == lev][1] 
    qq <- rbind(qq, data.frame(x = first_x, y = first_y, tag = lev)) # "close" the polygon by replicating the first row 
    polys[[lev]] <- Polygons(list(Polygon(matrix(data = cbind(qq$x[qq$tag == lev], # transform to polygon 
                  qq$y[qq$tag == lev]), 
               ncol = 2))), lev) 
} 

mypolys <- SpatialPolygons(polys) # convert to spatial polygons 
inters <- factor(over(mysppoint, mypolys), labels = names(mypolys)) # intersect points with polygons 
table(inters) 

,这给:这个

inters 
MPN_left MPN_right 
     10  17 

优点是,它为您提供了适当的空间物体的工作。例如:

plotd <- fortify(mypolys) 
p <- ggplot() 
p <- p + geom_point(data = my_points, aes(x = x , y = y), size = 0.2) 
p <- p + geom_polygon(data = plotd, aes(x = long, y = lat, fill = id), alpha = 0.7) 
p 

Plot of polygons and points

+0

奇怪的是,这不会产生10和17.另外,我不想去'第一','第二',因为我可能有10个或更多的地区。 –

+0

我可能使用了不同的种子。另外,在结果中获得关卡的名字很简单:稍后我会更改答案。 – lbusett

+0

@ matias-andina:现在修好。只是事先转换因素以获得正确的“名称”。也使用正确的种子... – lbusett

1

我结束了使用lapplysplitlapply的组合。所以这里是代码,请忽略extract_coords帮手功能,它基本上给我一个dataframex,y和标签列。我还设法将原始your_coords中的点数进行子集并对它们进行计数(将它们作为矢量而不是表格返回)。

inside_ROI = function(your_ROI_zip,your_coords){ 

    # Helper function will take list from zip ROIs and merge them into a df 
    qq=extract_coords(your_ROI_zip) 

    # We use tag for splitting by the region 

    lista=split(qq,qq$tag) 

    # We check if they are in or out 
    who_is_in = lapply(lista,function(t) in.out(cbind(t$x,t$y),x=cbind(your_coords$x,your_coords$y))) 

    # We sum to get the by area countings 
    region_sums = unlist(lapply(who_is_in,function(t) sum(as.numeric(t)))) 

    # obtain indices for subset of TRUE values 
    aa=lapply(who_is_in,function(p) which(p==T)) 

    whos_coords=list() 

for (i in aa){ 
    whos_coords = append(whos_coords,values=list(your_coords[i,])) 
    # whos_coords[[i]] = your_coords[i,] 
    } 

    # Change names 
    names(whos_coords) = names(aa) 

    # Put into list for more than one output 
    out=list(region_sums,aa,whos_coords) 

    return(out) 
}