2016-07-14

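How can I convert a (messy) list in R into multiple adjacency lists or edge lists?

I am trying to convert the Census Bureau's "adjacency list" of FIPS codes (unique county-level identifiers) into a real adjacency list or edge list, and ultimately into an adjacency matrix. The Census FIPS code data is here: http://www2.census.gov/geo/docs/reference/county_adjacency.txt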

Question: how do I convert an unwieldy list into multiple logical adjacency lists, and ultimately into a single matrix?

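The problem is that the file is not an "adjacency list" in any conventional sense of the term. I am very new to R, so please forgive any mistakes or lack of best practices...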

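My intuition is to loop through the list, split the data into separate adjacency lists, convert each list into a matrix, and then bind the matrices into one large binary matrix. I searched online for how to do this, but all the examples use very simple, clean data. :(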

The Census Bureau presents the FIPS codes like this:

"Bullock County, AL" 01011 "Barbour County, AL" 01005 
     "Bullock County, AL" 01011 
     "Macon County, AL" 01087 
     "Montgomery County, AL" 01101 
     "Pike County, AL" 01109 
     "Russell County, AL" 01113 
"Butler County, AL" 01013 "Butler County, AL" 01013 
     "Conecuh County, AL" 01035 
     "Covington County, AL" 01039 
     "Crenshaw County, AL" 01041 
     "Lowndes County, AL" 01085 
     "Monroe County, AL" 01099 
     "Wilcox County, AL" 01131 

When I read the linked text file into R, the data looks like this:

[1] "\"Autauga County, AL\"\t01001\t\"Autauga County, AL\"\t01001" "\t\t\"Chilton County, AL\"\t01021"       "\t\t\"Dallas County, AL\"\t01047"        
[4] "\t\t\"Elmore County, AL\"\t01051"        "\t\t\"Lowndes County, AL\"\t01085"       "\t\t\"Montgomery County, AL\"\t01101"       
[7] "\"Baldwin County, AL\"\t01003\t\"Baldwin County, AL\"\t01003" "\t\t\"Clarke County, AL\"\t01025"        "\t\t\"Escambia County, AL\"\t01053"       
[10] "\t\t\"Mobile County, AL\"\t01097" 

After applying regular expressions from the stringr package, the data now looks like this:

> str(cleaner) 
List of 100 
$ : chr [1:2] "01001" "01001" 
$ : chr "01021" 
$ : chr "01047" 
$ : chr "01051" 
$ : chr "01085" 
$ : chr "01101" 
$ : chr [1:2] "01003" "01003" 
$ : chr "01025" 
$ : chr "01053" 
$ : chr "01097" 
$ : chr "01099" 
$ : chr "01129" 
$ : chr "12033" 

I can group the elements that follow the "first" item of each adjacency list, like so:

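# Group FIPS codes into adjacency lists, represented by index ranges.
# `locations` holds the index where each adjacency list starts;
# `vect` is the full vector of FIPS codes.
reduce_fips <- function(locations, vect) {
    out <- list()
    for (i in seq_along(locations)) {
        if (i == length(locations)) {
            # The last group runs to the end of the vector
            out[[i]] <- locations[i]:length(vect)
        } else {
            # Every other group ends just before the next start
            out[[i]] <- locations[i]:(locations[i + 1] - 1)
        }
    }
    out
}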

out <- reduce_fips(adj_list_start, fips_codes) #produces adj list values 
#problem: some adj list start points contain 2 different values of fips codes 

fips_adj_df <- data.frame(cleaner = sapply(out, function(x) x[1])) 
fips_adj_df 
fips_adj_df$adjacent <- out 
#problem: how to transform this into a matrix or connected nodes 

This produces the output shown below. However, it is not logically correct, and searching through it would be expensive in terms of memory.

cleaner       adjacent 
1  1     1, 2, 3, 4, 5, 6 
2  7   7, 8, 9, 10, 11, 12, 13 
3  14 14, 15, 16, 17, 18, 19, 20, 21, 22 
4  23   23, 24, 25, 26, 27, 28, 29 
5  30   30, 31, 32, 33, 34, 35, 36 
6  37    37, 38, 39, 40, 41, 42 
7  43   43, 44, 45, 46, 47, 48, 49 
8  50    50, 51, 52, 53, 54, 55 
9  56    56, 57, 58, 59, 60, 61 
10  62  62, 63, 64, 65, 66, 67, 68, 69 

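Ultimately, I want a binary matrix like the following, showing whether FIPS codes are geographically adjacent to one another. For example, if 100, 101, and 102 are adjacent to each other, while 103 is adjacent only to 102, I would like the matrix to show that.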

   FIPS 
FIPS  100 101 102 103 
    102  1 1 1 1 
    101  1 1 1 0 
    100  1 1 1 0 
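
For reference, a minimal sketch of how a matrix of this kind could be built from an edge list, using only base R. The `edges` data frame and its `from`/`to` column names are hypothetical and simply encode the toy example above; they are not part of the Census data.

```r
# Hypothetical edge list for the toy example: 100, 101 and 102 are mutually
# adjacent, and 103 borders only 102.
edges <- data.frame(
  from = c("100", "100", "101", "102"),
  to   = c("101", "102", "102", "103"),
  stringsAsFactors = FALSE
)

# One row and one column per code that appears anywhere in the edge list.
codes <- sort(unique(c(edges$from, edges$to)))
adj <- matrix(0L, nrow = length(codes), ncol = length(codes),
              dimnames = list(FIPS = codes, FIPS = codes))

# A two-column character matrix indexes cells by (row name, column name).
adj[as.matrix(edges)] <- 1L
# Mirror the edges, since geographic adjacency is symmetric.
adj[as.matrix(edges[, c("to", "from")])] <- 1L
# Treat each code as adjacent to itself, as the Census file does.
diag(adj) <- 1L

print(adj)
```

Unlike the three-row table above, this sketch produces a full square matrix with a row for 103 as well.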

Answer


There is a lot going on in this question, so I will do my best to break it down.

First, you can use read.csv to get the information from the text file.

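# colClasses keeps the leading zeros in the FIPS codes; na.strings = "" turns
# the blank cells on continuation rows into NA so they can be filled later
df <- read.csv("county_adjacency.txt", sep = "\t", stringsAsFactors = FALSE, header = FALSE,
               colClasses = "character", na.strings = "")

# Drop the county names, you don't need them
df <- df[, c("V2", "V4")]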

Use na.locf from the zoo package to fill in the NA values.

library(zoo)
df$V2 <- na.locf(df$V2)
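
To illustrate what this forward fill does, here is a small sketch on a toy vector. It uses a base R stand-in rather than zoo itself, so it runs without extra packages; the vector `v2` is made up to resemble the V2 column, and the sketch assumes the first element is never NA.

```r
# Toy column shaped like V2: a code on the first row of each group,
# NA on the continuation rows.
v2 <- c("01001", NA, NA, "01003", NA)

# For each position, find the index of the most recent non-NA entry.
# Assumes the first element is not NA (otherwise the index would be 0).
last_seen <- cummax(seq_along(v2) * (!is.na(v2)))

# Carry the last observed code forward into the NA slots.
filled <- v2[last_seen]
print(filled)
```

After the fill, every row carries the code of the county whose neighbours are being listed, so each row of (V2, V4) reads as one edge.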

List your FIPS codes and use them to build the matrix.

fips <- unique(df$V2)

fips.matrix <- matrix(data = 0, nrow = length(fips), ncol = length(fips), dimnames = list(fips, fips))

Fill the matrix with 1s at the coordinates given by the text file.

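# Convert every column to character so the values match the matrix dimnames
# (as.character() on a whole data frame would not do this)
df[] <- lapply(df, as.character)

# A two-column character matrix indexes cells by (row name, column name)
fips.matrix[as.matrix(df)] <- 1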
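
The steps above can be tried end to end on a few inline lines before running them on the full file. This is only a sketch under stated assumptions: the `lines` vector is a made-up, incomplete subset in the same tab-separated layout, the forward fill uses a base R stand-in for na.locf, and the `colClasses` and `na.strings` arguments are my own choice for keeping leading zeros and marking blank cells.

```r
# A few lines in the same tab-separated layout as county_adjacency.txt.
# Toy subset only: the neighbour lists here are incomplete.
lines <- c(
  '"Autauga County, AL"\t01001\t"Autauga County, AL"\t01001',
  '\t\t"Chilton County, AL"\t01021',
  '\t\t"Dallas County, AL"\t01047',
  '"Chilton County, AL"\t01021\t"Autauga County, AL"\t01001',
  '\t\t"Chilton County, AL"\t01021'
)

# Read as character so "01001" keeps its leading zero; blank cells become NA.
df <- read.csv(text = lines, sep = "\t", header = FALSE,
               stringsAsFactors = FALSE,
               colClasses = "character", na.strings = "")

# Keep only the two code columns, then forward-fill the left-hand code
# (base R stand-in for zoo::na.locf, assuming the first row is not NA).
edges <- df[, c("V2", "V4")]
edges$V2 <- edges$V2[cummax(seq_along(edges$V2) * (!is.na(edges$V2)))]

# Square 0/1 matrix over every code seen in either column.
fips <- sort(unique(c(edges$V2, edges$V4)))
fips.matrix <- matrix(0L, nrow = length(fips), ncol = length(fips),
                      dimnames = list(fips, fips))

# Each row of the character matrix names one (row, column) cell to set.
fips.matrix[as.matrix(edges)] <- 1L
print(fips.matrix)
```

On the full Census file each adjacency appears in both directions, so the resulting matrix should come out symmetric; on this truncated sample it does not, because the entry for 01047 is missing.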