2016-09-25

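r - Merging two data frames in a master/detail hierarchy

I have two data.frames. One represents the directed connections between the sites in my system: for example, site a1 below (reading across its row) connects to site d1, but not the other way around, while d1 connects to itself and to d2.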

>connections=read.table("file1") 
>connections 
  V1 V2 V3 V4 V5 Site
1  0  1  0  0  0   a1
2  0  1  1  0  0   d1
3  0  1  1  1  1   d2
4  1  0  0  0  0   d3
5  0  0  0  0  0   f1

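What I am finding really hard is expressing these connections in terms of each site's subunits. Each site consists of 5 subunits, as summarized in the file below, and every subunit ID is unique.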

> subunits=read.table("file2")
> subunits
  Site minID maxId
1   a1     0     4
2   d1     5     9
3   d2    10    14
4   d3    15    19
5   f1    20    24

My goal is to build a detailed table of the connections by subunit. The result should look like this:

site subunit numconnections conectionids… 
a1 0 5 5 6 7 8 9 
a1 1 5 5 6 7 8 9 
a1 2 5 5 6 7 8 9 
a1 3 5 5 6 7 8 9 
a1 4 5 5 6 7 8 9 
d1 5 10 5 6 7 8 9 10 11 12 13 14 
d1 6 10 5 6 7 8 9 10 11 12 13 14 
d1 7 10 5 6 7 8 9 10 11 12 13 14 
d1 8 10 5 6 7 8 9 10 11 12 13 14 
d1 9 10 5 6 7 8 9 10 11 12 13 14 
d2 10 20 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
d2 11 20 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
d2 12 20 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
d2 13 20 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
d2 14 20 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
d3 15 5 0 1 2 3 4 
d3 16 5 0 1 2 3 4 
d3 17 5 0 1 2 3 4 
d3 18 5 0 1 2 3 4 
d3 19 5 0 1 2 3 4 
f1 20 0 
f1 21 0 
f1 22 0 
f1 23 0 
f1 24 0 

… 

Here are the contents of file1 and file2:

file1

"V1" "V2" "V3" "V4" "V5" "Site" 
"1" 0 1 0 0 0 "a1" 
"2" 0 1 1 0 0 "d1" 
"3" 0 1 1 1 1 "d2" 
"4" 1 0 0 0 0 "d3" 
"5" 0 0 0 0 0 "f1" 

file2

"Site" "minID" "maxId" 
"1" "a1" 0 4 
"2" "d1" 5 9 
"3" "d2" 10 14 
"4" "d3" 15 19 
"5" "f1" 20 24 

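For anyone who wants to reproduce the two inputs without the files on disk, here is a minimal sketch that reads the same text through the `text` argument of `read.table`. The variable names `file1_text` and `file2_text` are only illustrative, and `stringsAsFactors = FALSE` is my own choice so that `Site` stays a character column on older R versions. Because the first line of each block has one field fewer than the rows below it, `read.table` treats it as a header and uses the first field of each row as the row name.

```r
# Contents of file1 and file2, copied from the listings above.
file1_text <- '"V1" "V2" "V3" "V4" "V5" "Site"
"1" 0 1 0 0 0 "a1"
"2" 0 1 1 0 0 "d1"
"3" 0 1 1 1 1 "d2"
"4" 1 0 0 0 0 "d3"
"5" 0 0 0 0 0 "f1"'

file2_text <- '"Site" "minID" "maxId"
"1" "a1" 0 4
"2" "d1" 5 9
"3" "d2" 10 14
"4" "d3" 15 19
"5" "f1" 20 24'

# The header is detected automatically: the first line is one field short.
connections <- read.table(text = file1_text, stringsAsFactors = FALSE)
subunits    <- read.table(text = file2_text, stringsAsFactors = FALSE)
```
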
Answer

If you are open to using lists and accessing the relevant information from there, here is one solution.

  1. Create your data frames:

connections <- data.frame( c(0,0,0,1,0), c(1,1,1,0,0), c(0,1,1,0,0), c(0,0,1,0,0), c(0,0,1,0,0), c("a1", "d1", "d2", "d3", "f1")) 
colnames(connections) <- c("V1", "V2", "V3", "V4", "V5", "Site") 

# here I build up the df as in your example but then show how to 
# rearrange it a bit so it is easier to use later: 
# rows are the source sites, columns are the target sites 
row.names(connections) <- c("a1", "d1", "d2", "d3", "f1") 
connections <- connections[,-6]   # drop the Site column first 
colnames(connections) <- c("a1", "d1", "d2", "d3", "f1") 

# same idea for the subunits: site names become the row names 
subunits <- data.frame(c("a1", "d1", "d2", "d3", "f1"), c(0, 5, 10, 15, 20), c(4,9,14,19,24)) 
colnames(subunits) <- c("Site", "minID", "maxId") 
row.names(subunits) <- c("a1", "d1", "d2", "d3", "f1") 
subunits <- subunits[,-1] 
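  2. Create a list of what connects to what:

    # each element holds the columns (named by target site) where that row has a 1 
    site <- apply(X = connections, 1, FUN = function(x) which(x == 1)) 
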
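  3. Build the subunit IDs for each site; this will be useful when getting the connection IDs:

    # one row per site, one column per subunit ID 
    subunit <- t(apply(subunits, 1, function(x) seq(x[1], x[2]))) 
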
  4. Get the list numconnections with the number of connections:

    # here I switch to a loop rather than apply as I think it will be more readable 
    
    numconnections <- vector("list", length(site)) 
    names(numconnections) <- names(site) 
    for(i in seq_along(site)){ 
        # number of connected sites times the number of subunits per site 
        nc <- sum(names(site[[i]]) %in% row.names(subunit)) * ncol(subunit) 
        numconnections[[i]] <- nc 
    
    } 
    
  5. Finally, get all the connection IDs for each site (they are the same for every subunit of that site):

    conectionids <- vector("list", length(site)) 
    names(conectionids) <- names(site) 
    for(i in seq_along(site)){ 
    
        # seq_along() skips the inner loop for a site with no connections (f1) 
        for(j in seq_along(site[[i]])){ 
         conectionids[[i]] <- c(conectionids[[i]], subunit[which(row.names(subunit) %in% names(site[[i]])[j]), ]) 
        } 
    
    } 
    

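    From here you can query the relevant information, or build a data frame or a list (a list holding all of the information is probably best, if possible).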
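
If you do want the per-subunit table from the question, the pieces above can be combined roughly as follows. This is only a sketch under a few assumptions of mine: column k of the connection matrix refers to the site in row k, the inputs are rebuilt with `Site` kept as a column, and the connection IDs are stored as one space-separated string per row. The names `adj`, `ids`, `targets` and `result` are illustrative, not part of the original code.

```r
# Inputs rebuilt from the question, with Site kept as a regular column.
connections <- data.frame(
  V1 = c(0, 0, 0, 1, 0), V2 = c(1, 1, 1, 0, 0), V3 = c(0, 1, 1, 0, 0),
  V4 = c(0, 0, 1, 0, 0), V5 = c(0, 0, 1, 0, 0),
  Site = c("a1", "d1", "d2", "d3", "f1"), stringsAsFactors = FALSE
)
subunits <- data.frame(
  Site = c("a1", "d1", "d2", "d3", "f1"),
  minID = c(0, 5, 10, 15, 20), maxId = c(4, 9, 14, 19, 24),
  stringsAsFactors = FALSE
)

# Logical adjacency matrix: row = source site, column = target site.
# Assumption: column k refers to the site found in row k.
adj <- as.matrix(connections[, paste0("V", 1:5)]) == 1

# Subunit IDs owned by each site, as a named list in the row order of adj.
ids <- Map(seq, subunits$minID, subunits$maxId)
names(ids) <- subunits$Site
ids <- ids[connections$Site]

# For every source site, pool the IDs of all sites it connects to.
targets <- lapply(seq_len(nrow(adj)), function(i) {
  as.integer(unlist(ids[adj[i, ]], use.names = FALSE))
})
names(targets) <- connections$Site

# One row per subunit; site-level values are recycled across its subunits.
result <- do.call(rbind, lapply(connections$Site, function(s) {
  data.frame(
    site = s,
    subunit = ids[[s]],
    numconnections = length(targets[[s]]),
    conectionids = paste(targets[[s]], collapse = " "),
    stringsAsFactors = FALSE
  )
}))
```

Keeping the IDs as a string makes the table easy to print or write to a file; if you need to compute on them later, the `targets` list is the more convenient form.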