2017-11-11 182 views
2

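Function in R that returns the ancestors and children in a network

I want to create a function `f` in R that takes as input a data.frame of edges between individuals and one individual (for example A2), and returns another data.frame containing only the ancestors and children of A2, together with the ancestors and children of those ancestors and children.

To illustrate my somewhat complicated question: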

library(visNetwork) 
nodes <- data.frame(id = c(paste0("A",1:5),paste0("B",1:3)), 
       label = c(paste0("A",1:5),paste0("B",1:3))) 
edges <- data.frame(from = c("A1","A1","A2","A3","A4","B1","B2"), 
       to = c("A2","A3","A4","A4","A5","B3","B3")) 
visNetwork(nodes, edges) %>% 
    visNodes(font = list(size=45)) %>% 
    visHierarchicalLayout(direction = "LR", levelSeparation = 500) 


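In this example, the data.frame contains 2 independent networks: one network with the "A"s and another with the "B"s.

I would like to implement a function f(data = edges, indiv = "A2") that returns a data.frame containing all the rows of the edges data.frame that belong to the "A"s network:

f(edges, "A2") would return this extract of the edges data.frame: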

head(f(edges,"A2")) 
# from to 
#1 A1 A2 
#2 A1 A3 
#3 A2 A4 
#4 A3 A4 
#5 A4 A5 

I hope this is clear enough for you to help me.

Thank you very much!

+0

What have you tried? What is the algorithm you are trying to implement? –

+0

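I'm not sure I understand exactly what you are asking, but the goal is indeed to return, for each individual, its ancestors and children, together with the children and ancestors of those, and so on. Before spending time (certainly hours) writing the code, I wanted to know whether a well-known function or package already does this, because it seems to me it may be a very basic problem for people who (unlike me) are used to working with networks. But I did not find anything satisfying on the internet (only solutions for trees), so I wanted to ask people with more expertise! Thanks – antuki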

+0

I'm not a graph analyst, but maybe this could help: http://igraph.org/r/doc/components.html – romles

Answers

1

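I wrote a simple algorithm that finds the whole family linked to an individual (I'm sure it can be improved). As @romles suggested, you can do the same thing with an R package such as igraph. However, in this case my function appears to be faster than the igraph option.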

edges <- data.frame(from = c("A1","A1","A2","A3","A4","B1","B2"), 
        to = c("A2","A3","A4","A4","A5","B3","B3"), 
        stringsAsFactors = FALSE) 
f <- function(data, indiv){ 
    children_ancestors <- function(indiv){ 
     # Find children and ancestors of an indiv 
     c(data[data[,"from"]==indiv,"to"],data[data[,"to"]==indiv,"from"]) 
    } 
    family <- indiv 
    new_people <- children_ancestors(indiv) # New people to inspect 
    while(length(diff_new_p <- setdiff(new_people,family)) > 0){ 
     # if the new people aren't yet in the family : 
     family <- c(family, diff_new_p) 
     new_people <- unlist(sapply(diff_new_p, children_ancestors)) 
     new_people <- unique(new_people) 
    } 
    data[(data[,1] %in% family) | (data[,2] %in% family),] 
} 
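
Since the loop above is said to be open to improvement, here is a minimal sketch of one possible variant, not the answerer's code. It grows the family with whole-vector `%in%` lookups instead of calling a helper once per person. The name `connected_edges` is hypothetical, and the sketch assumes the columns are called `from` and `to`.

```r
# Hypothetical variant of f(): expands the family with vectorised lookups.
connected_edges <- function(data, indiv) {
  from <- as.character(data[["from"]])  # also works if the columns are factors
  to   <- as.character(data[["to"]])
  family <- indiv
  repeat {
    # Rows that touch at least one known family member
    touching <- from %in% family | to %in% family
    grown <- union(family, c(from[touching], to[touching]))
    if (length(grown) == length(family)) break  # nobody new: stop
    family <- grown
  }
  data[from %in% family | to %in% family, , drop = FALSE]
}

edges <- data.frame(from = c("A1","A1","A2","A3","A4","B1","B2"),
                    to   = c("A2","A3","A4","A4","A5","B3","B3"),
                    stringsAsFactors = FALSE)
res_a <- connected_edges(edges, "A2")  # rows of the "A" network
res_b <- connected_edges(edges, "B1")  # rows of the "B" network
```

Each pass adds every direct neighbour of the current family at once, so the number of passes is bounded by the length of the longest chain rather than by the number of individuals.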

f(edges, "A2") gives the expected result. Compared with igraph:

library(igraph) 
library(microbenchmark) 
edges2 <- graph_from_data_frame(edges, directed = FALSE) 
microbenchmark(simple_function = f(edges,"A2"), 
       igraph_option = as_data_frame(subgraph.edges(edges2, subcomponent(edges2, 'A2', 'in'))) 
       ) 
#Unit: microseconds 
#   expr  min  lq  mean median  uq  max neval 
# simple_function 874.411 968.323 1206.037 1123.515 1325.075 2957.931 100 
# igraph_option 1239.896 1451.364 1802.341 1721.227 1984.380 3907.089 100 
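
One point worth checking in the igraph call above: `subcomponent()` returns vertices, while `subgraph.edges()` is documented as taking edge ids, so the two only line up when vertex and edge numbering happen to coincide. The sketch below is an assumption about what was intended rather than the benchmarked code. It uses `induced_subgraph()`, which takes vertices, and `mode = "all"` so that edge direction is ignored.

```r
library(igraph)

edges <- data.frame(from = c("A1","A1","A2","A3","A4","B1","B2"),
                    to   = c("A2","A3","A4","A4","A5","B3","B3"),
                    stringsAsFactors = FALSE)
g <- graph_from_data_frame(edges, directed = TRUE)

# Vertices reachable from A2 when edge direction is ignored
vids <- subcomponent(g, "A2", mode = "all")

# induced_subgraph() keeps every edge whose two ends are both in vids
sub_edges <- igraph::as_data_frame(induced_subgraph(g, vids), what = "edges")
```

The `igraph::` prefix avoids a clash with `as_data_frame()` from other packages that may be attached.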
+0

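Thank you very much, all three of you, for your answers. They are very useful for understanding both the algorithm I need and the igraph package. I will take the time to understand all the solutions you provided! – antuki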

1

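This works for me:

library(igraph) 
g <- graph_from_literal(A1--A2, A1--A3, A2--A4, A3--A4, A4--A5, B1--B3, B2--B3) 
# Vertices in the same component as A2 (the mode is ignored for an undirected graph) 
sg_a2 <- subcomponent(g, 'A2', 'in') 
# induced_subgraph() takes vertices, whereas subgraph.edges() expects edge ids 
as_data_frame(induced_subgraph(g, sg_a2)) 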

It gives:

# from to 
#1 A1 A2 
#2 A1 A3 
#3 A2 A4 
#4 A3 A4 
#5 A4 A5 
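
The `components()` function linked in the comments offers another route. The following is a minimal sketch, assuming the original `edges` data.frame from the question is the starting point: it labels every vertex with its weakly connected component and keeps the rows of `edges` that fall in the same component as A2. The names `membership`, `keep` and `edges_a2` are illustrative only.

```r
library(igraph)

edges <- data.frame(from = c("A1","A1","A2","A3","A4","B1","B2"),
                    to   = c("A2","A3","A4","A4","A5","B3","B3"),
                    stringsAsFactors = FALSE)
g <- graph_from_data_frame(edges, directed = TRUE)

# Component id of each vertex, ignoring edge direction ("weak" components)
comp <- components(g, mode = "weak")
membership <- setNames(comp$membership, V(g)$name)

# Both ends of an edge are always in the same component,
# so testing the "from" column is enough
keep <- unname(membership[edges$from] == membership[["A2"]])
edges_a2 <- edges[keep, , drop = FALSE]
```

Because the filter is applied to the original data.frame, the result keeps its row order and any extra columns it may have.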
+0

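Thank you, all three of you, for your answers. They are very useful for understanding both the algorithm I need and the igraph package. I will take the time to understand all the solutions you provided! – antuki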

2

You can try filtering to keep only the nodes that are connected to A2 (i.e. those whose distance to A2 is not Inf):

library(tidygraph) 
edges <- data.frame(from = c("A1","A1","A2","A3","A4","B1","B2"), 
        to = c("A2","A3","A4","A4","A5","B3","B3")) 
as_tbl_graph(edges) %>% 
    filter(is.finite(node_distance_to(name=="A2", mode="all"))) 

which gives

# A tbl_graph: 5 nodes and 5 edges 
# 
# A directed acyclic simple graph with 1 component 
# 
# Node Data: 5 x 1 (active) 
    name 
    <chr> 
1 A1 
2 A2 
3 A3 
4 A4 
5 A5 
# 
# Edge Data: 5 x 2 
    from to 
    <int> <int> 
1  1  2 
2  1  3 
3  2  4 
# ... with 2 more rows 
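
The result above is a tbl_graph whose edge table refers to nodes by integer index, whereas the question asks for a data.frame of names. As a hedged sketch (it relies on a tbl_graph also being an igraph object, and the names `sub` and `edges_a2` are illustrative), the filtered graph can be converted back like this:

```r
library(tidygraph)

edges <- data.frame(from = c("A1","A1","A2","A3","A4","B1","B2"),
                    to   = c("A2","A3","A4","A4","A5","B3","B3"),
                    stringsAsFactors = FALSE)

# Keep only the nodes at a finite distance from A2, ignoring edge direction
sub <- as_tbl_graph(edges) %>%
  filter(is.finite(node_distance_to(name == "A2", mode = "all")))

# Convert the remaining edges back to a data.frame of node names
edges_a2 <- igraph::as_data_frame(sub, what = "edges")
```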
+0

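Thank you, all three of you, for your answers. They are very useful for understanding both the algorithm I need and the igraph package. I will take the time to understand all the solutions you provided! – antuki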