2017-04-13

R: hierarchical JSON using JSONLITE?

My end goal is to use D3js to create a tree diagram from a hierarchical JSON file.

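The hierarchy I need to represent is shown in the diagram below, where A has children B, C, D; B has children E, F, G; C has children H, I; and D has no children. Each node will have multiple key:value pairs. For simplicity, I list only 3.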

name:A  type:colors  id:001
|
|-- name:B  type:blue  id:002
|   |-- name:E  type:dkBlue   id:005
|   |-- name:F  type:medBlue  id:006
|   |-- name:G  type:ltBlue   id:007
|
|-- name:C  type:red  id:003
|   |-- name:H  type:dkRed   id:008
|   |-- name:I  type:medRed  id:009
|
|-- name:D  type:green  id:004

My source data in R looks like this:

nodes <-read.table(header = TRUE, text = " 
ID name type 
001 A colors 
002 B blue 
003 C red 
004 D green 
005 E dkBlue 
006 F medBlue 
007 G ltBlue 
008 H dkRed 
009 I medRed 
") 

links <- read.table(header = TRUE, text = " 
startID relation endID  
001  hasSubCat 002 
001  hasSubCat 003 
001  hasSubCat 004 
002  hasSubCat 005 
002  hasSubCat 006 
002  hasSubCat 007 
003  hasSubCat 008 
003  hasSubCat 009 
") 

I need to convert it into the following JSON:

{"name": "A",
"type": "colors",
"id" : "001",
"children": [
    {"name": "B",
     "type": "blue",
     "id" : "002",
     "children": [
      {"name": "E",
      "type": "dkBlue",
      "id" : "005"},
      {"name": "F",
      "type": "medBlue",
      "id": "006"},
      {"name": "G",
      "type": "ltBlue",
      "id": "007"}
    ]},
    {"name": "C",
     "type": "red",
     "id" : "003",
     "children": [
      {"name": "H",
      "type": "dkRed",
      "id" : "008"},
      {"name": "I",
      "type": "medRed",
      "id": "009"}
    ]},
    {"name": "D",
     "type": "green",
     "id" : "004"}
]}

I would appreciate any help you can offer!

[Update 2017-04-18]

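Based on Ian's reference, I looked into R's data.tree. If I restructure my data, I can recreate my hierarchy as shown below. Note that I have lost the relation type (hasSubCat) between each pair of nodes; in real life its value will differ for each link/edge. I am willing to let that go (for now) if I can get a workable hierarchy. The revised data for data.tree: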

library(data.tree)

df <- read.table(header = TRUE, text = "
paths type  id 
A  colors 001 
A/B blue  002 
A/B/E dkBlue 005 
A/B/F medBlue 006 
A/B/G ltBlue 007 
A/C red  003 
A/C/H dkRed 008 
A/C/I medRed 009 
A/D green 004 
") 

myPaths <- as.Node(df, pathName = "paths") 
myPaths$leafCount/(myPaths$totalCount - myPaths$leafCount) 
print(myPaths, "type", "id", limit = 25) 

The printout shows the hierarchy I sketched in the original post, and it even includes the key:value pairs for each node. Great!

  levelName    type id
1 A          colors  1
2  ¦--B        blue  2
3  ¦   ¦--E  dkBlue  5
4  ¦   ¦--F medBlue  6
5  ¦   °--G  ltBlue  7
6  ¦--C         red  3
7  ¦   ¦--H   dkRed  8
8  ¦   °--I  medRed  9
9  °--D       green  4

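Once again, I am at a loss as to how to convert this from a tree into nested JSON. The example at https://ipub.com/data-tree-to-networkd3/, like most examples, uses key:value pairs only on the leaf nodes, not on the branch nodes. I think the answer is to create a nested list to feed to JSONIO or JSONLITE, but I don't know how to do that.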
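One possible way to get from the data.tree structure to a nested list, sketched here under a few assumptions: it relies on data.tree's `as.list()` method for nodes with `mode = "explicit"` and `unname = TRUE`, and on jsonlite's `auto_unbox` argument. Reading the table with `colClasses = "character"` is an addition to the code above so that the IDs keep their leading zeros. It is an illustration, not a tested solution for the real data.

```r
library(data.tree)
library(jsonlite)

# Same path-based table as above; colClasses keeps "001" from becoming 1
df <- read.table(header = TRUE, colClasses = "character", text = "
paths type id
A colors 001
A/B blue 002
A/B/E dkBlue 005
A/B/F medBlue 006
A/B/G ltBlue 007
A/C red 003
A/C/H dkRed 008
A/C/I medRed 009
A/D green 004
")

tree <- as.Node(df, pathName = "paths")

# "explicit" mode puts the children under a "children" element;
# unname = TRUE makes that element an unnamed list, so it becomes a JSON array
nested <- as.list(tree, mode = "explicit", unname = TRUE)

# auto_unbox turns length-one vectors into scalars instead of one-element arrays
json <- toJSON(nested, auto_unbox = TRUE, pretty = TRUE)
cat(json)
```

Because every node carries its own fields in the list, branch nodes such as B and C should keep their type and id alongside their children.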

You might want to take a look at this: http://stackoverflow.com/questions/12818864/how-to-write-to-json-with-children-from-r

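Hi Ian, the example you cited gets me close, but I am struggling to adapt it so that every "node" in the tree gets the key:value pairs I need. The recursive approach in that example only provides key:value pairs for the terminal nodes. – Tim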

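Tim, your question is complicated and I would need to hack at it a bit; unfortunately I don't have time right now. Someone better at this than I am may solve it faster. If you are having trouble with the recursive approach, another option is to build a tree from the top down, which is easier to conceptualize. Here is the vignette for the data.tree package: https://cran.r-project.org/web/packages/data.tree/vignettes/data.tree.html. You can add each child, then add attributes to each child by name. You can then export these to JSON using: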

Answers

1

data.tree is very useful and is probably the better way to achieve your goal. For fun, I'll submit a more roundabout way to get nested JSON using igraph and d3r.

nodes <-read.table(header = TRUE, text = " 
ID name type 
001 A colors 
002 B blue 
003 C red 
004 D green 
005 E dkBlue 
006 F medBlue 
007 G ltBlue 
008 H dkRed 
009 I medRed 
") 

links <- read.table(header = TRUE, text = " 
startID relation endID  
001  hasSubCat 002 
001  hasSubCat 003 
001  hasSubCat 004 
002  hasSubCat 005 
002  hasSubCat 006 
002  hasSubCat 007 
003  hasSubCat 008 
003  hasSubCat 009 
") 

library(d3r) 
library(dplyr) 
library(igraph) 

# make it an igraph 
gf <- graph_from_data_frame(links[,c(1,3,2)],vertices = nodes) 

# if we know that this is a tree with root as "A" 
# we can do something like this 
df_tree <- dplyr::bind_rows(
    lapply(
    all_shortest_paths(gf,from="A")$res, 
    function(x){data.frame(t(names(unclass(x))), stringsAsFactors=FALSE)} 
) 
) 

# we can discard the first column 
df_tree <- df_tree[,-1] 
# then make df_tree[1,1] as 1 (A) 
df_tree[1,1] <- "A" 

# now add node attributes to our data.frame 
df_tree <- df_tree %>% 
    # let's get the last non-NA in each row so we can join with nodes 
    mutate(
    last_non_na = apply(df_tree, MARGIN=1, function(x){tail(na.exclude(x),1)}) 
) %>% 
    # now join with nodes 
    left_join(
    nodes, 
    by = c("last_non_na" = "name") 
) %>% 
    # now remove last_non_na column 
    select(-last_non_na) 

# use d3r to nest as we would like 
nested <- df_tree %>% 
    d3_nest(value_cols = c("ID", "type")) 
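
This is very close. One small issue: d3_nest generates a root node named "root", producing a tree that starts: root -> A -> ... If I specify the d3_nest argument root = "A", it only renames "root" to "A", producing: A -> A -> ... Is there a way to make A the root node? Everything looks fine in df_tree. – Tim

Before `# now add node attributes`, you can run `df_tree <- df_tree[-1,]` and `df_tree <- df_tree[,-1]`, then use `d3_nest(..., root = "A")`, but you will lose the attributes of `A`. – timelyportfolio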

You can also do `nested <- df_tree %>% d3_nest(value_cols = c("ID", "type"), json = FALSE)` and then `d3_json(nested[1]$children[[1]], strip = TRUE)`. – timelyportfolio

1

Consider walking down the levels one at a time, converting the data frame rows into a multi-level nested list:

library(jsonlite) 
... 
# helper: return the row for node name i as a named list
df2list <- function(i) as.list(nodes[nodes$name == i,])

# GRANDPARENT LEVEL 
jsonlist <- as.list(nodes[nodes$name=='A',]) 
# PARENT LEVEL  
jsonlist$children <- lapply(c('B','C','D'), function(i) as.list(nodes[nodes$name == i,])) 
# CHILDREN LEVEL 
jsonlist$children[[1]]$children <- lapply(c('E','F','G'), df2list) 
jsonlist$children[[2]]$children <- lapply(c('H','I'), df2list) 

toJSON(jsonlist, pretty=TRUE) 

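However, with this approach you will notice that the length-one elements inside the children are enclosed in brackets. Because R cannot hold complex types inside a character vector, the entire object must be a list, and its elements are output in bracketed (array) form.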

So consider a nested gsub to clean up the extra brackets while still rendering valid JSON:

output <- toJSON(jsonlist, pretty=TRUE) 

gsub('"\\]\n', '"\n', gsub('"\\],\n', '",\n', gsub('": \\["', '": "', output))) 

Final output

{ 
    "ID": "001", 
    "name": "A", 
    "type": "colors", 
    "children": [ 
    { 
     "ID": "002", 
     "name": "B", 
     "type": "blue", 
     "children": [ 
     { 
      "ID": "005", 
      "name": "E", 
      "type": "dkBlue" 
     }, 
     { 
      "ID": "006", 
      "name": "F", 
      "type": "medBlue" 
     }, 
     { 
      "ID": "007", 
      "name": "G", 
      "type": "ltBlue" 
     } 
     ] 
    }, 
    { 
     "ID": "003", 
     "name": "C", 
     "type": "red", 
     "children": [ 
     { 
      "ID": "008", 
      "name": "H", 
      "type": "dkRed" 
     }, 
     { 
      "ID": "009", 
      "name": "I", 
      "type": "medRed" 
     } 
     ] 
    }, 
    { 
     "ID": "004", 
     "name": "D", 
     "type": "green" 
    } 
    ] 
} 
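
As an alternative to the regex cleanup above, jsonlite's `auto_unbox` argument (which another answer on this page also uses) writes length-one vectors as scalars. A minimal sketch with a single hypothetical node list:

```r
library(jsonlite)

# One node as a named list of length-one character vectors
node <- list(ID = "005", name = "E", type = "dkBlue")

# Default: every length-one vector is written as a one-element array
boxed <- toJSON(node)
print(boxed)
# {"ID":["005"],"name":["E"],"type":["dkBlue"]}

# auto_unbox = TRUE: length-one vectors are written as plain scalars
unboxed <- toJSON(node, auto_unbox = TRUE)
print(unboxed)
# {"ID":"005","name":"E","type":"dkBlue"}
```

Unnamed lists such as the children element are still written as arrays, so the nesting itself is unaffected.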

Great solution, but with more complex hierarchies this becomes harder. – timelyportfolio

1

A nice way to do this, if a little hard to wrap one's head around, is to use a self-referencing (recursive) function, like the following...

nodes <- read.table(header = TRUE, colClasses = "character", text = " 
ID name type 
001 A colors 
002 B blue 
003 C red 
004 D green 
005 E dkBlue 
006 F medBlue 
007 G ltBlue 
008 H dkRed 
009 I medRed 
") 

links <- read.table(header = TRUE, colClasses = "character", text = " 
startID relation endID  
001  hasSubCat 002 
001  hasSubCat 003 
001  hasSubCat 004 
002  hasSubCat 005 
002  hasSubCat 006 
002  hasSubCat 007 
003  hasSubCat 008 
003  hasSubCat 009 
") 

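convert_hier <- function(linksDf, nodesDf, sourceId = "startID",
                         targetId = "endID", nodesID = "ID") {
    # recursively build the nested list for one node and all of its descendants
    makelist <- function(nodeid) {
        # IDs of the direct children of this node
        child_ids <- linksDf[[targetId]][which(linksDf[[sourceId]] == nodeid)]

        # leaf node: return only its own attributes
        if (length(child_ids) == 0)
            return(as.list(nodesDf[nodesDf[[nodesID]] == nodeid, ]))

        # branch node: its own attributes plus a recursively built children list
        c(as.list(nodesDf[nodesDf[[nodesID]] == nodeid, ]),
          children = list(lapply(child_ids, makelist)))
    }

    # the root is the ID that never appears as a link target (assumes one root)
    ids <- unique(c(linksDf[[sourceId]], linksDf[[targetId]]))
    rootid <- ids[! ids %in% linksDf[[targetId]]]
    jsonlite::toJSON(makelist(rootid), pretty = TRUE, auto_unbox = TRUE)
}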

convert_hier(links, nodes) 

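A few notes...

  1. I added colClasses = "character" to the read.table commands so that the ID numbers are not coerced to integers (which would drop the leading zeros) and so that the strings are not converted to factors.
  2. I wrapped everything in the convert_hier function to make it easier to adapt to other scenarios, but the real magic is in the makelist function.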
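
The effect described in note 1 can be seen with a small base R sketch. The two-row table here is a made-up subset of the nodes data, used only for illustration:

```r
# Hypothetical two-row subset of the nodes table
raw <- "
ID name
001 A
002 B
"

# Default type detection: the ID column is read as integer, so zeros are lost
default <- read.table(header = TRUE, text = raw)
print(default$ID)
# [1] 1 2

# colClasses = "character": every column stays a string
as_char <- read.table(header = TRUE, colClasses = "character", text = raw)
print(as_char$ID)
# [1] "001" "002"
```

This matters for the link lookup as well, since the IDs in links and nodes are compared with `==` and should be of the same type in both tables.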