2017-07-18

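Parent-child structure in an R data frame

I have a CSV that contains an organizational structure like the one shown below, along with some other columns. I use R to build the chart, and it works great! The challenge is how to build the chart for a subset: one manager plus their children and grandchildren. Is it possible to filter for this in dplyr or another package?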

Sample format: 
emp_id mgr_id nest_id 
A  A  0 
B  A  1 
C  B  2 
D  C  3 
    D1  D  4 
    D2  D  4 
E  C  3 
    E1  E  4 
F  C  3 
G  B  2 
H  G  3 

I need the subset for manager "C".

Scenario 1: emp_id == C must include all of its nodes: 'D', 'D1', 'D2', 'E', 'E1', 'F'

Expected structure:

manager,all_children 
C  D 
C  D1 
C  D2 
C  E 
C  E1 
C  F 

Scenario 2: emp_id == C must include all of the nodes above, but keep the mgr_id structure for 'D' and 'E'.

Expected structure:

manager,all_children 
C  D 
C  E 
C  F 
D  D1 
D  D2 
E  E1 

Answers

Here is a solution using functions from dplyr and data.table. dt3 is the output for scenario 1, and dt4 is the output for scenario 2.

# Load packages 
library(dplyr) 
library(data.table) 

# Create example data frame 
dt <- read.table(text = "emp_id mgr_id nest_id 
A  A  0 
       B  A  1 
       C  B  2 
       D  C  3 
       D1  D  4 
       D2  D  4 
       E  C  3 
       E1  E  4 
       F  C  3 
       G  B  2 
       H  G  3", 
       header = TRUE, stringsAsFactors = FALSE) 

# Process the data 
dt2 <- dt %>% 
    # Keep only levels deeper than 1 (drops nest_id 0 and 1) 
    filter(nest_id > 1) %>% 
    mutate(group_id = ifelse(nest_id > 2, 0, 1)) %>% 
    # Create "run_id", which will be used to fill manager label 
    mutate(run_id = rleid(group_id)) %>% 
    mutate(run_id = ifelse(run_id %% 2 == 0, run_id - 1, run_id)) %>% 
    group_by(run_id) %>% 
    mutate(manager = first(emp_id)) %>% 
    # Select for manager C 
    filter(manager %in% "C") %>% 
    ungroup() %>% 
    # Remove rows if manager == emp_id 
    filter(manager != emp_id) %>% 
    rename(all_children = emp_id) 

# Scenario 1 
dt3 <- dt2 %>% select(manager, all_children) 

# Scenario 2 
dt4 <- dt2 %>% 
    select(manager = mgr_id, all_children) %>% 
    arrange(manager, all_children) 
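
To make the run_id step easier to follow, here is a small sketch of what it computes on the filtered sample rows. It uses base R's rle() as a stand-in for data.table's rleid(), and the vectors are typed in by hand from the sample data, so treat it as an illustration rather than part of the answer. Note that the trick depends on row order: every level-2 manager has to be followed directly by its own reports.

```r
# Rows that remain after filter(nest_id > 1), in their original order
emp_id  <- c("C", "D", "D1", "D2", "E", "E1", "F", "G", "H")
nest_id <- c(2, 3, 4, 4, 3, 4, 3, 2, 3)

# 1 marks a level-2 manager row, 0 marks anyone below that level
group_id <- ifelse(nest_id > 2, 0, 1)

# Base R stand-in for data.table::rleid(): number each run of equal values
r <- rle(group_id)
run_id <- rep(seq_along(r$lengths), r$lengths)

# Fold each even run (the reports) into the odd run before it (their manager)
run_id <- ifelse(run_id %% 2 == 0, run_id - 1, run_id)

# The first emp_id of each merged run is the manager label for that run
manager <- emp_id[match(run_id, run_id)]

print(data.frame(emp_id, group_id, run_id, manager))
```

Rows C through F end up in one run labelled C, and rows G and H in another labelled G, which is what the later filter(manager %in% "C") relies on.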

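Thank you all for the responses. Both were very helpful for the dummy data I provided as well as for the demo CSV I have. Let me run this on the real data and I will post an update soon. – Vinay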

Consider by() from base R, which builds a list of data frames, one for every level of mgr_id (not just C):

Scenario 1

# df is the sample data frame with columns emp_id, mgr_id, nest_id 
dfList <- by(df, df$mgr_id, function(i){ 
    names(i) <- paste0(names(i), "_")  # SUFFIX UNDERSCORE (TO AVOID DUP COLUMNS) 

    child <- merge(i, df, by.x="mgr_id_", by.y="emp_id")[,1:2] 
    grandchild <- merge(child, df, by.x="emp_id_", by.y="mgr_id")[c("mgr_id_", "emp_id")] 

    names(child) <- gsub("_$", "", names(child))    # REMOVE LAST UNDERSCORE 
    names(grandchild) <- gsub("_$", "", names(grandchild)) # REMOVE LAST UNDERSCORE 

    rbind(child, grandchild) 
}) 

dfList$C 

# mgr_id emp_id 
# 1  C  D 
# 2  C  E 
# 3  C  F 
# 4  C  D1 
# 5  C  D2 
# 6  C  E1 

Scenario 2 (here the columns selected for the grandchild data frame change, and its first column is then renamed)

dfList <- by(df, df$mgr_id, function(i){ 
    names(i) <- paste0(names(i), "_")  # SUFFIX UNDERSCORE (TO AVOID DUP COLUMNS) 

    child <- merge(i, df, by.x="mgr_id_", by.y="emp_id")[,1:2] 
    grandchild <- merge(child, df, by.x="emp_id_", by.y="mgr_id")[c("emp_id_", "emp_id")] 

    names(child) <- gsub("_$", "", names(child))    # REMOVE LAST UNDERSCORE 
    names(grandchild) <- gsub("_$", "", names(grandchild)) # REMOVE LAST UNDERSCORE 
    names(grandchild)[1] <- "mgr_id" 

    rbind(child, grandchild) 
}) 

dfList$C 

# mgr_id emp_id 
# 1  C  D 
# 2  C  E 
# 3  C  F 
# 4  D  D1 
# 5  D  D2 
# 6  E  E1
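
The by() approach above resolves exactly two levels (children and grandchildren). If the real data goes deeper, one option is to walk the hierarchy level by level. The following is a minimal base R sketch, not part of either answer: get_subtree is a hypothetical helper name, and it assumes the data contains no cycles other than the top node reporting to itself.

```r
# Sample data from the question
df <- read.table(text = "emp_id mgr_id nest_id
A A 0
B A 1
C B 2
D C 3
D1 D 4
D2 D 4
E C 3
E1 E 4
F C 3
G B 2
H G 3", header = TRUE, colClasses = c("character", "character", "integer"))

# Hypothetical helper: collect every manager -> report pair below `root`.
# Assumes the hierarchy is a tree (no cycles apart from the root's self-reference).
get_subtree <- function(df, root) {
  # Drop the self-reference of the top node (A reports to A)
  edges <- df[df$emp_id != df$mgr_id, c("mgr_id", "emp_id")]
  result <- edges[0, ]
  frontier <- root
  while (length(frontier) > 0) {
    # Direct reports of everyone found on the previous level
    level <- edges[edges$mgr_id %in% frontier, ]
    result <- rbind(result, level)
    frontier <- level$emp_id
  }
  rownames(result) <- NULL
  result
}

# Scenario 2: keep each node's direct manager
scenario2 <- get_subtree(df, "C")
names(scenario2) <- c("manager", "all_children")

# Scenario 1: same nodes, all attributed to the chosen manager
scenario1 <- data.frame(manager = "C",
                        all_children = scenario2$all_children,
                        stringsAsFactors = FALSE)

print(scenario1)
print(scenario2)
```

Rows come back level by level (D, E, F first, then D1, D2, E1), so sort them afterwards if the order shown in the question matters.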