2017-10-11

Convert a data.frame into another data.frame

I have two data.frames as follows:

dt2017 = data.frame(id=LETTERS[1:5],year=2017,city1=c(0,1,0,1,0),city2=c(0,0,1,0,0),city3=c(1,0,1,0,1),city4=c(0,0,0,0,1)) 
dt2017 
id year city1 city2 city3 city4 
1: A 2017  0  0  1  0 
2: B 2017  1  0  0  0 
3: C 2017  0  1  1  0 
4: D 2017  1  0  0  0 
5: E 2017  0  0  1  1 

dt2016 = data.frame(id=LETTERS[1:5],year=2016,city1=c(0,0,0,0,1),city2=c(0,0,0,1,0),city3=c(0,0,1,0,1),city4=c(1,1,0,0,1)) 
dt2016 
id year city1 city2 city3 city4
1: A 2016  0  0  0  1 
2: B 2016  0  0  0  1 
3: C 2016  0  0  1  0 
4: D 2016  0  1  0  0 
5: E 2016  1  0  1  1 

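A "1" in the data.frame means that the person worked in that city. For example, in 2016, A, B and E all worked in city4. First, I would like to get the following data.frame: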

   id  2016   2017  2016+2017
1:  A  B;E    C;E   B;C;E
2:  B  A;E    D     A;D;E
3:  C  E      A;E   A;E
4:  D  NA     B     B
5:  E  A;B;C  A;C   A;B;C
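The first table can also be built from a co-occurrence matrix. This is offered only as a sketch, not as either poster's method. If `M` is the 0/1 id-by-city matrix for one year, then `tcrossprod(M)` (the same as `M %*% t(M)`) counts how many cities each pair of ids shares. Any positive off-diagonal cell therefore means the two people worked in the same city. OR-ing the two years' matrices gives the 2016+2017 column.

The sketch makes these assumptions:
- The city columns are exactly the ones whose names start with `city`.
- Both data.frames list the same ids in the same order.
- The helpers `coworkers()` and `as_string()` and the output column names `y2016`, `y2017` and `both` are hypothetical names chosen here.

```r
# Example data from the question
dt2017 <- data.frame(id = LETTERS[1:5], year = 2017,
                     city1 = c(0, 1, 0, 1, 0), city2 = c(0, 0, 1, 0, 0),
                     city3 = c(1, 0, 1, 0, 1), city4 = c(0, 0, 0, 0, 1))
dt2016 <- data.frame(id = LETTERS[1:5], year = 2016,
                     city1 = c(0, 0, 0, 0, 1), city2 = c(0, 0, 0, 1, 0),
                     city3 = c(0, 0, 1, 0, 1), city4 = c(1, 1, 0, 0, 1))

# Hypothetical helper: logical id-by-id matrix, TRUE where two ids share a city
coworkers <- function(dt) {
  m <- as.matrix(dt[, grep("^city", names(dt))])
  rownames(m) <- as.character(dt$id)
  shared <- tcrossprod(m) > 0  # m %*% t(m) counts shared cities per pair
  diag(shared) <- FALSE        # nobody is their own co-worker
  shared
}

# Hypothetical helper: one "B;E"-style string per row, NA when the row is empty
as_string <- function(shared) {
  out <- apply(shared, 1, function(row) {
    if (any(row)) paste(colnames(shared)[row], collapse = ";") else NA_character_
  })
  unname(out)
}

c16 <- coworkers(dt2016)
c17 <- coworkers(dt2017)

result <- data.frame(id    = rownames(c16),
                     y2016 = as_string(c16),
                     y2017 = as_string(c17),
                     both  = as_string(c16 | c17),  # co-workers in either year
                     stringsAsFactors = FALSE)
print(result)
```

Names come out in column order (for example `B;C;E`), because they are read straight off the matrix columns.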

Second, I would like to get a data.frame like this:

id  relation
A   B
A   C
A   E
B   A
B   D
B   E
C   A
C   E
D   B
E   A
E   B
E   C
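The second data.frame is the combined column of the first table split into one row per name. A minimal base-R sketch follows. It assumes that column is already available as `;`-separated strings; the `wide` data.frame below is typed in by hand from the 2016+2017 column. `strsplit()` breaks each string into names, and `rep()` with `lengths()` repeats each id once per name. The result also contains the rows `C A` and `C E`, which follow from C sharing city3 with A and E.

```r
# Combined co-worker strings per id (assumed input, copied from the first table)
wide <- data.frame(id = c("A", "B", "C", "D", "E"),
                   both = c("B;C;E", "A;D;E", "A;E", "B", "A;B;C"),
                   stringsAsFactors = FALSE)

# Split each string into its names; an NA entry would come back as a single NA
parts <- strsplit(wide$both, ";", fixed = TRUE)

# Repeat each id once per name, and flatten the names alongside it
long <- data.frame(id = rep(wide$id, lengths(parts)),
                   relation = unlist(parts),
                   stringsAsFactors = FALSE)
print(long)
```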

Any suggestions would be appreciated.


Possible duplicate of [Split comma-separated column into separate rows](https://stackoverflow.com/questions/13773770/split-comma-separated-column-into-separate-rows) – Uwe

Answer


I found a way to get what you want, but it is not very pretty. It still produces the output you asked for.

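library(plyr)  # provides adply()

# Work with character ids rather than factors
dt2017$id <- as.character(dt2017$id)
dt2016$id <- as.character(dt2016$id)
id <- dt2017$id  # assumes both years list the same ids in the same order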

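# For person x in dt, return the other ids that share at least one city
# with x, collapsed into one "B;E"-style string ("" if there are none)
my_function <- function(dt, x){
    # keep the id plus only the city columns where x has a 1
    tmp <- data.frame(id=id, dt[,dt[dt$id==x,]==1])
    # ind = how many of x's cities each person also worked in
    tmp$ind <- sapply(1:nrow(tmp), function(x) return(sum(tmp[x, 2:ncol(tmp)])))
    return(paste(tmp[tmp$ind > 0 & tmp$id !=x,"id"], collapse=";"))
}

# One column per year, one row per id
results1 <- data.frame(Year2016 = sapply(id, function(x) return(my_function(dt2016,x))),
    Year2017 = sapply(id, function(x) return(my_function(dt2017,x))))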


# Same test, but return x's co-workers as (id, relation) rows,
# or NULL when x has none in that year
my_function2 <- function(dt, x){
    tmp <- data.frame(id=id, dt[,dt[dt$id==x,]==1])
    tmp$ind <- sapply(1:nrow(tmp), function(x) return(sum(tmp[x, 2:ncol(tmp)])))
    if(length(tmp[tmp$ind > 0 & tmp$id !=x,"id"])!=0){
        return(data.frame(id=x, relation=tmp[tmp$ind > 0 & tmp$id !=x,"id"]))
    }
}

# Stack the pairs from both years, keeping only the id and relation columns
results_tmp <- rbind(adply(.data=id, .margins=1, .fun= function(x) return(my_function2(dt2016,x))),
    adply(.data=id, .margins=1, .fun= function(x) return(my_function2(dt2017,x))))[, c("id", "relation")]

# Sort by id and drop pairs that appear in both years
results2 <- unique(results_tmp[order(as.character(results_tmp$id)),])



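# Collapse each id's relations into one ";"-separated string
fun_tmp <- function(x) return(paste(x, collapse=";"))
bothyear_tmp <- aggregate(list(relation=results2$relation), by=list(id=results2$id), FUN=fun_tmp)
# Assumes every id has at least one relation, so both sides have one row per id
results1$BothYear <- bothyear_tmp[order(as.character(bothyear_tmp$id)),"relation"]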

And here are the results:

results1
  Year2016 Year2017 BothYear
A      B;E      C;E    B;E;C
B      A;E        D    A;E;D
C        E      A;E      E;A
D                 B        B
E    A;B;C      A;C    A;B;C

results2 
id relation 
A  B 
A  E 
A  C 
B  A 
B  E 
B  D 
C  E 
C  A 
D  B 
E  A 
E  B 
E  C 
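Above, the `BothYear` column is assembled from `results2` with `aggregate()` and then matched to `results1` by sorting. That step relies on every id having at least one relation. An alternative is to take the union of the two yearly strings for each id directly.

This is a sketch under stated assumptions:
- The per-year strings are available as named character vectors. They are typed in below from `results1`, with `""` meaning no co-workers that year.
- The helper `merge_ids()` is hypothetical.

```r
# Per-year co-worker strings for each id, as in results1 ("" = none that year)
y2016 <- c(A = "B;E", B = "A;E", C = "E",   D = "",  E = "A;B;C")
y2017 <- c(A = "C;E", B = "D",   C = "A;E", D = "B", E = "A;C")

# Hypothetical helper: union of two ";"-separated strings, sorted and re-joined
merge_ids <- function(a, b) {
  ids <- union(strsplit(a, ";", fixed = TRUE)[[1]],
               strsplit(b, ";", fixed = TRUE)[[1]])
  paste(sort(ids), collapse = ";")
}

# mapply() walks both vectors in parallel and keeps the names of y2016
both <- mapply(merge_ids, y2016, y2017)
print(both)
```

Because `mapply()` keeps the names of `y2016`, each combined string stays attached to its id. An id with no co-workers in either year simply gets `""` instead of shifting the rows. Sorting inside the helper gives `B;C;E` rather than the answer's `B;E;C`.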

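It works well, thank you very much. I thought the R packages sna or igraph might have a function for this, but I could not find the exact one. I hope someone can do this in a prettier way.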
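The pair list can also be built without per-id loops. This sketch uses base R rather than `sna` or `igraph`, in two steps:
1. Build an id-by-id adjacency matrix that is `TRUE` when two people share a city in either year.
2. Read off the positions of the `TRUE` cells with `which(..., arr.ind = TRUE)`.

The 0/1 matrices below are typed in from `dt2016` and `dt2017`. The variable names are choices made for this sketch.

```r
# 0/1 id-by-city matrices for each year, copied from dt2016 and dt2017
cities <- paste0("city", 1:4)
m16 <- matrix(c(0, 0, 0, 1,
                0, 0, 0, 1,
                0, 0, 1, 0,
                0, 1, 0, 0,
                1, 0, 1, 1), nrow = 5, byrow = TRUE,
              dimnames = list(LETTERS[1:5], cities))
m17 <- matrix(c(0, 0, 1, 0,
                1, 0, 0, 0,
                0, 1, 1, 0,
                1, 0, 0, 0,
                0, 0, 1, 1), nrow = 5, byrow = TRUE,
              dimnames = list(LETTERS[1:5], cities))

# Adjacency: TRUE when two ids share at least one city in either year
adj <- tcrossprod(m16) > 0 | tcrossprod(m17) > 0
diag(adj) <- FALSE

# Row/column positions of every TRUE cell are the (id, relation) pairs
pos <- which(adj, arr.ind = TRUE)
pairs <- data.frame(id = rownames(adj)[pos[, "row"]],
                    relation = colnames(adj)[pos[, "col"]],
                    stringsAsFactors = FALSE)
pairs <- pairs[order(pairs$id, pairs$relation), ]
rownames(pairs) <- NULL
print(pairs)
```

If a graph object is wanted instead, the same symmetric `adj` matrix can be passed to `igraph::graph_from_adjacency_matrix()`. That step is not shown here.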
