2017-08-04 54 views

I want to use dplyr's left_join to transfer values (the "new" column) from one data frame to another. dplyr: how do I select the join columns by name?

How can I do this if I don't know the name of the key, only that it is the first variable in each dataset?

library(dplyr)

testData1 <- data.frame(idvar = c(1, 2, 3),
                        b = c("a", "b", "c"),
                        c = c("i", "ii", "iii"))

testData2 <- data.frame(identification = c(1, 2),
                        b = c("a", "b"),
                        c = c("i", "NA"),
                        new = c("var1", "var2"))

# Now do a left join to obtain the values of the new variable in the old dataset.

(testResult1 <- left_join(testData1, testData2))
# With no `by`, the join uses the shared columns b and c, so var2 is not in
# the results because of the "NA" in testData2!

(testResult2 <- left_join(testData1, testData2,
                          by = c("idvar" = "identification")))
# Works as expected! ... but we do not know the name of the id variable!

(testResult3 <- left_join(testData1, testData2,
                          by = c(names(testData1)[1] = names(testData2)[1])))
# Error: unexpected '=' in:
# "testResult3 <- left_join(testData1,testData2,
#        by=c(names(testData1)[1]="

Here is a related Q&A: https://stackoverflow.com/questions/28125816/r-standard-evalation-for-join-dplyr

Answers

2

You can create the named vector beforehand and then join as follows:

join_by <- colnames(testData2)[1]
names(join_by) <- colnames(testData1)[1]
left_join(testData1, testData2, by = join_by)

Or as a one-liner:

left_join(testData1, testData2,
          by = structure(colnames(testData2)[1], names = colnames(testData1)[1]))

Or, as suggested by Artem:

left_join(testData1, testData2,
          by = setNames(colnames(testData2)[1], colnames(testData1)[1]))

Hope this helps!
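If this join is needed in more than one place, the named-vector idea above could be wrapped in a small helper. This is only a sketch: `left_join_by_first` is a hypothetical name, not a dplyr function, and it assumes the key really is the first column of both data frames.

```r
library(dplyr)

testData1 <- data.frame(idvar = c(1, 2, 3),
                        b = c("a", "b", "c"),
                        c = c("i", "ii", "iii"))

testData2 <- data.frame(identification = c(1, 2),
                        b = c("a", "b"),
                        c = c("i", "NA"),
                        new = c("var1", "var2"))

# Hypothetical helper: left join on the first column of each data frame,
# whatever those columns happen to be called.
left_join_by_first <- function(x, y, ...) {
  # Names of the vector are the key in x, values are the key in y.
  by <- setNames(names(y)[1], names(x)[1])
  left_join(x, y, by = by, ...)
}

result <- left_join_by_first(testData1, testData2)
print(result)
```

The result keeps the key name from the left data frame (idvar here), just like the explicit by = c("idvar" = "identification") call in the question.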


Consider using 'setNames(a, b)' as shorthand for 'structure(a, names = b)'.


Thanks, added that as an option. Does setNames have any advantage over structure here? – Florian


Besides requiring less typing, 'setNames' is also more efficient for long vectors.
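For anyone wondering whether the three variants really build the same thing, here is a quick base R check. The two strings are simply the column names from the question, hard-coded for illustration. The attempt in the question fails at parse time because the left-hand side of `=` inside a call has to be a plain name or a string literal, not an expression such as names(testData1)[1].

```r
# Column names from the question, hard-coded here for illustration.
left_key  <- "idvar"
right_key <- "identification"

# The literal form that works when the names are known in advance.
by_literal <- c("idvar" = "identification")

# Three ways to build the same named vector from variables.
by_assign <- right_key
names(by_assign) <- left_key
by_structure <- structure(right_key, names = left_key)
by_setnames  <- setNames(right_key, left_key)

same <- identical(by_literal, by_assign) &&
  identical(by_literal, by_structure) &&
  identical(by_literal, by_setnames)
print(same)
```

All three produce a length-one character vector whose name is the key in the left table and whose value is the key in the right table, which is the shape `by` expects.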

3

Another approach is to give the two key columns the same name:

left_join(
    testData1, 
    rename_at(testData2, 1, ~ names(testData1)[1]), 
    by = names(testData1)[1] 
) 

#   idvar b.x c.x  b.y  c.y  new
# 1     1   a   i    a    i var1
# 2     2   b  ii    b   NA var2
# 3     3   c iii <NA> <NA> <NA>

# > (testResult2 <- left_join(testData1,testData2, by=c("idvar"="identification")))
#   idvar b.x c.x  b.y  c.y  new
# 1     1   a   i    a    i var1
# 2     2   b  ii    b   NA var2
# 3     3   c iii <NA> <NA> <NA>
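As a rough sanity check, the renaming approach and the named-vector approach can be compared directly. The sketch below swaps rename_at for a plain base R rename on a copy (my substitution, not what the answer uses), so the only dplyr call is left_join itself.

```r
library(dplyr)

testData1 <- data.frame(idvar = c(1, 2, 3),
                        b = c("a", "b", "c"),
                        c = c("i", "ii", "iii"))

testData2 <- data.frame(identification = c(1, 2),
                        b = c("a", "b"),
                        c = c("i", "NA"),
                        new = c("var1", "var2"))

key <- names(testData1)[1]

# Approach 1: rename the first column of a copy of testData2, then join on it.
renamed <- testData2
names(renamed)[1] <- key
by_rename <- left_join(testData1, renamed, by = key)

# Approach 2: leave testData2 untouched and pass a named vector to `by`.
by_named_vector <- left_join(testData1, testData2,
                             by = setNames(names(testData2)[1], key))

print(all.equal(by_rename, by_named_vector))
```

Both routes should give the same data frame, so the choice mostly comes down to whether you mind creating a renamed copy of the second table.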