R issue with merging/binding/concatenating two data frames

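I am a beginner in R, so I apologize in advance if this question has been asked elsewhere. Here is my problem:

I have two data frames, df1 and df2, with different numbers of rows and columns. The two frames have only one variable (column) in common, called "customer_no". I want the merged frame to match records based on "customer_no" and to keep only the customer numbers that occur in df2. Both data.frames have multiple rows for each customer_no.

I tried the following: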

merged.df <- merge(df1, df2, by="customer_no", all.y=TRUE)

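The problem is that this fills the df2 rows with values from df1 where the cells should be left empty. My questions are:

1) How can I tell the command to leave the unmatched columns empty? 2) How can I see in the merged file which row came from which df? I guess that once the first problem is solved, this should be easy to see from the empty columns.

I am missing something in my command but do not know what. If the question has already been answered elsewhere, would you still be kind enough to rephrase the answer here for an R beginner?

Thanks!

Example data: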

df1: 
customer_no country year 
    10   UK  2001 
    10   UK  2002 
    10   UK  2003 
    20   US  2007 
    30   AU  2006 


df2:   
customer_no income 
    10   700 
    10   800 
    10   900 
    30   1000

The merged file should look like this:

merged.df: 
customer_no  income  country  year
    10               UK       2001
    10               UK       2002
    10               UK       2003
    10       700
    10       800
    10       900
    30               AU       2006
    30       1000

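So: it puts the columns together, adds the df2 rows right after the last df1 row with the same customer_no, and keeps only the customer_no values that occur in df2 (merged.df has no customer_no 20). All other cells are left empty.

In Stata I would use append, but I am not sure what to use in R... maybe join?

Thanks!

Source

2014-10-08 Billaus

Added the data. I hope it is clear enough... thanks for your help! – Billaus 2014-10-08 14:16:10

This looks more like a merge/join. Is there a reason the US entry is dropped? – DMT 2014-10-08 14:22:10

DMT, yes, the reason is that it is not in df2. The merged df excludes values that are only in df1 (not in df2). – Billaus 2014-10-08 14:27:38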
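
For context, here is a minimal sketch of why the merge() call in the question fills every cell. It assumes the example data from the question, typed in by hand. merge() performs a join on customer_no, so each df1 row for a customer is paired with each df2 row for that customer, instead of the rows being stacked.

```r
# Example data from the question, typed in by hand
df1 <- data.frame(customer_no = c(10, 10, 10, 20, 30),
                  country = c("UK", "UK", "UK", "US", "AU"),
                  year = c(2001, 2002, 2003, 2007, 2006),
                  stringsAsFactors = FALSE)
df2 <- data.frame(customer_no = c(10, 10, 10, 30),
                  income = c(700, 800, 900, 1000))

# A join on customer_no: 3 x 3 pairings for customer 10, 1 x 1 for customer 30
merged.df <- merge(df1, df2, by = "customer_no", all.y = TRUE)

nrow(merged.df)               # 10 rows, not the 8 stacked rows that were wanted
table(merged.df$customer_no)  # customer 10 appears 9 times, customer 30 once
anyNA(merged.df)              # FALSE: every row carries values from both frames
```

What the question asks for is closer to stacking the two frames (Stata's append) than to a join, which is what the answers below do.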

Try:

df1$id <- paste(df1$customer_no, 1, sep="_") 
df2$id <- paste(df2$customer_no, 2, sep="_") 

res <- merge(df1, df2, by=c('id', 'customer_no'),all=TRUE)[,-1] 
res1 <- res[res$customer_no %in% df2$customer_no,] 
res1 
# customer_no country year income 
#1   10  UK 2001  NA 
#2   10  UK 2002  NA 
#3   10  UK 2003  NA 
#4   10 <NA> NA 700 
#5   10 <NA> NA 800 
#6   10 <NA> NA 900 
#8   30  AU 2006  NA 
#9   30 <NA> NA 1000

If you want to change NA to '',

res1[is.na(res1)] <- '' #But, I would leave it as `NA` as there are `numeric` columns.
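
As an aside on the second question (which row came from which df), here is a minimal sketch of a variant of the same idea. It is not part of the code above: the column name `origin` is made up for illustration, and the data are the question's example typed in by hand. Because `origin` differs between the two frames, no rows match, so merge() with all=TRUE simply keeps every row and pads the rest with NA.

```r
# Example data from the question, typed in by hand
df1 <- data.frame(customer_no = c(10, 10, 10, 20, 30),
                  country = c("UK", "UK", "UK", "US", "AU"),
                  year = c(2001, 2002, 2003, 2007, 2006),
                  stringsAsFactors = FALSE)
df2 <- data.frame(customer_no = c(10, 10, 10, 30),
                  income = c(700, 800, 900, 1000))

# Tag every row with the frame it comes from (hypothetical column name)
df1$origin <- "df1"
df2$origin <- "df2"

# No (customer_no, origin) pair occurs in both frames, so nothing is joined;
# all = TRUE keeps all rows and fills the other frame's columns with NA
res <- merge(df1, df2, by = c("customer_no", "origin"), all = TRUE)

# Keep only customers that occur in df2, then sort for readability
res <- res[res$customer_no %in% df2$customer_no, ]
res <- res[order(res$customer_no, res$origin), ]
res
```

The `origin` column then answers the question directly, without relying on which cells happen to be NA.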

Alternatively, use rbindlist from data.table (with the original datasets):

library(data.table) 
indx <- df1$customer_no %in% df2$customer_no 
rbindlist(list(df1[indx,], df2),fill=TRUE)[order(customer_no)] 

# customer_no country year income 
#1:   10  UK 2001  NA 
#2:   10  UK 2002  NA 
#3:   10  UK 2003  NA 
#4:   10  NA NA 700 
#5:   10  NA NA 800 
#6:   10  NA NA 900 
#7:   30  AU 2006  NA 
#8:   30  NA NA 1000
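
If the origin of each row should be recorded as well, rbindlist has an `idcol` argument that can do this in versions of data.table that provide it. The sketch below assumes such a version is installed and uses the question's example data typed in by hand. The list names `df1`/`df2` and the column name `origin` are chosen here for illustration.

```r
library(data.table)

# Example data from the question, typed in by hand
df1 <- data.frame(customer_no = c(10, 10, 10, 20, 30),
                  country = c("UK", "UK", "UK", "US", "AU"),
                  year = c(2001, 2002, 2003, 2007, 2006),
                  stringsAsFactors = FALSE)
df2 <- data.frame(customer_no = c(10, 10, 10, 30),
                  income = c(700, 800, 900, 1000))

# Rows of df1 whose customer_no also occurs in df2
indx <- df1$customer_no %in% df2$customer_no

# Naming the list elements lets idcol record where each row came from
res <- rbindlist(list(df1 = df1[indx, ], df2 = df2),
                 fill = TRUE, idcol = "origin")
res <- res[order(customer_no, origin)]
res
```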

Source

2014-10-08 14:36:24 akrun

Awesome! Thank you!! This has been a nightmare... such a relief! :))) – Billaus 2014-10-08 15:02:23

@Billaus No problem. Glad to help. – akrun 2014-10-08 15:02:51

You can also use the smartbind function from the gtools package.

require(gtools) 
res <- smartbind(df1[df1$customer_no %in% df2$customer_no, ], df2) 
res[order(res$customer_no), ] 
#  customer_no country year income 
# 1:1   10  UK 2001  NA 
# 1:2   10  UK 2002  NA 
# 1:3   10  UK 2003  NA 
# 2:1   10 <NA> NA 700 
# 2:2   10 <NA> NA 800 
# 2:3   10 <NA> NA 900 
# 1:4   30  AU 2006  NA 
# 2:4   30 <NA> NA 1000

Source

2014-10-08 14:46:37 shadow

This works too! Thank you!! – Billaus 2014-10-08 15:02:39

Try:

df1$income = df2$country = df2$year = NA 
rbind(df1, df2) 
    customer_no country year income 
1   10  UK 2001  NA 
2   10  UK 2002  NA 
3   10  UK 2003  NA 
4   20  US 2007  NA 
5   30  AU 2006  NA 
6   10 <NA> NA 700 
7   10 <NA> NA 800 
8   10 <NA> NA 900 
9   30 <NA> NA 1000
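
The output above still contains customer_no 20, which the question wanted to exclude. Here is a minimal sketch of the same padding idea in base R that filters df1 first. The helper name `pad_cols` is made up for illustration, and the data are the question's example typed in by hand.

```r
# Example data from the question, typed in by hand
df1 <- data.frame(customer_no = c(10, 10, 10, 20, 30),
                  country = c("UK", "UK", "UK", "US", "AU"),
                  year = c(2001, 2002, 2003, 2007, 2006),
                  stringsAsFactors = FALSE)
df2 <- data.frame(customer_no = c(10, 10, 10, 30),
                  income = c(700, 800, 900, 1000))

# Hypothetical helper: add any missing columns as NA, then return the
# columns in the requested order
pad_cols <- function(df, cols) {
  for (m in setdiff(cols, names(df))) {
    df[[m]] <- rep(NA, nrow(df))
  }
  df[cols]
}

all_cols <- union(names(df1), names(df2))

# Keep only the df1 rows whose customer_no occurs in df2, then stack
keep <- df1[df1$customer_no %in% df2$customer_no, ]
out <- rbind(pad_cols(keep, all_cols), pad_cols(df2, all_cols))

# order() keeps ties in their original order, so df1 rows stay ahead of df2 rows
out <- out[order(out$customer_no), ]
out
```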

Source

2014-10-08 15:12:37 rnso
