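This takes a list of common column names, matches with agrep on the combination of all those columns, and then, if all.x or all.y is TRUE, appends the non-matching records with the missing columns filled with NA. Unlike merge, the column names to match on must be the same in each data frame. The challenge seems to be setting the agrep options correctly to avoid spurious matches.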
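agrepMerge <- function(df1, df2, by, all.x = FALSE, all.y = FALSE,
                       ignore.case = FALSE, max.distance = 0.1,
                       useBytes = FALSE) {
  # Build one matching key per row by pasting the 'by' columns together.
  df1$index <- apply(df1[, by, drop = FALSE], 1, paste, collapse = "")
  df2$index <- apply(df2[, by, drop = FALSE], 1, paste, collapse = "")

  # For each key in df1, find the row indices of approximately matching
  # keys in df2. value = FALSE is required: the results are used as indices.
  matches <- lapply(seq_along(df1$index), function(i) {
    agrep(df1$index[i], df2$index, ignore.case = ignore.case, value = FALSE,
          max.distance = max.distance, useBytes = useBytes)
  })

  # Repeat each df1 row once per match so the two sides line up row by row.
  df1_match <- rep(seq_len(nrow(df1)), lengths(matches))
  df2_match <- unlist(matches)
  df1_hits <- df1[df1_match, , drop = FALSE]
  df2_hits <- df2[df2_match, , drop = FALSE]
  df1_miss <- df1[setdiff(seq_along(df1$index), df1_match), , drop = FALSE]
  df2_miss <- df2[setdiff(seq_along(df2$index), df2_match), , drop = FALSE]

  # Keep df1's copy of any column that appears in both data frames.
  remove_cols <- colnames(df2_hits) %in% colnames(df1_hits)
  df_out <- cbind(df1_hits, df2_hits[, !remove_cols, drop = FALSE])

  # Append unmatched rows from df1, padding df2-only columns with NA.
  if (all.x) {
    missing_cols <- setdiff(colnames(df_out), colnames(df1_miss))
    df1_miss[missing_cols] <- NA
    df_out <- rbind(df_out, df1_miss)
  }
  # Append unmatched rows from df2, padding df1-only columns with NA.
  # (The original tested all.x here a second time, so all.y had no effect.)
  if (all.y) {
    missing_cols <- setdiff(colnames(df_out), colnames(df2_miss))
    df2_miss[missing_cols] <- NA
    df_out <- rbind(df_out, df2_miss)
  }

  # Drop the helper key column before returning.
  df_out[, setdiff(colnames(df_out), "index"), drop = FALSE]
}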
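To illustrate why the agrep options matter, here is a minimal sketch using made-up keys (the site names below are hypothetical and stand in for the pasted key column the function builds). It only calls base R's agrep and compares three values of max.distance; the edit counts in the comments assume the default costs of 1 per insertion, deletion, or substitution.

```r
# Made-up keys, standing in for the pasted key column (hypothetical data).
key_a  <- "Harvard Forest"
keys_b <- c("Harvard Forest", "Harvard Forrest", "Howland Forest", "Hartford")

# max.distance = 0 allows no edits, so only an exact (substring) hit is kept.
strict_hits <- agrep(key_a, keys_b, max.distance = 0)

# Default tolerance: 10% of the 14-character pattern, rounded up to 2 edits.
# This tolerates the "Forrest" misspelling.
default_hits <- agrep(key_a, keys_b, max.distance = 0.1)

# A looser tolerance (30%, rounded up to 5 edits) also pulls in a
# different site, which would be a spurious match in a merge.
loose_hits <- agrep(key_a, keys_b, max.distance = 0.3)

print(keys_b[strict_hits])
print(keys_b[default_hits])
print(keys_b[loose_hits])
```

Because agrep matches the pattern against substrings of each candidate, short keys are especially prone to spurious hits, so it is probably worth inspecting the matched pairs on a sample before trusting a given max.distance.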
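Could you provide a small subset of your data (or give us some fake data)?
@RomanLuštrik Although this wasn't originally my question, I have a similar problem, so I created some sample data and offered a bounty.
@David Have you tried merge(sites_a, sites_b, by = c("lon", "lat"))? In your case, if you want to merge by name, you will have to put more effort into making the names in the two data.frames match (good luck, heh). In the example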