If speed is a concern, you may want to use data.table or dplyr. Here I have modified some sample data to give you a few ideas.
df1 <- data.frame(Catcode = c("A1200", "B1500", "C1800"),
Catname = c("Sugar", "Salty", "Butter"),
Catorder = c("cane A01", "cane A01", "cane A01"),
Industry = c("Crop Production", "Crop Production", "Crop Production"),
Sector = c("Agribusiness", "Agribusiness", "Agribusiness"),
stringsAsFactors = FALSE)
# Catcode Catname Catorder Industry Sector
#1 A1200 Sugar cane A01 Crop Production Agribusiness
#2 B1500 Salty cane A01 Crop Production Agribusiness
#3 C1800 Butter cane A01 Crop Production Agribusiness
df2 <- data.frame(BusinessName = c("Sarah Farms", "Ben Farms"),
AmountDonated = c(100, 200),
Year = c(2010, 2010),
Category = c("A1200", "B1500"),
stringsAsFactors = FALSE)
# BusinessName AmountDonated Year Category
#1 Sarah Farms 100 2010 A1200
#2 Ben Farms 200 2010 B1500
library(dplyr)
library(data.table)
# 1) dplyr option
# Catcode C1800 will be dropped since it does not exist in both data frames.
inner_join(df1, df2, by = c("Catcode" = "Category"))
# Catcode Catname Catorder Industry Sector BusinessName AmountDonated Year
#1 A1200 Sugar cane A01 Crop Production Agribusiness Sarah Farms 100 2010
#2 B1500 Salty cane A01 Crop Production Agribusiness Ben Farms 200 2010
# Catcode C1800 remains, since left_join keeps every row of df1.
left_join(df1, df2, by = c("Catcode" = "Category"))
# Catcode Catname Catorder Industry Sector BusinessName AmountDonated Year
#1 A1200 Sugar cane A01 Crop Production Agribusiness Sarah Farms 100 2010
#2 B1500 Salty cane A01 Crop Production Agribusiness Ben Farms 200 2010
#3 C1800 Butter cane A01 Crop Production Agribusiness <NA> NA NA
# 2) data.table option
# Convert data.frame to data.table
setDT(df1)
setDT(df2)
# Set the key columns used for the join
setkey(df1, "Catcode")
setkey(df2, "Category")
# Looks up each row of df2 in df1, so Catcode C1800 is absent from the result
df1[df2]
# Catcode Catname Catorder Industry Sector BusinessName AmountDonated Year
#1: A1200 Sugar cane A01 Crop Production Agribusiness Sarah Farms 100 2010
#2: B1500 Salty cane A01 Crop Production Agribusiness Ben Farms 200 2010
# Looks up each row of df1 in df2, so C1800 is kept with NA in the df2 columns
df2[df1]
# BusinessName AmountDonated Year Category Catname Catorder Industry Sector
#1: Sarah Farms 100 2010 A1200 Sugar cane A01 Crop Production Agribusiness
#2: Ben Farms 200 2010 B1500 Salty cane A01 Crop Production Agribusiness
#3: NA NA NA C1800 Butter cane A01 Crop Production Agribusiness
Try the 'merge()' function with the 'by.x' and 'by.y' arguments. See also http://stackoverflow.com/q/5963269/946850 to improve your question. – krlmlr 2014-12-02 02:51:06
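The 'merge()' suggestion in the comment above can be sketched in base R roughly as follows. This is a minimal illustration, not the commenter's own code: the data frame names `cats` and `donations` are hypothetical and the columns are trimmed down from the sample data in the answer.

```r
# Base R only; no packages are needed for merge().
# `cats` and `donations` are hypothetical names, cut down from the answer's sample data.
cats <- data.frame(Catcode = c("A1200", "B1500", "C1800"),
                   Catname = c("Sugar", "Salty", "Butter"),
                   stringsAsFactors = FALSE)
donations <- data.frame(BusinessName = c("Sarah Farms", "Ben Farms"),
                        AmountDonated = c(100, 200),
                        Category = c("A1200", "B1500"),
                        stringsAsFactors = FALSE)

# Inner join: by.x / by.y name the key column in each data frame,
# so only codes present in both are returned.
inner <- merge(cats, donations, by.x = "Catcode", by.y = "Category")

# Left join: all.x = TRUE keeps every row of `cats`,
# filling the donation columns with NA where there is no match.
left <- merge(cats, donations, by.x = "Catcode", by.y = "Category", all.x = TRUE)
```

The key column in the result takes the name given in `by.x` (here `Catcode`). Whether this is fast enough for a large data set is something to measure; the data.table and dplyr options in the answer are the ones aimed at speed.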