2016-04-03

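Fill NAs from another data frame, matching on two ID variables

I've read many questions similar to this one, but none of them has an answer that fits my case. Apologies if this is a duplicate; I just can't find it.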

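I have a primary dataset and a backup dataset. Where primary has an NA, I want to look in backup, and if it has a value for the same full.place.name and Year, replace the NA with that value.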

primary

Year Firearm.Homicide Firearm.Suicide Firearm.Unintentional full.place.name 
2010    0   <NA>      0 Adair County, KY 
2010    10    19     <NA> Adams County, CO 

backup

Year Firearm.Homicide Firearm.Suicide Firearm.Unintentional full.place.name 
2010    NA    1      1 Adair County, KY 
2010    NA    NA      0 Adams County, CO 

What I want:

Year Firearm.Homicide Firearm.Suicide Firearm.Unintentional full.place.name 
2010    0    1      0 Adair County, KY 
2010    10    19      0 Adams County, CO 

What I've tried:

library(data.table) 
setDT(primary); setDT(backup) 
primary[is.na(primary$Firearm.Homicide), primary$Firearm.Homicide := backup[backup, primary$Firearm.Homicide, on=c("Year", "full.place.name")]] 

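However, this added five columns at the end and didn't produce any correct values. I've also tried ifelse statements and FillIn, and never got close. Here are five rows of data: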

primary<-structure(list(Year = c(2010, 2010, 2010, 2010, 2010), 
       Firearm.Homicide = c("0","10", "4", "3", NA), Firearm.Suicide = c(NA,"19", "5", "6", 
       NA), Firearm.Unintentional = c("0", NA, NA, "0", "0"), full.place.name = c("Adair County, KY", 
       "Adams County, CO", "Adams County, MS", "Adams County, PA", "Adams County, WI" 
      )), .Names = c("Year", "Firearm.Homicide", "Firearm.Suicide", 
       "Firearm.Unintentional", "full.place.name"), row.names = c(NA, 
       5L), class = "data.frame") 

backup<-structure(list(Year = c(2010, 2010, 2010, 2010, 2010), Firearm.Homicide = c(NA, 
      NA, 4, 3, 3), Firearm.Suicide = c(1, NA, NA, NA, NA), Firearm.Unintentional = c(1, 
      0, 1, NA, NA), full.place.name = c("Adair County, KY", "Adams County, CO", 
      "Adams County, MS", "Adams County, PA", "Adams County, WI")), .Names = c("Year", 
      "Firearm.Homicide", "Firearm.Suicide", "Firearm.Unintentional", 
      "full.place.name"), row.names = c(NA, 5L), class = "data.frame") 

I'd really appreciate any help!

Answers


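If the two data frames always have the same structure as shown, there is a straightforward solution. If the rows of the two tables already correspond to each other, you can do: primary[is.na(primary)] <- backup[is.na(primary)]. Here is one way to sort both data frames first with the dplyr package, assuming your key columns are "Year" and "full.place.name":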

library(dplyr)
primary <- arrange(primary, Year, full.place.name) %>%
  select(Year, Firearm.Homicide, Firearm.Suicide, Firearm.Unintentional, full.place.name)
backup <- arrange(backup, Year, full.place.name) %>%
  select(Year, Firearm.Homicide, Firearm.Suicide, Firearm.Unintentional, full.place.name)

It may not be the best way to do this, but it is easy to understand.


They aren't mapped to each other right now. How can I do that? – user5457414


You can first sort both data frames by the key columns, whatever they are. I'm guessing here they would be "Year" and "full.place.name"? – Psidom
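Instead of sorting, the rows can also be lined up by key. Below is a minimal base-R sketch that uses match() on a combined Year + full.place.name key. It assumes each key appears at most once in backup and that the Firearm columns should end up numeric, as in the data.table answer. The toy data is a trimmed copy of the question's first two rows, with backup deliberately reordered. The "\r" separator is an assumption; any string that never occurs inside the keys works.

```r
# Trimmed copy of the question's first two rows; backup is deliberately in a different row order.
primary <- data.frame(
  Year = c(2010, 2010),
  Firearm.Homicide = c("0", "10"),
  Firearm.Suicide = c(NA, "19"),
  Firearm.Unintentional = c("0", NA),
  full.place.name = c("Adair County, KY", "Adams County, CO"),
  stringsAsFactors = FALSE
)
backup <- data.frame(
  Year = c(2010, 2010),
  Firearm.Homicide = c(NA, NA),
  Firearm.Suicide = c(NA, 1),
  Firearm.Unintentional = c(0, 1),
  full.place.name = c("Adams County, CO", "Adair County, KY"),
  stringsAsFactors = FALSE
)

# One combined key per row ("\r" is assumed never to appear inside the keys).
primary_key <- paste(primary$Year, primary$full.place.name, sep = "\r")
backup_key  <- paste(backup$Year,  backup$full.place.name,  sep = "\r")

# For each primary row, the index of the matching backup row (NA when there is no match).
idx <- match(primary_key, backup_key)

firearm_cols <- grep("^Firearm", names(primary), value = TRUE)
for (col in firearm_cols) {
  x <- as.numeric(primary[[col]])      # character -> numeric, as in the data.table answer
  fill <- is.na(x)
  x[fill] <- backup[[col]][idx][fill]  # value from the matched backup row
  primary[[col]] <- x
}
print(primary)
```

On this toy data the result reproduces the two rows shown under "What I want", even though backup's rows are in the opposite order. Keys with no match in backup leave the NA in place, because idx is NA for those rows.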


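One option with data.table is to use set. The "Firearm" columns in "primary" are of class character, while the corresponding columns in "backup" are numeric. So we first need to convert those columns in "primary" to numeric, and then replace the NA values in the "Firearm" columns of "primary" with the corresponding values from "backup".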

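After the join with on, we can loop through the "Firearm" columns: convert each one to numeric, replace its NAs with the corresponding values from the matching "i." column, and finally remove the "i." column by setting it to NULL.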

#joining step 
dt <- setDT(primary)[backup, on = c("Year", "full.place.name")] 
#identify the Firearm columns with `grep` 
nm1 <- grep("^Firearm", names(primary), value=TRUE) 
#create a corresponding "i." column names vector from nm1 
nm2 <- paste0("i.", nm1) 
#loop through the columns 
for(j in seq_along(nm1)){ 
    #convert the Firearm columns from primary to `numeric` 
    set(dt, i = NULL, j= nm1[j], value = as.numeric(dt[[nm1[j]]])) 
    #replace the NA with corresponding values from "i" columns 
    set(dt, i = which(is.na(dt[[nm1[j]]])), j = nm1[j], 
     value = dt[[nm2[j]]][is.na(dt[[nm1[j]]])]) 
    #remove the i columns by assigning it to NULL 
    set(dt, i = NULL, j= nm2[j], value = NULL) 
} 


dt 
# Year Firearm.Homicide Firearm.Suicide Firearm.Unintentional full.place.name 
#1: 2010    0    1      0 Adair County, KY 
#2: 2010    10    19      0 Adams County, CO 
#3: 2010    4    5      1 Adams County, MS 
#4: 2010    3    6      0 Adams County, PA 
#5: 2010    3    NA      0 Adams County, WI 

Assuming your datasets are sorted the same way and all the column names are identical (as in your example), then:

primary[is.na(primary)] <- backup[is.na(primary)] 
primary 
# Year Firearm.Homicide Firearm.Suicide Firearm.Unintentional full.place.name 
#1 2010    0    1      0 Adair County, KY 
#2 2010    10    19      0 Adams County, CO 
#3 2010    4    5      1 Adams County, MS 
#4 2010    3    6      0 Adams County, PA 
#5 2010    3   <NA>      0 Adams County, WI 
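The positional fill is only safe when the rows really line up. Below is a minimal sketch of a guard, assuming the same two key columns. fill_by_position is a hypothetical helper name. It fills column by column instead of using a logical-matrix index, because backup[is.na(primary)] goes through as.matrix(backup). That call turns every column into character as soon as any column (here full.place.name) is character.

```r
# Hypothetical helper: positional NA fill that refuses to run when the key columns do not line up.
fill_by_position <- function(primary, backup, key_cols = c("Year", "full.place.name")) {
  aligned <- identical(names(primary), names(backup)) &&
    nrow(primary) == nrow(backup) &&
    all(vapply(key_cols, function(k) identical(primary[[k]], backup[[k]]), logical(1)))
  if (!aligned) stop("primary and backup are not aligned on the key columns")
  # Fill each non-key column separately (no as.matrix() round trip over the whole frame).
  for (col in setdiff(names(primary), key_cols)) {
    na <- is.na(primary[[col]])
    primary[[col]][na] <- backup[[col]][na]
  }
  primary
}

# Small example: one NA in primary, filled from the same row of backup.
p <- data.frame(Year = c(2010, 2010), Firearm.Suicide = c(NA, "19"),
                full.place.name = c("Adair County, KY", "Adams County, CO"),
                stringsAsFactors = FALSE)
b <- data.frame(Year = c(2010, 2010), Firearm.Suicide = c(1, NA),
                full.place.name = c("Adair County, KY", "Adams County, CO"),
                stringsAsFactors = FALSE)
filled <- fill_by_position(p, b)
print(filled)

# Reordering backup's rows breaks the alignment, so the guard stops instead of filling wrongly.
res <- tryCatch(fill_by_position(p, b[2:1, ]), error = function(e) "not aligned")
```

Because the Firearm column in p is character, the filled value is stored as the string "1". Convert with as.numeric() afterwards if you need numbers.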