2016-03-28

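Suppose the following data is only a small part of the very large dataset I am working with. I want to replace the values in one data frame column only when the values in two other columns match certain conditions.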

mydf <- data.frame(Date = as.Date(c("2015-01-01", "2015-01-10", "2015-01-27", "2015-02-27", "2015-03-15", "2015-04-17", "2015-04-18")),
                   Expense = c(1566, 5646, 3456, 6546, 5313, 6466, 5456),
                   Details = c('item101 xsda', 'fuel asa', 'item102a', 'fuel asa', 'fuel sda', 'fuel', 'item102a'),
                   Vehicle = c('Car', 'Bike', 'Car', 'Car', 'Bike', 'Bike', 'Bike'),
                   Person = c('John', 'Smith', 'Robin', rep(NA, 3), 'Robin'))

        Date Expense      Details Vehicle Person
1 2015-01-01    1566 item101 xsda     Car   John
2 2015-01-10    5646     fuel asa    Bike  Smith
3 2015-01-27    3456     item102a     Car  Robin
4 2015-02-27    6546     fuel asa     Car   <NA>
5 2015-03-15    5313     fuel sda    Bike   <NA>
6 2015-04-17    6466         fuel    Bike   <NA>
7 2015-04-18    5456     item102a    Bike  Robin

There are two conditions to consider:

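1) When the vehicle used is "Car" and "fuel" was purchased, the person is John.

2) When the vehicle used is "Bike" and "fuel" was purchased, the person is Smith.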

My expected output is:

        Date Expense      Details Vehicle Person
1 2015-01-01    1566 item101 xsda     Car   John
2 2015-01-10    5646         fuel    Bike  Smith
3 2015-01-27    3456     item102a     Car  Robin
4 2015-02-27    6546         fuel     Car   John
5 2015-03-15    5313     fuel sda    Bike  Smith
6 2015-04-17    6466         fuel    Bike  Smith
7 2015-04-18    5456     item102a    Bike  Robin

How can I solve this? I used the following steps and got halfway to the solution:

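# Details is a factor here (stringsAsFactors = TRUE was the default before R 4.0.0), so convert it first
mydf$Details <- as.character(mydf$Details)
# Replace every entry containing "fuel" (in any case) with "Fuel"
mydf$Details[grepl('fuel', mydf$Details, ignore.case = TRUE)] <- 'Fuel'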

This gives mydf:

        Date Expense      Details Vehicle Person
1 2015-01-01    1566 item101 xsda     Car   John
2 2015-01-10    5646         Fuel    Bike  Smith
3 2015-01-27    3456     item102a     Car  Robin
4 2015-02-27    6546         Fuel     Car   <NA>
5 2015-03-15    5313         Fuel    Bike   <NA>
6 2015-04-17    6466         Fuel    Bike   <NA>
7 2015-04-18    5456     item102a    Bike  Robin

Note: if possible, please avoid loops. If there is a better and faster way to do this, please share it.

Answers


You are halfway there, as you said. Try these two lines:

# Relies on Details having already been normalised to 'Fuel' by the step in the question
mydf$Person[mydf$Details == 'Fuel' & mydf$Vehicle == 'Car'] <- 'John'
mydf$Person[mydf$Details == 'Fuel' & mydf$Vehicle == 'Bike'] <- 'Smith'
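The two lines above need one statement per vehicle. A possible generalisation, sketched here in base R and still loop-free, keeps the rules in a named vector and fills the missing `Person` values with a single vectorised assignment. The name `vehicle_to_person` is a hypothetical helper, the data is rebuilt with `stringsAsFactors = FALSE` so the text columns are character, and unlike the two lines above this sketch only fills rows where `Person` is `NA`.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps text columns as character
mydf <- data.frame(
  Date = as.Date(c("2015-01-01", "2015-01-10", "2015-01-27", "2015-02-27",
                   "2015-03-15", "2015-04-17", "2015-04-18")),
  Expense = c(1566, 5646, 3456, 6546, 5313, 6466, 5456),
  Details = c("item101 xsda", "fuel asa", "item102a", "fuel asa",
              "fuel sda", "fuel", "item102a"),
  Vehicle = c("Car", "Bike", "Car", "Car", "Bike", "Bike", "Bike"),
  Person = c("John", "Smith", "Robin", rep(NA, 3), "Robin"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: which person buys fuel for which vehicle
vehicle_to_person <- c(Car = "John", Bike = "Smith")

# Rows to fill: no person recorded yet and a fuel purchase (case-insensitive match)
idx <- is.na(mydf$Person) & grepl("fuel", mydf$Details, ignore.case = TRUE)

# Vectorised fill: look up each selected row's vehicle by name
mydf$Person[idx] <- unname(vehicle_to_person[mydf$Vehicle[idx]])
```

A vehicle that has no entry in the lookup yields `NA`, so such rows simply stay unfilled.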

Using data.table, you can do it in a few lines:

library(data.table) 

setDT(mydf) 

mydf[is.na(Person) & Details %like% "fuel" & Vehicle == "Car", Person := "John"] 
mydf[is.na(Person) & Details %like% "fuel" & Vehicle == "Bike", Person := "Smith"] 

mydf 
#>          Date Expense      Details Vehicle Person
#> 1: 2015-01-01    1566 item101 xsda     Car   John
#> 2: 2015-01-10    5646     fuel asa    Bike  Smith
#> 3: 2015-01-27    3456     item102a     Car  Robin
#> 4: 2015-02-27    6546     fuel asa     Car   John
#> 5: 2015-03-15    5313     fuel sda    Bike  Smith
#> 6: 2015-04-17    6466         fuel    Bike  Smith
#> 7: 2015-04-18    5456     item102a    Bike  Robin

With dplyr you can also do a conditional mutate, but the code is longer. I use the stringr package for the string matching.

library(dplyr) 
library(stringr) 
mydf %>%
  mutate(
    Person = ifelse(
      is.na(Person) & str_detect(Details, "fuel") & Vehicle == "Car",
      "John",
      ifelse(
        is.na(Person) & str_detect(Details, "fuel") & Vehicle == "Bike",
        "Smith",
        as.character(Person)
      )
    )
  )
#>         Date Expense      Details Vehicle Person
#> 1 2015-01-01    1566 item101 xsda     Car   John
#> 2 2015-01-10    5646     fuel asa    Bike  Smith
#> 3 2015-01-27    3456     item102a     Car  Robin
#> 4 2015-02-27    6546     fuel asa     Car   John
#> 5 2015-03-15    5313     fuel sda    Bike  Smith
#> 6 2015-04-17    6466         fuel    Bike  Smith
#> 7 2015-04-18    5456     item102a    Bike  Robin
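The nested `ifelse()` above can also be written with `dplyr::case_when()`, which lists the rules one per line and checks them in order. This is a sketch assuming a dplyr version that provides `case_when()`; it uses base `grepl()` with `ignore.case = TRUE` instead of stringr, and the temporary `is_fuel` column is a hypothetical helper that is dropped again at the end.

```r
library(dplyr)

# Sample data from the question, with character (not factor) text columns
mydf <- data.frame(
  Date = as.Date(c("2015-01-01", "2015-01-10", "2015-01-27", "2015-02-27",
                   "2015-03-15", "2015-04-17", "2015-04-18")),
  Expense = c(1566, 5646, 3456, 6546, 5313, 6466, 5456),
  Details = c("item101 xsda", "fuel asa", "item102a", "fuel asa",
              "fuel sda", "fuel", "item102a"),
  Vehicle = c("Car", "Bike", "Car", "Car", "Bike", "Bike", "Bike"),
  Person = c("John", "Smith", "Robin", rep(NA, 3), "Robin"),
  stringsAsFactors = FALSE
)

result <- mydf %>%
  mutate(
    # Hypothetical helper column: TRUE when Details mentions fuel in any case
    is_fuel = grepl("fuel", Details, ignore.case = TRUE),
    # case_when() uses the first condition that is TRUE for each row
    Person = case_when(
      is.na(Person) & is_fuel & Vehicle == "Car"  ~ "John",
      is.na(Person) & is_fuel & Vehicle == "Bike" ~ "Smith",
      TRUE ~ Person  # every other row keeps its existing value
    )
  ) %>%
  select(-is_fuel)
```

Because every branch returns a character value, `case_when()` does not need the `as.character()` conversion used in the `ifelse()` version.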

With *data.table*, a join + update would be more appropriate. – Arun


I'm not sure how to do that, then... – cderv
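One possible reading of the join + update suggestion, sketched with data.table: the rules go into a small lookup table and are applied with an update join of the form `X[Y, on = ..., col := ...]`. The `rules` table and its `NewPerson` column are hypothetical names. The condition inside `j` keeps existing names and only fills rows where `Person` is missing and `Details` contains "fuel". This is an illustration and not necessarily what the commenter had in mind.

```r
library(data.table)

# Sample data from the question; data.table() keeps text columns as character
mydf <- data.table(
  Date = as.Date(c("2015-01-01", "2015-01-10", "2015-01-27", "2015-02-27",
                   "2015-03-15", "2015-04-17", "2015-04-18")),
  Expense = c(1566, 5646, 3456, 6546, 5313, 6466, 5456),
  Details = c("item101 xsda", "fuel asa", "item102a", "fuel asa",
              "fuel sda", "fuel", "item102a"),
  Vehicle = c("Car", "Bike", "Car", "Car", "Bike", "Bike", "Bike"),
  Person = c("John", "Smith", "Robin", rep(NA, 3), "Robin")
)

# Hypothetical lookup table: one row per rule "vehicle -> person who buys fuel"
rules <- data.table(Vehicle = c("Car", "Bike"), NewPerson = c("John", "Smith"))

# Update join: for the rows of mydf whose Vehicle matches a row of rules,
# overwrite Person by reference, but only where it is NA and Details mentions fuel
mydf[rules, on = "Vehicle",
     Person := ifelse(is.na(Person) & Details %like% "fuel", i.NewPerson, Person)]
```

With this layout, supporting another vehicle means adding a row to `rules` rather than another line of code.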
