2016-12-01 121 views
0

我有一张订单表和一张账户活动表。我正在寻找该订单上最近的帐户活动。我想遍历每个订单来查找与该账户相匹配的活动,以及与订单日期最近的日期。根据其他条件按日期查找最新记录

rec type  date   account nearest.rec 
1  Order  12/1/2016 A 
2  Order  11/14/2016 B 
3  Activity 11/13/2016 A 
4  Activity 10/15/2016 C 
5  Order  11/13/2016 C 
6  Activity 11/16/2016 A 
7  Activity 11/17/2016 A 
8  Activity 10/14/2016 B 
9  Activity 11/4/2016 B 

想将其转换成这样:

rec type  date   account nearest.rec.actv 
1  Order  12/1/2016 A   7 
2  Order  11/14/2016 B   9 
3  Activity 11/13/2016 A 
4  Activity 10/15/2016 C 
5  Order  11/13/2016 C   4 
6  Activity 11/16/2016 A 
7  Activity 11/17/2016 A 
8  Activity 10/14/2016 B 
9  Activity 11/4/2016 B 

或者变成自己的数据帧

rec type  date   account nearest.rec.actv actv.date 
1  Order  12/1/2016 A   7     11/17/2016 
2  Order  11/14/2016 B   9     11/4/2016 
5  Order  11/13/2016 C   4     10/15/2016 
+0

@nrussell的感谢!当它出错时,我只是试图修复它。 –

回答

1

分割由type的数据和account合并,然后总结

df$date <- as.Date(df$date, "%m/%d/%Y") 
ind <- df$type=="Order" 
df1 <- df[ind,] 
df2 <- df[!ind,] 
left_join(df1, df2, by="account") %>% 
    group_by(account) %>% 
    filter(date.x - date.y == min(date.x-date.y)) 

# rec.x type.x  date.x account rec.y type.y  date.y 
# <int> <chr>  <date> <chr> <int> <chr>  <date> 
#1  1 Order 2016-12-01  A  7 Activity 2016-11-17 
#2  2 Order 2016-11-14  B  9 Activity 2016-11-04 
#3  5 Order 2016-11-13  C  4 Activity 2016-10-15 
0

这是没有一个地方有效的答案,但单步执行,可能会帮助:

# subset into 2 dataframes 
df1 <- df[df$type == "Order",] 
df2 <- df[df$type == "Activity",] 

# basic logic in the mutate() is that get the time difference for each record in a account. find the minimum, and get the corresponding activity date and record 
df1 %>% group_by(account) %>% 
    mutate(x = df2$date[account==df2$account][which(min(difftime(date, df2$date[account == df2$account])) == difftime(date, df2$date[account == df2$account]))], 
     y = df2$rec[account==df2$account][which(min(difftime(date, df2$date[account == df2$account])) == difftime(date, df2$date[account == df2$account]))]) 

# rec type  date account   x  y 
# <int> <chr>  <date> <chr>  <date> <int> 
#1  1 Order 2016-12-01  A 2016-11-17  7 
#2  2 Order 2016-11-14  B 2016-11-04  9 
#3  5 Order 2016-11-13  C 2016-10-15  4 
+0

@MattW。它回答你的问题吗?注意遵循http://stackoverflow.com/help/someone-answers –

0

下面是使用dplyrpurrr我的解决方案:

library(dplyr) 
library(lubridate) 
library(purrr) 
library(tidyr) 

# Convert date column to Date format with lubridate function 
df <- df %>% mutate_at("date", mdy) 

# Get the date assigned to order type 
list_order <- df %>% 
    filter(type == "Order") %>% 
    select(date, account) %>% 
    spread(account, date) %>% 
    as.list() 

# Get the data frame filtered with Order type 
df_order <- filter(df, type == "Order") 

# For each account, check the closest date to the date stored in the list 
map_df(seq_along(df_order), function(i) { 
    df %>% 
    filter(type == "Activity", account == names(list_order[i])) %>% 
    filter(date - list_order[[i]] == max(as.numeric(date - list_order[[i]]))) 
}) %>% 
    select(nearest.rec.actv = rec, actv.date = date, account) %>% 
    inner_join(df_order) %>% 
    select(rec, type, date, nearest.rec.actv, actv.date)