2016-04-15 95 views

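Iterate over two datasets and return the result to one dataset

I have two datasets, Transaction_long and Transaction_short. Transaction_long contains many policy and price quotes per customer, with the purchase point marked as true (1). Transaction_short contains only the purchase-point entries.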

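My goal is to add a column called Policy_Change_Frequency to the Transaction_short dataset. For each customer in the short dataset, I want to go through that customer's rows in the long dataset and count how many times the policy changed.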

To count the policy changes I can use sum(diff(Transaction_long$Policy) != 0), but I don't know how to iterate over the two datasets and collect the result.
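As a quick check of that expression on a single customer, here is a minimal sketch using Joe's policy values from the sample data. The variable names joe_policy and n_changes are only illustrative, and the sketch assumes Policy is stored as a number rather than as a factor.

```r
# Policy values quoted to Joe, in row order (taken from the sample data)
joe_policy <- c(1, 1, 2, 2, 2, 1, 3)

# diff() returns the difference between consecutive values;
# a non-zero difference means the policy changed between two quotes
changes <- diff(joe_policy) != 0

# Summing the logical vector counts the changes
n_changes <- sum(changes)
print(n_changes)
```

This gives 3 for Joe, which matches the expected Policy_ChangeFreq in the desired output.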

Details:

Customer_Name : name of customer 
Customer_ID: Customer Identifier number 
Purchased: Boolean variable (Yes-1, No-0)
Policy: Categorical (takes values 1-5) 
Price : Price quoted 

Dataset 1 - Transaction_long

Customer_Name,Customer_ID,Purchased,Policy,Price 
Joe,101,0,1,500 
Joe,101,0,1,505 
Joe,101,0,2,510 
Joe,101,0,2,504 
Joe,101,0,2,507 
Joe,101,0,1,505 
Joe,101,1,3,501 
Mary,103,0,1,675 
Mary,103,0,3,650 
Mary,103,0,2,620 
Mary,103,0,2,624 
Mary,103,0,2,630 
Mary,103,1,2,627 

Dataset 2 - Transaction_short

Customer_Name,Customer_ID,Purchased,Policy,Price
Joe,101,1,3,501 
Mary,103,1,2,627 

I need to add the policy change frequency column to the Transaction_short dataset, so the final dataset should look like this:

Customer_Name,Customer_ID,Purchased,Policy,Price,Policy_ChangeFreq
Joe,101,1,3,501,3 
Mary,103,1,2,627,2 

Answer


I got it working using the sqldf package in R:

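library(sqldf)

# For each customer in the short dataset, pull that customer's policies from
# the long dataset and count how often consecutive quotes differ.
for (i in seq_len(nrow(Transaction_short))) {
    sql <- sprintf("SELECT Policy FROM Transaction_long WHERE Customer_ID = %s",
                   Transaction_short$Customer_ID[i])
    df <- sqldf(sql)
    NF <- sum(df$Policy[-1] != df$Policy[-length(df$Policy)])
    Transaction_short$Policy_Change_Freq[i] <- NF
}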

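However, since I have about 500K rows in the long dataset and about 100K in the short dataset, this takes a while. Is there another solution that does not require a loop?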
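One possible loop-free approach, offered only as a sketch and not timed on data of this size, is to compute the change count once per customer with base R's aggregate() and then join it onto the short dataset with merge(). It assumes each customer's rows in the long dataset are already in chronological order and that Policy is numeric. The names freq and result are illustrative, and the data frames are rebuilt here from the sample data so the block runs on its own.

```r
# Sample data from the question
Transaction_long <- data.frame(
  Customer_Name = c(rep("Joe", 7), rep("Mary", 6)),
  Customer_ID   = c(rep(101, 7), rep(103, 6)),
  Purchased     = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1),
  Policy        = c(1, 1, 2, 2, 2, 1, 3, 1, 3, 2, 2, 2, 2),
  Price         = c(500, 505, 510, 504, 507, 505, 501,
                    675, 650, 620, 624, 630, 627)
)
# The short dataset holds only the purchase-point rows
Transaction_short <- Transaction_long[Transaction_long$Purchased == 1, ]

# Count policy changes per customer in one grouped call (no explicit loop)
freq <- aggregate(Policy ~ Customer_ID, data = Transaction_long,
                  FUN = function(p) sum(diff(p) != 0))
names(freq)[2] <- "Policy_Change_Freq"

# Attach the per-customer counts to the short dataset
result <- merge(Transaction_short, freq, by = "Customer_ID", all.x = TRUE)
print(result)
```

On the sample data this yields 3 for customer 101 and 2 for customer 103. The grouping runs once over the long dataset instead of issuing one SQL query per customer, which is where the loop version is likely to be spending most of its time.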