2017-02-28
0

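Merge two data frames if the timestamp of x is within the time interval of y

I cannot find a solution to this problem. I have two data frames, DF1 and DF2. If a timestamp in DF1 falls within a time interval specified in DF2, I want to merge the columns of DF2 into DF1. Here is an example of the two data frames: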

DF1 <- structure(list(Airspeed = c(582L, 478L, 524L), Outbound.Track = c(119L, 78L,134L), Rem.Ground.Dist = c(369L, 119L, 196L), Timestamp=structure(c(1451636817.52577, 1451638203.76569, 1451637753.43511),class = c("POSIXct", "POSIXt"), tzone = "")), .Names =c("Airspeed", "Outbound.Track","Rem.Ground.Dist", "Timestamp"), row.names =c(1L, 12L, 7L), class = c("data.table", "data.frame")) 

DF2 <- structure(list(Temperature = c(-18.5, -60, -35), Wind_Direction = c("324", "335", "313"), Wind_Speed = c("032", "041", "056"), onebef =structure(c(1451629620, 1451634660, 1451637000), class = c("POSIXct", "POSIXt"), tzone = ""), oneaft = structure(c(1451636820, 1451641860, 1451644200), class =c("POSIXct", "POSIXt"))), .Names = c("Temperature", "Wind_Direction", "Wind_Speed","onebef", "oneaft"), row.names = c(1358L, 1654L, 2068L), class = "data.frame") 

head(DF1) 
head(DF2) 

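I want to merge DF1 with DF2. Whenever there is a match (the timestamp of a DF1 row lies within any of the DF2 time intervals), the DF2 values (Wind_Speed, Wind_Direction, Temperature) should be added to that DF1 row.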

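I am facing two problems:

  1. How do I do the matching/merging efficiently? My data frames are very large (about 7000 rows each in DF1 and DF2).

  2. How do I make sure that a DF1 row is duplicated when it matches more than one interval?

I look forward to your help. Thank you!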

Answers

2

You can use sqldf:

library(sqldf) 
df<-sqldf('select d1.*,d2.* 
      from DF1 d1 
      left join DF2 d2 
      on d1.Timestamp >= d2.onebef 
       AND d1.Timestamp <= d2.oneaft 
      ') 
df 
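
The query above is a left join on a range condition, so a DF1 row should appear once per matching interval, and a DF1 row with no matching interval should be kept with NA in the DF2 columns. As a rough way to check the row count you should expect, here is a minimal base R sketch. The toy data and the names `points`, `intervals` and `hit` are made up for illustration and are not part of the question's data.

```r
# Hypothetical toy data (not the question's data): plain numeric timestamps
points    <- data.frame(id = 1:3, ts = c(5, 15, 40))
intervals <- data.frame(label = c("a", "b"), from = c(0, 10), to = c(20, 30))

# hit[i, j] is TRUE when points$ts[i] lies inside interval j (bounds inclusive)
hit <- outer(points$ts, intervals$from, ">=") &
       outer(points$ts, intervals$to,   "<=")

# Number of matching intervals for each point
matches_per_point <- rowSums(hit)

# An inner join returns one row per match; a left join additionally
# keeps one row for every point that has no match at all
inner_join_rows <- sum(matches_per_point)
left_join_rows  <- sum(pmax(matches_per_point, 1))
```

Here the second point falls into both intervals and is therefore duplicated, while the third point matches nothing and survives only in the left join.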
+0

Or '... on d1.Timestamp between d2.onebef and d2.oneaft' –

0

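This works well for the example, but you may struggle with real data, because it first creates a very large data set (every row of DF1 combined with every row of DF2) and only then keeps the relevant rows.

Try it and see how it goes.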

library(dplyr) 

DF1 <- structure(list(Airspeed = c(582L, 478L, 524L), Outbound.Track = c(119L, 78L,134L), Rem.Ground.Dist = c(369L, 119L, 196L), Timestamp=structure(c(1451636817.52577, 1451638203.76569, 1451637753.43511),class = c("POSIXct", "POSIXt"), tzone = "")), .Names =c("Airspeed", "Outbound.Track","Rem.Ground.Dist", "Timestamp"), row.names =c(1L, 12L, 7L), class = c("data.table", "data.frame")) 

DF2 <- structure(list(Temperature = c(-18.5, -60, -35), Wind_Direction = c("324", "335", "313"), Wind_Speed = c("032", "041", "056"), onebef =structure(c(1451629620, 1451634660, 1451637000), class = c("POSIXct", "POSIXt"), tzone = ""), oneaft = structure(c(1451636820, 1451641860, 1451644200), class =c("POSIXct", "POSIXt"))), .Names = c("Temperature", "Wind_Direction", "Wind_Speed","onebef", "oneaft"), row.names = c(1358L, 1654L, 2068L), class = "data.frame") 


merge(DF1, DF2) %>%         # no shared columns, so every row of DF1 is combined with every row of DF2 
    filter(onebef <= Timestamp & Timestamp <= oneaft) # keep rows where Timestamp lies within the interval 


# Airspeed Outbound.Track Rem.Ground.Dist   Timestamp Temperature Wind_Direction Wind_Speed    onebef    oneaft 
# 1  582   119    369 2016-01-01 08:26:57  -18.5   324  032 2016-01-01 06:27:00 2016-01-01 08:27:00 
# 2  582   119    369 2016-01-01 08:26:57  -60.0   335  041 2016-01-01 07:51:00 2016-01-01 09:51:00 
# 3  478    78    119 2016-01-01 08:50:03  -60.0   335  041 2016-01-01 07:51:00 2016-01-01 09:51:00 
# 4  524   134    196 2016-01-01 08:42:33  -60.0   335  041 2016-01-01 07:51:00 2016-01-01 09:51:00 
# 5  478    78    119 2016-01-01 08:50:03  -35.0   313  056 2016-01-01 08:30:00 2016-01-01 10:30:00 
# 6  524   134    196 2016-01-01 08:42:33  -35.0   313  056 2016-01-01 08:30:00 2016-01-01 10:30:00 
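
If the full cross join turns out to be too large for the real data, one possible alternative is to collect the matching row pairs first and bind the columns only for those pairs. The following is a minimal base R sketch under that assumption, not part of the answer above. `interval_join` is a hypothetical helper name, and the toy data reuses the question's timestamps truncated to whole seconds, with an `id` column added for illustration.

```r
# Hypothetical helper: join rows of `x` to rows of `y` whose interval
# [y[[from]], y[[to]]] contains x[[time]] (bounds inclusive).
interval_join <- function(x, y, time, from, to) {
  x <- as.data.frame(x)
  y <- as.data.frame(y)
  # For each interval in y, find the rows of x that fall inside it
  hits <- lapply(seq_len(nrow(y)), function(j) {
    which(x[[time]] >= y[[from]][j] & x[[time]] <= y[[to]][j])
  })
  xi <- unlist(hits)                          # matching row indices in x
  yi <- rep(seq_len(nrow(y)), lengths(hits))  # corresponding row indices in y
  # Bind only the matching pairs; rows of x with several matches are repeated
  left  <- x[xi, , drop = FALSE]
  right <- y[yi, , drop = FALSE]
  rownames(left)  <- NULL
  rownames(right) <- NULL
  cbind(left, right)
}

# Toy data based on the question's values (seconds since 1970-01-01, UTC)
to_time <- function(s) as.POSIXct(s, origin = "1970-01-01", tz = "UTC")
x <- data.frame(id = 1:3,
                Timestamp = to_time(c(1451636817, 1451638203, 1451637753)))
y <- data.frame(Temperature = c(-18.5, -60, -35),
                onebef = to_time(c(1451629620, 1451634660, 1451637000)),
                oneaft = to_time(c(1451636820, 1451641860, 1451644200)))

res <- interval_join(x, y, "Timestamp", "onebef", "oneaft")
nrow(res)
```

This still compares every timestamp with every interval, so the running time grows with nrow(x) * nrow(y), but it works through one interval at a time and never holds the full cross product in memory. Like the filter above, it is an inner join: rows of `x` with no matching interval are dropped.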
+0

Unfortunately I get an error about differing numbers of rows. Is there any way to fix this error? – Anna2803

+0

At which step do you get that error? Is it on the real data set or on the example? – AntoniosK

+0

On the real data set. The example works perfectly. – Anna2803

1

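You can use merge() with the all = TRUE option to combine every row of DF1 with every row of DF2. (The two data frames share no column names, which is why merge() returns every combination of rows.) Then you can check your condition: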

x <- merge(DF1, DF2, all = TRUE) 

x[x$Timestamp >= x$onebef & x$Timestamp <= x$oneaft,] 

    Airspeed Outbound.Track Rem.Ground.Dist   Timestamp Temperature Wind_Direction Wind_Speed    onebef 
1  582   119    369 2016-01-01 09:26:57  -18.5   324  032 2016-01-01 07:27:00 
4  582   119    369 2016-01-01 09:26:57  -60.0   335  041 2016-01-01 08:51:00 
5  478    78    119 2016-01-01 09:50:03  -60.0   335  041 2016-01-01 08:51:00 
6  524   134    196 2016-01-01 09:42:33  -60.0   335  041 2016-01-01 08:51:00 
8  478    78    119 2016-01-01 09:50:03  -35.0   313  056 2016-01-01 09:30:00 
9  524   134    196 2016-01-01 09:42:33  -35.0   313  056 2016-01-01 09:30:00 
      oneaft 
1 2016-01-01 09:27:00 
4 2016-01-01 10:51:00 
5 2016-01-01 10:51:00 
6 2016-01-01 10:51:00 
8 2016-01-01 11:30:00 
9 2016-01-01 11:30:00 