2013-03-25 68 views
2

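Using one data frame to sum a range of data from another data frame in R

I am migrating from SAS to R. I need help figuring out how to sum up weather data over date ranges. In SAS, I take the date ranges, use a data step to create one record for every date in the range (with startdate, enddate, date), merge in the weather, and then summarize (VAR hdd cdd; CLASS startdate enddate; sum=) to total the values for each date range.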
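For reference, the SAS workflow described above (expand each range to one row per date, merge the weather, summarize by range) might be sketched in base R roughly as follows. The sample values are the ones given further down in this question; the names `expanded` and `merged` are illustrative only.

```r
# Sample data, same values as in the question
billperiods <- data.frame(startdate = c(100, 103, 107),
                          enddate   = c(105, 104, 110))
weather <- data.frame(weatherdate = c(100:103, 105:110),
                      hdd = c(0, 0, 4, 5, 0, 0, 3, 1, 9, 0),
                      cdd = c(4, 1, 0, 0, 5, 6, 0, 0, 0, 10))

# Step 1: one row per (billing period, date), like the SAS data step
expanded <- do.call(rbind, lapply(seq_len(nrow(billperiods)), function(i) {
  data.frame(startdate   = billperiods$startdate[i],
             enddate     = billperiods$enddate[i],
             weatherdate = billperiods$startdate[i]:billperiods$enddate[i])
}))

# Step 2: inner join with the weather; dates without weather (104) drop out
merged <- merge(expanded, weather, by = "weatherdate")

# Step 3: sum hdd and cdd within each billing period
billweather <- aggregate(merged[c("hdd", "cdd")],
                         by = merged[c("startdate", "enddate")],
                         FUN = sum)
names(billweather)[3:4] <- c("sumhdd", "sumcdd")

# aggregate() sorts by the grouping columns, so restore start-date order
billweather <- billweather[order(billweather$startdate), ]
```

Because step 2 is an inner join, a billing period with no weather rows at all would disappear from the result; keeping such periods would need `all.x = TRUE` in `merge()` and `na.rm = TRUE` when summing.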

In R, this code:

startdate <- c(100,103,107) 
enddate <- c(105,104,110) 
billperiods <- data.frame(startdate, enddate)

gives:

> billperiods 
startdate enddate 
1  100  105 
2  103  104 
3  107  110 

And this code:

weatherdate <- c(100:103,105:110) 
hdd <- c(0,0,4,5,0,0,3,1,9,0) 
cdd <- c(4,1,0,0,5,6,0,0,0,10) 
weather <- data.frame(weatherdate,hdd,cdd) 

gives:

> weather
   weatherdate hdd cdd
1          100   0   4
2          101   0   1
3          102   4   0
4          103   5   0
5          105   0   5
6          106   0   6
7          107   3   0
8          108   1   0
9          109   9   0
10         110   0  10

Note: weatherdate = 104 is missing. There may be days for which I have no weather data.

What I can't figure out is how to get:

> billweather
  startdate enddate sumhdd sumcdd
1       100     105      9     10
2       103     104      5      0
3       107     110     13     10

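where sumhdd is the sum of hdd over the rows of the weather data.frame whose weatherdate falls between startdate and enddate (and likewise sumcdd for cdd).

Any ideas?

Answers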

1
billweather <- cbind(billperiods, 
       t(apply(billperiods, 1, function(x) { 
        colSums(weather[weather[, 1] %in% c(x[1]:x[2]), 2:3]) 
       }))) 
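To see what the anonymous function does, it can help to work through a single billing period by hand. The following is a minimal sketch for the first row only; `x` stands in for the named numeric vector that `apply()` passes for that row, and `in_range` is an illustrative name.

```r
weather <- data.frame(weatherdate = c(100:103, 105:110),
                      hdd = c(0, 0, 4, 5, 0, 0, 3, 1, 9, 0),
                      cdd = c(4, 1, 0, 0, 5, 6, 0, 0, 0, 10))

# First billing period, as apply() would hand it to the function
x <- c(startdate = 100, enddate = 105)

# TRUE for weather rows whose date lies in 100:105; the missing
# date 104 simply has no row to match, so it is skipped silently
in_range <- weather$weatherdate %in% x[1]:x[2]

# Column sums of hdd and cdd over the matching rows
period_sums <- colSums(weather[in_range, c("hdd", "cdd")])
```

`apply()` repeats this for every row of `billperiods`, and `t()` turns the resulting 2 x n matrix into one row per billing period before `cbind()` attaches it.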

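Thanks for the quick response! I tried it on a larger data frame (12,356 rows); it took 7.89 seconds and the results were correct! I am amazed at how quickly people respond. This is my first time asking a question here. – 2013-03-25 22:08:47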

1
cbind(billperiods, t(sapply(apply(billperiods, 1, function(x) 
    weather[weather$weatherdate >= x[1] & 
      weather$weatherdate <= x[2], c("hdd", "cdd")]), colSums))) 

    startdate enddate hdd cdd 
1  100  105 9 10 
2  103  104 5 0 
3  107  110 13 10 

Thanks for the quick response! I tried it on a larger data frame (12,356 rows); it took 6.75 seconds and the results were correct! – 2013-03-25 22:08:19

3

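Here is an approach using IRanges and data.table. For this particular question it may look like overkill, but in general I find IRanges convenient for working with intervals, however simple they may be.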

# load packages (IRanges is distributed via Bioconductor, data.table via CRAN)
library(IRanges)
library(data.table)

# convert data.frames to data.tables 
dt1 <- data.table(billperiods) 
dt2 <- data.table(weather) 

# construct Ranges to get overlaps 
ir1 <- IRanges(dt1$startdate, dt1$enddate) 
ir2 <- IRanges(dt2$weatherdate, width=1) # start = end 

# find Overlaps 
olaps <- findOverlaps(ir1, ir2) 

# Hits of length 10 
# queryLength: 3 
# subjectLength: 10 
# queryHits subjectHits 
#  <integer> <integer> 
# 1   1   1 
# 2   1   2 
# 3   1   3 
# 4   1   4 
# 5   1   5 
# 6   2   4 
# 7   3   7 
# 8   3   8 
# 9   3   9 
# 10   3   10 

# get billweather (final output) 
billweather <- cbind(dt1[queryHits(olaps)], 
       dt2[subjectHits(olaps), 
       list(hdd, cdd)])[, list(sumhdd = sum(hdd), 
       sumcdd = sum(cdd)), 
       by=list(startdate, enddate)] 

# startdate enddate sumhdd sumcdd 
# 1:  100  105  9  10 
# 2:  103  104  5  0 
# 3:  107  110  13  10 

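Code breakdown: first, I use queryHits, subjectHits and cbind to build an intermediate data.table; then I group it by startdate and enddate and take the sums of hdd and cdd. It is easier to follow when the two steps are written separately, as shown below.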

# split for easier understanding 
billweather <- cbind(dt1[queryHits(olaps)], 
      dt2[subjectHits(olaps), 
      list(hdd, cdd)]) 
billweather <- billweather[, list(sumhdd = sum(hdd), 
      sumcdd = sum(cdd)), 
      by=list(startdate, enddate)] 
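For readers without IRanges installed, the overlap step can be imitated in base R. This is only a sketch of the same idea (a table of period/weather-row pairs, then a grouped sum), not a description of how `findOverlaps` works internally; the names `inside`, `hits` and `sums` are illustrative.

```r
billperiods <- data.frame(startdate = c(100, 103, 107),
                          enddate   = c(105, 104, 110))
weather <- data.frame(weatherdate = c(100:103, 105:110),
                      hdd = c(0, 0, 4, 5, 0, 0, 3, 1, 9, 0),
                      cdd = c(4, 1, 0, 0, 5, 6, 0, 0, 0, 10))

# Logical matrix: rows are billing periods, columns are weather rows;
# TRUE where startdate <= weatherdate <= enddate
inside <- outer(billperiods$startdate, weather$weatherdate, "<=") &
          outer(billperiods$enddate,   weather$weatherdate, ">=")

# (period, weather row) index pairs, analogous to queryHits/subjectHits
hits <- which(inside, arr.ind = TRUE)
hits <- hits[order(hits[, "row"], hits[, "col"]), ]

# Sum hdd and cdd of the matched weather rows within each period
sums <- rowsum(as.matrix(weather[hits[, "col"], c("hdd", "cdd")]),
               group = hits[, "row"])

# Attach the sums to the periods that had at least one match
billweather <- data.frame(billperiods[as.integer(rownames(sums)), ],
                          sumhdd = sums[, "hdd"],
                          sumcdd = sums[, "cdd"],
                          row.names = NULL)
```

Note that `outer()` materializes a periods-by-weather-rows matrix, so this variant is only reasonable for small inputs; an interval-based approach such as the IRanges one above avoids building that matrix.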