2013-04-10

Add missing time rows in R

I have a large data set and would like to fill in the missing time steps (YYYY-MM-DD HH:MM:SS) using R. The data looks like:

Time,Volume  
1996-02-05 00:34:00,0.01 
1996-02-05 00:51:00,0.01 
1996-02-05 00:52:00,0.01 
1996-02-05 01:04:00,0.01 
1996-02-05 01:19:00,0.01 
1996-02-05 05:00:00,0.01 
1996-02-05 05:07:00,0.01 
1996-02-05 05:08:00,0.01 
1996-02-05 05:14:00,0.01 

I want to summarize the Volume column for each 30-minute interval. This is what I have tried:

z <- read.zoo("precip.csv", header = TRUE, sep = ",", FUN = as.chron) 
half_hour <- period.apply(z, endpoints(z, "minutes", 30), length) 

which returns:

Time,Volume 
02/05/96 00:52:00,3 
02/05/96 01:19:00,2 
02/05/96 05:14:00,4 

The output I would like to get looks like:

Time,Volume 
02/05/96 00:29:00,0 
02/05/96 00:59:00,3 
02/05/96 01:29:00,2 
02/05/96 01:59:00,0 
02/05/96 02:29:00,0 
02/05/96 02:59:00,0 

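...and so on.

Alternatively, I think it would work if I could pad the original data set so that every minute is accounted for (with missing Volumes equal to 0).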
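
The per-minute padding idea can be sketched as follows. This is a minimal illustration rather than a tested solution for the full data set: it uses three timestamps from the sample data, the UTC time zone is an assumption, and `grid` and `padded` are hypothetical names.

```r
library(xts)

# Three of the timestamps from the sample data (UTC is an assumption)
times <- as.POSIXct(c("1996-02-05 00:51:00", "1996-02-05 00:52:00",
                      "1996-02-05 01:04:00"), tz = "UTC")
x <- xts(cbind(Volume = rep(0.01, 3)), order.by = times)

# Regular one-minute grid spanning the observed range
grid <- seq(start(x), end(x), by = "1 min")

# Merge with a zero-column xts object on that grid;
# minutes without an observation are filled with 0
padded <- merge(x, xts(, grid), fill = 0)
```

The padded series has one row per minute, so any later aggregation sees the empty minutes as explicit zeros.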

I found this post, but could not get it to work.

> z_xts<- xts(precip[,c("Volume")],precip[,"Time"]) 
Error in xts(precip[, c("Volume")], precip[, "Time"]) : 
    order.by requires an appropriate time-based object 
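
This error usually means the object passed as `order.by` is not a time class, for example a character or factor column as produced by `read.csv`. A minimal sketch of one way around it, assuming `precip` is a data frame whose `Time` column holds text in the format shown above (the inline data frame and the UTC time zone are stand-ins for illustration):

```r
library(xts)

# Hypothetical stand-in for the data frame read from precip.csv
precip <- data.frame(
  Time   = c("1996-02-05 00:34:00", "1996-02-05 00:51:00", "1996-02-05 00:52:00"),
  Volume = c(0.01, 0.01, 0.01),
  stringsAsFactors = FALSE
)

# Convert the character timestamps to POSIXct before building the xts object
precip$Time <- as.POSIXct(precip$Time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# cbind() keeps the column name on the resulting one-column series
z_xts <- xts(cbind(Volume = precip$Volume), order.by = precip$Time)
```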

Answers

1

This should do what you want:

library(xts) 
x <- as.xts(read.zoo(text="Time,Volume  
1996-02-05 00:34:00,0.01 
1996-02-05 00:51:00,0.01 
1996-02-05 00:52:00,0.01 
1996-02-05 01:04:00,0.01 
1996-02-05 01:19:00,0.01 
1996-02-05 05:00:00,0.01 
1996-02-05 05:07:00,0.01 
1996-02-05 05:08:00,0.01 
1996-02-05 05:14:00,0.01", 
sep=",", FUN=as.POSIXct, header=TRUE, drop=FALSE)) 

# 1) Create POSIXct sequence from midnight of the first day 
# until the end of the last day  
midnightDay1 <- as.POSIXct(format(start(x),"%Y-%m-%d")) 
timesteps <- seq(midnightDay1, end(x), by="30 min") 
# 2) Make a copy of your object and set all values for Volume to 1 
y <- x 
y$Volume <- 1 
# 3) Merge the copy with a zero-column xts object that has an index 
# with all the values you want. Fill missing values with 0. 
m <- merge(y, xts(,timesteps), fill=0) 
# 4) Align all index values to 30-minute intervals 
a <- align.time(m, 60*30) 
# 5) Sum the values for Volume in each period 
half_hour <- period.apply(a, endpoints(a, "minutes", 30), sum) 

Thanks! I am getting this error message at step 4: > a <- align.time(m, 60 * 30) Error in UseMethod("align.time"): no applicable method for 'align.time' applied to an object of class "zoo" – user2263130 2013-04-10 21:50:03

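
@user2263130: That's because you converted `m` to a zoo object instead of keeping it as an xts object. `align.time` only works on xts objects, because zoo objects are not guaranteed to be time-indexed (they can be indexed by anything that is ordered). – 2013-04-10 22:02:54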
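
If the object has already ended up as a zoo object, converting it back with `as.xts` before calling `align.time` should avoid the error. A minimal sketch, assuming the zoo object has a POSIXct index; `m_zoo` and `m_xts` are hypothetical names and the two timestamps are taken from the sample data:

```r
library(xts)  # also attaches zoo

# Hypothetical zoo object with a POSIXct index (UTC is an assumption)
times <- as.POSIXct(c("1996-02-05 00:34:00", "1996-02-05 00:51:00"), tz = "UTC")
m_zoo <- zoo(c(1, 1), order.by = times)

# Convert to xts first, then round each timestamp up to the next 30-minute mark
m_xts <- as.xts(m_zoo)
a <- align.time(m_xts, n = 60 * 30)
```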


Ah! Thank you very much! – user2263130 2013-04-10 22:03:32

0

I was a little confused by step 3) above, so what I did was:

library("lubridate")
library("xts")
my_data <- read.csv("my_data.csv", stringsAsFactors = FALSE, sep = ",",
                    header = TRUE)
colnames(my_data) <- c("Time", "PAR", "NDVI", "LWS")
# It is easier if you subset your data
my_data_short <- subset(my_data, select = c("Time", "NDVI"))
my_data_short$Time <- ymd_hm(my_data_short$Time, tz = "UTC")
beginning <- as.POSIXct("2016-05-12 00:00", format = "%Y-%m-%d %H:%M",
                        tz = "UTC")
end <- as.POSIXct("2016-06-05 00:00", format = "%Y-%m-%d %H:%M", tz = "UTC")
timesteps <- seq(beginning, end, by = "5 min")
volume <- rep_len(1, length.out = length(timesteps))
time_series <- data.frame(timesteps, volume)
# Left join onto the regular grid; time steps with no observation get NA
merged <- merge(time_series, my_data_short, by.x = "timesteps", by.y = "Time",
                all.x = TRUE, all.y = FALSE)

# This formats your data for use with the xts package
my_data_brief.xts <- xts(x = merged$NDVI, order.by = merged$timesteps,
                         frequency = 1, tzone = "UTC")

# Align all index values to 30-minute intervals
a <- align.time(my_data_brief.xts, 60*30)
# Sum the values in each period, ignoring missing values
result <- period.apply(a, endpoints(a, "minutes", 30), sum, na.rm = TRUE)

saveRDS(result, file = "result.rds")