2016-12-02

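Weekly averages in R

I have a set of 34 years of daily gridded sea surface temperature values (12418 daily files x 4248 points) and I intend to compute weekly values. Following this post, https://stackoverflow.com/a/15102394/709777, I almost succeeded, but there is some mismatch between dates and weeks. I can't find where it comes from, and I want to be sure I am using the right dates to compute the weekly means.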

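I use this piece of my R script to read the daily data and build a large data frame (12418 rows/days by 4248 columns/temperatures) in which each column holds all the daily values for a single point.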

library(lubridate) # provides week() and year() used below

# Paths 
ruta_datos_diarios<-"/home/meteo/PROJECTES/VERSUS/DATA/SST/CSV/" 
ruta_files<-"/home/meteo/PROJECTES/VERSUS/SCRIPTS/CLUSTER/FILES/" 
ruta_eixida<-"/home/meteo/PROJECTES/VERSUS/OUTPUT/DATA/SEMANAL/" 

# List of daily files 
files <- list.files(path = ruta_datos_diarios, pattern = "SST-diaria-MED") 

output <- matrix(ncol=4248, nrow=length(files)) 
fechas <- matrix(ncol=1, nrow=length(files)) 

for (i in seq_along(files)){ 
    # Read one daily file and drop rows with missing values 
    datos<-read.csv(paste0(ruta_datos_diarios,files[i]),header=TRUE,na.strings = "NA") 
    datos<-datos[complete.cases(datos),] 

    # Extract the date from the daily file name 
    yyyy<-substr(files[i],16,19) 
    mm<-substr(files[i],20,21) 
    dd<-substr(files[i],22,23) 
    fechas[i,]<-paste0(yyyy,"-",mm,"-",dd) 

    # Store the day's SST values as one row 
    output[i,]<-datos$sst 
} 

datos.df<-as.data.frame(output) 

# Build a dataframe with the dates (day, week and year) 
fechas<-as.data.frame(fechas) 
fechas$V1<-as.Date(fechas$V1) 
fechas$Week <- week(fechas$V1) 
fechas$Year <- year(fechas$V1) 

# Extract day of the week (Saturday = 6) 
fechas$Week_Day <- as.numeric(format(fechas$V1, format='%w')) 
# Adjust end-of-week date (first saturday from the original Date) 
fechas$End_of_Week <- fechas$V1 + (6 - fechas$Week_Day) 

# new dataframe from End_of_Week 
fechas.semana<-fechas[!duplicated(fechas$End_of_Week),] 
fechas.semana<-as.data.frame(fechas.semana) 

colnames(fechas)<-c("Day","Week","Year","Week_Day","End_of_Week") 
colnames(fechas.semana)<-c("Day","Week","Year","Week_Day","End_of_Week") 
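
The end-of-week rule used above can be checked on a handful of dates. This is a minimal sketch in base R; the sample dates are chosen for illustration only and do not come from the SST files. `%w` returns 0 for Sunday through 6 for Saturday.

```r
# Illustrative dates (not taken from the SST data set)
dias <- as.Date(c("2016-11-27", "2016-12-02", "2016-12-03", "2016-12-04"))

# Day of the week as a number: 0 = Sunday, ..., 6 = Saturday
dia_semana <- as.numeric(format(dias, "%w"))

# Saturday that closes the week containing each date
fin_semana <- dias + (6 - dia_semana)

# Side-by-side view of each date and the week it is assigned to
data.frame(Day = dias, Week_Day = dia_semana, End_of_Week = fin_semana)
```

Every date from a Sunday through the following Saturday maps to that Saturday, so `End_of_Week` alone identifies a week.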

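This is how I read the data and the dates. To keep the example short, I have saved part of the data frame in this file, temp-sst.csv (10 variables, including "Day", "Week", "Year", "Week_Day" and "End_of_Week").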

sst.dat <- read.csv("temp-dat.csv",header=TRUE) 

# Join dates and SST values 
sst.dat <- cbind(fechas, sst.dat) 

# Build new dates data frame 
fechas<-as.data.frame(sst.dat$Day) 
colnames(fechas)<-c("Day") 
fechas$Day<-as.Date(fechas$Day) 
fechas$Week <- week(fechas$Day) 
fechas$Year <- year(fechas$Day) 
# Extract day of the week (Saturday = 6) 
fechas$Week_Day <- as.numeric(format(fechas$Day, format='%w')) 
# Adjust end-of-week date (first saturday from the original Date) 
fechas$End_of_Week <- fechas$Day + (6 - fechas$Week_Day) 

fechas.semana<-fechas[!duplicated(fechas$End_of_Week),] 
fechas.semana<-as.data.frame(fechas.semana) 

colnames(fechas)<-c("Day","Week","Year","Week_Day","End_of_Week") 
colnames(fechas.semana)<-c("Day","Week","Year","Week_Day","End_of_Week") 

# Weekly aggregation function from the referred post 
media.semanal <- function(x, column){ 
    a<-aggregate(x[,column]~End_of_Week+Year, FUN=mean, data=x, na.rm=TRUE) 
    colnames(a)<-c("End_of_Week","Year","SSTmean") 
    return(a) 
} 

# Matrix to be populated by weekly function 
SST.mat<-matrix(nrow=nrow(fechas.semana), ncol=length(sst.dat)-5) # 5 is the number of date columns 

for (j in 6:length(sst.dat)){ # start at 6 to skip the date columns 
    b<-media.semanal(sst.dat,j) 
    SST.mat[,j-5]<-b$SSTmean 
} 

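But here comes the problem. The "b" data frame inside the loop has 145 rows, while SST.mat and fechas.semana have only 144. I have not been able to find where this inconsistency comes from.

Any help would be greatly appreciated, I'm stuck here. Thanks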

+6

"_To keep a short example_" - instead of posting a link to a 1000 * 10 file on Dropbox, you should provide a _minimal_, self-contained example. – Henrik

+0

You are right @henrik, flagged as useful – pacomet

Answers

1

You have a duplicate in b$End_of_Week.

First, I noticed that there is no difference in set membership:

setdiff(as.character(b$End_of_Week),as.character(fechas.semana$End_of_Week)) 

character(0)

Then I realized that it must be because of a duplicate, and confirmed it like this:

table(table(as.character(b$End_of_Week))>1) 
FALSE  TRUE 
  143     1 

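Looking at the table shows that the culprit is 1983-01-01.

The root cause seems to be that you aggregate by End_of_Week + Year, where Year is unnecessary because End_of_Week already carries the year. The week ending on Saturday 1983-01-01 contains six days from 1982 and one from 1983, so grouping by Year as well splits it into two rows. If you aggregate by End_of_Week only, you get 144 rows instead of 145.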
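
The effect can be reproduced on a small synthetic series. The sketch below uses made-up values rather than the SST data, and the names `ejemplo`, `con_year` and `sin_year` are hypothetical.

```r
# Synthetic daily series spanning a year boundary (made-up values, not SST data)
dias <- seq(as.Date("1982-12-19"), as.Date("1983-01-08"), by = "day")
ejemplo <- data.frame(
  Day         = dias,
  Year        = as.numeric(format(dias, "%Y")),
  End_of_Week = dias + (6 - as.numeric(format(dias, "%w"))),
  sst         = seq_along(dias)
)

# Grouping by week and year splits the week that straddles the new year
con_year <- aggregate(sst ~ End_of_Week + Year, data = ejemplo, FUN = mean)

# Grouping by week only keeps one row per week
sin_year <- aggregate(sst ~ End_of_Week, data = ejemplo, FUN = mean)

nrow(con_year)  # one row more than the number of distinct weeks
nrow(sin_year)  # equals the number of distinct weeks

# The End_of_Week value that appears more than once
con_year$End_of_Week[duplicated(con_year$End_of_Week)]
```

The same `duplicated()` check applied to `b$End_of_Week` should point at the week that was split in the real data.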

# Weekly aggregation function from the referred post 
media.semanal <- function(x, column){ 
    a<-aggregate(x[,column]~End_of_Week, FUN=mean, data=x, na.rm=TRUE) 
    colnames(a)<-c("End_of_Week","SSTmean") 
    return(a) 
} 

# Matrix to be populated by weekly function 
SST.mat<-matrix(nrow=nrow(fechas.semana), ncol=length(sst.dat)-5) # 5 is the number of date columns 

for (j in 6:length(sst.dat)){ # start at 6 to skip the date columns 
    b<-media.semanal(sst.dat,j) 
    SST.mat[,j-5]<-b$SSTmean 
} 
dim(b)
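
As a possible alternative to the column-by-column loop, the `by =` form of `aggregate()` can average every value column in one call. This is only a sketch on synthetic data; the column names `V1` and `V2` and the object names are assumptions standing in for the real point columns of `sst.dat`.

```r
# Synthetic stand-in for sst.dat: date columns plus two value columns (made-up data)
dias <- seq(as.Date("1982-12-19"), as.Date("1983-01-08"), by = "day")
ejemplo <- data.frame(
  Day         = dias,
  End_of_Week = dias + (6 - as.numeric(format(dias, "%w"))),
  V1          = seq_along(dias),
  V2          = c(NA, seq_along(dias)[-1] * 2)
)

# Hypothetical names of the value columns
columnas <- c("V1", "V2")

# One call aggregates every value column by End_of_Week;
# na.rm = TRUE is passed on to mean() for each column
semanal <- aggregate(ejemplo[columnas],
                     by = list(End_of_Week = ejemplo$End_of_Week),
                     FUN = mean, na.rm = TRUE)

# Numeric matrix with one row per week and one column per point
SST.mat <- as.matrix(semanal[columnas])
dim(SST.mat)
```

Unlike the formula interface, this form does not drop rows containing NA before aggregating, so missing values are handled separately for each column.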