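How do I prepare data in which some values differ greatly from the rest of the group?

Some values differ greatly from the rest of the group: because rows are missing and the data is not continuous, my diffVal is abnormal. How can I prepare data like this?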

> df 
        Date diffVal1 diffVal2 
1 2017-05-31 04:01:00  718  483 
2 2017-05-31 05:01:00  704  477 
3 2017-05-31 06:01:00  741  478 
4 2017-05-31 07:01:00  874  483 
5 2017-05-31 08:01:00  907  495 
6 2017-05-31 09:01:00  887  510 
7 2017-05-31 10:01:00  2922  514 
8 2017-05-31 13:01:00  1012  529 
9 2017-05-31 14:01:00  979  539 
10 2017-05-31 15:01:00  886  485 
11 2017-05-31 16:01:00  818  471

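As you can see, the rows for hours 11 and 12 are missing from the dates, and I need to smooth the outlier back to a normal value.

I thought of setting the outliers to NULL, but the problem is how to detect unusual values in a large data frame. With my sample data frame I could set values above 1200 to NA (not a good idea at all, since the threshold is arbitrary) and then fill the NA values with the interpolation function na.approx() from the zoo package. I need to plot the data afterwards.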

df$diffVal1 <- ifelse((df$diffVal1>1300), NA,df$diffVal1) 
df$diffVal1 <- na.approx(df$diffVal1) 
> df 
        Date diffVal1 diffVal2 
1 2017-05-31 04:01:00 718.0  483 
2 2017-05-31 05:01:00 704.0  477 
3 2017-05-31 06:01:00 741.0  478 
4 2017-05-31 07:01:00 874.0  483 
5 2017-05-31 08:01:00 907.0  495 
6 2017-05-31 09:01:00 887.0  510 
7 2017-05-31 10:01:00 949.5  514 
8 2017-05-31 13:01:00 1012.0  529 
9 2017-05-31 14:01:00 979.0  539 
10 2017-05-31 15:01:00 886.0  485 
11 2017-05-31 16:01:00 818.0  471
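
One way to avoid a hand-picked threshold is a robust rule based on the median and the MAD. This is only a sketch, not part of the original question: the cutoff of 3 is an assumed rule of thumb, and the names x, robust_z, is_outlier and x_clean are illustrative.

```r
# diffVal1 from the sample data in the question
x <- c(718, 704, 741, 874, 907, 887, 2922, 1012, 979, 886, 818)

# Robust z-score: distance from the median, scaled by the MAD
# (mad() already includes the 1.4826 consistency constant)
robust_z <- abs(x - median(x)) / mad(x)

# Assumption: 3 is a common rule-of-thumb cutoff, not a value from the question
is_outlier <- robust_z > 3

# Blank out the flagged values
x_clean <- x
x_clean[is_outlier] <- NA

# Fill the gaps by linear interpolation over the row index (base R)
idx <- seq_along(x_clean)
x_clean[is_outlier] <- approx(idx[!is_outlier], x_clean[!is_outlier],
                              xout = idx[is_outlier])$y
```

On the sample data this flags only row 7 (2922) and replaces it with 949.5, the same value the manual threshold produced. On a long series with a trend, the same rule could be applied per window instead of globally.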

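How can I solve this? And how can I add the missing rows by date so that the interpolation covers them as well?

Thank you very much for your help.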

Source

2017-06-09 Sirawit Takeo

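Is the value at 10:01:00 the accumulated total of the values for 11:01:00 and 12:01:00 as well?

I would sort the dataset by time in descending order, compute the cumulative sum, interpolate the cumulative sum at the missing times, and then go back from the cumulative data to the original values. You would get 974 for hours 10, 11 and 12.

@Moody_Mudskipper Yes, because it is computed row by row. I put the differences of val into a new column and dropped the val column (diffVal2 in the sample data has the same missing rows, so that it matches the Date column of diffVal1 when merged). I would like to know a function that adds these missing times to df, for use on a large data frame and not only on my example. Thanks for your help.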

Tell me if this is what you are after:

Data preparation:

df <- read.table(text="Date; diffVal1; diffVal2 
1; 2017-05-31 04:01:00;  718;  483 
2; 2017-05-31 05:01:00;  704;  477 
3; 2017-05-31 06:01:00;  741;  478 
4; 2017-05-31 07:01:00;  874;  483 
5; 2017-05-31 08:01:00;  907;  495 
6; 2017-05-31 09:01:00;  887;  510 
7; 2017-05-31 10:01:00;  2922;  514 
8; 2017-05-31 13:01:00;  1012;  529 
9; 2017-05-31 14:01:00;  979;  539 
10; 2017-05-31 15:01:00;  886;  485 
11; 2017-05-31 16:01:00;  818;  471",sep=";",header=TRUE,stringsAsFactors=FALSE) 

df$Date  <- as.POSIXct(df$Date) 
df$diffVal1 <- as.numeric(df$diffVal1) 
df$diffVal2 <- as.numeric(df$diffVal2) 
all_dates <- data.frame(Date = seq(min(df$Date),max(df$Date),by=3600))
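
To see which timestamps such an hourly grid would add, the grid can be compared with the observed times. A minimal sketch on three of the sample timestamps; obs, grid and missing_times are illustrative names, and tz = "UTC" is an assumption made only to keep the example reproducible.

```r
# Three observed timestamps around the gap in the sample data
obs <- as.POSIXct(c("2017-05-31 09:01:00",
                    "2017-05-31 10:01:00",
                    "2017-05-31 13:01:00"), tz = "UTC")

# Complete hourly grid between the first and last observation
grid <- seq(min(obs), max(obs), by = "hour")

# Grid points with no matching observation are the missing rows
missing_times <- grid[!(grid %in% obs)]
format(missing_times, "%H:%M")
```

Here the missing rows are 11:01 and 12:01, the two hours absent from the question's data.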

Processing and result:

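df2 <- df 
# Sort newest first so that cumsum() builds a reverse running total
df2 <- df2[order(df2$Date,decreasing=TRUE),] 
df2$Val1_total <- cumsum(df2$diffVal1) 
# Add the missing hours; merge() sorts by Date ascending and leaves NA in the new rows
df2 <- merge(df2,all_dates,all.y = TRUE) 

# Linearly interpolate the running total at the missing hours
df2$Val1_total[is.na(df2$Val1_total)] <- approx(x = df2$Date, y = df2$Val1_total, xout = df2$Date[is.na(df2$Val1_total)])$y 
# Difference the totals to recover per-hour values; the last row keeps its original value
df2$diffVal1 <- c(-diff(df2$Val1_total),tail(df2$diffVal1,1)) 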

# > df2 
# Date diffVal1 diffVal2 Val1_total 
# 1 2017-05-31 04:01:00  718  483  11448 
# 2 2017-05-31 05:01:00  704  477  10730 
# 3 2017-05-31 06:01:00  741  478  10026 
# 4 2017-05-31 07:01:00  874  483  9285 
# 5 2017-05-31 08:01:00  907  495  8411 
# 6 2017-05-31 09:01:00  887  510  7504 
# 7 2017-05-31 10:01:00  974  514  6617 
# 8 2017-05-31 11:01:00  974  NA  5643 
# 9 2017-05-31 12:01:00  974  NA  4669 
# 10 2017-05-31 13:01:00  1012  529  3695 
# 11 2017-05-31 14:01:00  979  539  2683 
# 12 2017-05-31 15:01:00  886  485  1704 
# 13 2017-05-31 16:01:00  818  471  818
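
Why 974 appears three times: interpolating the running total linearly across the gap amounts to spreading the 2922 recorded at 10:01 evenly over hours 10, 11 and 12. A small check using the totals from the output above; t_obs, tot_obs, tot_missing and per_hour are illustrative names.

```r
# Running totals at the two observed times that border the gap
t_obs   <- c(10, 13)        # hours 10:01 and 13:01
tot_obs <- c(6617, 3695)    # Val1_total at those hours

# Linear interpolation of the total at the two missing hours
tot_missing <- approx(t_obs, tot_obs, xout = c(11, 12))$y

# Differencing the totals gives the per-hour values for hours 10, 11 and 12
per_hour <- -diff(c(tot_obs[1], tot_missing, tot_obs[2]))

# The three values add up to the original spike
sum(per_hour)
```

The interpolated totals come out at about 5643 and 4669, and each per-hour value is 2922 / 3 = 974, so the overall sum of diffVal1 is preserved.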

Source

2017-06-09 16:32:54

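Sorry, it was my day off. It works for me, but I will try a few other approaches to find the best way to handle this. By the way, if I cannot do it another way, I will use yours. Thanks a lot.

Would the wild downvoter care to explain?

I tried it on my whole dataset (more than 1k rows), and it does not work at all on the many values separated by NAs.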
