2011-12-29 65 views
2

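Resampling, aggregating, and interpolating time-series trend data

When analyzing energy demand and consumption data, I am having trouble resampling and interpolating time-series trend data.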

Example data set:

timestamp    value kWh 
------------------  --------- 
12/19/2011 5:43:21 PM 79178 
12/19/2011 5:58:21 PM 79179.88 
12/19/2011 6:13:21 PM 79182.13 
12/19/2011 6:28:21 PM 79183.88 
12/19/2011 6:43:21 PM 79185.63 

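Based on these observations, I would like to roll the values up into aggregates over a period of time, with the frequency set to a unit of time.

For example, at hourly intervals, filling in any missing data: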

timestamp    value (approx) 
------------------  --------- 
12/19/2011 5:00:00 PM 79173 
12/19/2011 6:00:00 PM 79179 
12/19/2011 7:00:00 PM 79186 

For a linear algorithm, it seems that for any gap I would take the difference in time and multiply the value by a factor.

TimeSpan ts = current - previous; 

Double factor = ts.TotalMinutes/period; 

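The value and the timestamp can then be calculated from that factor.

With so much information available, I am not sure why it is so hard to find the most elegant approach.

Perhaps first: is there an open-source analysis library that you could recommend?

Any recommendations for a programmatic approach? Ideally in C#, or possibly in SQL?

Alternatively, are there any similar questions (with answers) that I could be pointed to?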

Answers

5

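By using the ticks that DateTime uses internally to represent time, you get the most accurate values possible. Since these ticks do not restart at zero at midnight, you will not run into problems at day boundaries.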

// Sample times and full hour 
DateTime lastSampleTimeBeforeFullHour = new DateTime(2011, 12, 19, 17, 58, 21); 
DateTime firstSampleTimeAfterFullHour = new DateTime(2011, 12, 19, 18, 13, 21); 
DateTime fullHour = new DateTime(2011, 12, 19, 18, 00, 00); 

// Times as ticks (most accurate time unit) 
long t0 = lastSampleTimeBeforeFullHour.Ticks; 
long t1 = firstSampleTimeAfterFullHour.Ticks; 
long tf = fullHour.Ticks; 

// Energy samples 
double e0 = 79179.88; // kWh before full hour 
double e1 = 79182.13; // kWh after full hour 
double ef; // interpolated energy at full hour 

ef = e0 + (tf - t0) * (e1 - e0)/(t1 - t0); // ==> 79180.1275 kWh 


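As a geometric explanation: similar triangles are triangles that have the same shape but different sizes. The formula above is based on the fact that the ratio of any two sides of one triangle is the same for the corresponding sides of a similar triangle.

If you have a triangle A B C and a similar triangle a b c, then A : B = a : b. The equality of two ratios is called a proportion.

We can apply this proportionality rule to our problem: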

(e1 - e0)/(t1 - t0) = (ef - e0)/(tf - t0) 
--- large triangle --   --- small triangle -- 
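
Solving this proportion for ef gives the expression used in the code above. As a short derivation, using the same symbols and the sample values from the code (99 seconds from t0 to the full hour, 900 seconds between the two samples):

```latex
\frac{e_1 - e_0}{t_1 - t_0} = \frac{e_f - e_0}{t_f - t_0}
\quad\Longrightarrow\quad
e_f = e_0 + (t_f - t_0)\,\frac{e_1 - e_0}{t_1 - t_0}

e_f = 79179.88 + 99 \cdot \frac{79182.13 - 79179.88}{900}
    = 79179.88 + 0.2475
    = 79180.1275 \ \text{kWh}
```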


+0

Amazing - this is a great foundation - thanks! – 2011-12-29 21:47:10

0

Maybe something like this:

SELECT DATE_FORMAT(timestamp, '%Y-%m-%d %H') as day_hour, AVG(value) as approx FROM table GROUP BY day_hour 

Which database engine are you using?

+0

MS SQL Server 2008 Express. This is close to what I need; still, I would prefer a C# implementation. – 2011-12-29 20:44:07
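
Since a C# implementation is preferred, the same hourly averaging could be sketched with LINQ. This is only a minimal sketch: the class name HourlyRollupSketch and the method AverageByHour are hypothetical, and the samples are assumed to be (timestamp, value) pairs. Like the SQL query, it averages the readings that fall within each hour; it does not interpolate a value at the hour boundary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any library.
public static class HourlyRollupSketch
{
    // Groups (timestamp, value) samples by the hour they fall in
    // and returns the average value of each hour, ordered by time.
    public static List<KeyValuePair<DateTime, double>> AverageByHour(
        IEnumerable<KeyValuePair<DateTime, double>> samples)
    {
        return samples
            // Truncate each timestamp to the start of its hour.
            .GroupBy(s => new DateTime(s.Key.Year, s.Key.Month, s.Key.Day, s.Key.Hour, 0, 0))
            .OrderBy(g => g.Key)
            .Select(g => new KeyValuePair<DateTime, double>(g.Key, g.Average(s => s.Value)))
            .ToList();
    }
}
```

For the five sample rows in the question, this would produce one entry for the 5 PM hour (the mean of two readings) and one for the 6 PM hour (the mean of three readings).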

0

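For what you are doing, it appears that, for starters, you are declaring the TimeSpan incorrectly: ts = (TimeSpan)(current - previous). Also make sure that current and previous are of type DateTime.

If you want to look at calculating or rolling up, I would look at the TotalHours property. Here is an example you can look at for an idea, if you like; it checks whether the LastWrite/modified time is within the last 24 hours: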

if (((TimeSpan)(DateTime.Now - fiUpdateFileFile.LastWriteTime)).TotalHours < 24){} 

I know this is different from your case, but it shows how TotalHours is used.

2

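I have written a LINQ function to interpolate and normalize time-series data so that it can be aggregated/merged.

The resampling function is shown below. I have written a short article about this technique on Code Project.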

using System; 
using System.Collections.Generic; 

// The function is an extension method, so it must be defined in a static class. 
public static class ResampleExt 
{ 
    // Resample an input time series and create a new time series between two 
    // particular dates sampled at a specified time interval. 
    public static IEnumerable<OutputDataT> Resample<InputValueT, OutputDataT>(

     // Input time series to be resampled. 
     this IEnumerable<InputValueT> source, 

     // Start date of the new time series. 
     DateTime startDate, 

     // Date at which the new time series will have ended. 
     DateTime endDate, 

     // The time interval between samples. 
     TimeSpan resampleInterval, 

     // Function that selects a date/time value from an input data point. 
     Func<InputValueT, DateTime> dateSelector, 

     // Interpolation function that produces a new interpolated data point 
     // at a particular time between two input data points. 
     Func<DateTime, InputValueT, InputValueT, double, OutputDataT> interpolator 
    ) 
    { 
     // ... argument checking omitted ... 

     // 
     // Manually enumerate the input time series... 
     // This is manual because the first data point must be treated specially. 
     // 
     var e = source.GetEnumerator(); 
     if (e.MoveNext()) 
     { 
      // Initialize working date to the start date, this variable will be used to 
      // walk forward in time towards the end date. 
      var workingDate = startDate; 

      // Extract the first data point from the input time series. 
      var firstDataPoint = e.Current; 

      // Extract the first data point's date using the date selector. 
      var firstDate = dateSelector(firstDataPoint); 

      // Loop forward in time until we reach either the date of the first 
      // data point or the end date, whichever comes first. 
      while (workingDate < endDate && workingDate <= firstDate) 
      { 
       // Until we reach the date of the first data point, 
       // use the interpolation function to generate an output 
       // data point from the first data point. 
       yield return interpolator(workingDate, firstDataPoint, firstDataPoint, 0); 

       // Walk forward in time by the specified time period. 
       workingDate += resampleInterval; 
      } 

      // 
      // Setup current data point... we will now loop over input data points and 
      // interpolate between the current and next data points. 
      // 
      var curDataPoint = firstDataPoint; 
      var curDate = firstDate; 

      // 
      // After we have reached the first data point, loop over remaining input data points until 
      // either the input data points have been exhausted or we have reached the end date. 
      // 
      while (workingDate < endDate && e.MoveNext()) 
      { 
       // Extract the next data point from the input time series. 
       var nextDataPoint = e.Current; 

       // Extract the next data point's date using the date selector. 
       var nextDate = dateSelector(nextDataPoint); 

       // Calculate the time span between the dates of the current and next data points. 
       // (Measured from curDate, not firstDate, so that timePct below stays in the 0-1 range.) 
       var timeSpan = nextDate - curDate; 

       // Loop forward in time until we have moved beyond the date of the next data point. 
       while (workingDate <= endDate && workingDate < nextDate) 
       { 
        // The time span from the current date to the working date. 
        var curTimeSpan = workingDate - curDate; 

        // The time between the dates as a percentage (a 0-1 value). 
        var timePct = curTimeSpan.TotalSeconds/timeSpan.TotalSeconds; 

        // Interpolate an output data point at the particular time between 
        // the current and next data points. 
        yield return interpolator(workingDate, curDataPoint, nextDataPoint, timePct); 

        // Walk forward in time by the specified time period. 
        workingDate += resampleInterval; 
       } 

       // Swap the next data point into the current data point so we can move on and continue 
       // the interpolation with each subsequent data point assuming the role of 
       // 'next data point' in the next iteration of this loop. 
       curDataPoint = nextDataPoint; 
       curDate = nextDate; 
      } 

      // Finally loop forward in time until we reach the end date. 
      while (workingDate < endDate) 
      { 
       // Interpolate an output data point generated from the last data point. 
       yield return interpolator(workingDate, curDataPoint, curDataPoint, 1); 

       // Walk forward in time by the specified time period. 
       workingDate += resampleInterval; 
      } 
     } 
    } 
}
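
To show what the interpolator argument might look like for the kWh readings in the question, here is a minimal sketch of a linear interpolator. The Reading type and the InterpolatorSketch class are hypothetical names introduced only for illustration; they are not part of the function above or of the article.

```csharp
using System;

// Hypothetical data point type for a single meter reading.
public sealed class Reading
{
    public DateTime Timestamp { get; private set; }
    public double Kwh { get; private set; }

    public Reading(DateTime timestamp, double kwh)
    {
        Timestamp = timestamp;
        Kwh = kwh;
    }
}

// Hypothetical helper holding a linear interpolator.
public static class InterpolatorSketch
{
    // Has the shape Func<DateTime, Reading, Reading, double, Reading>:
    // timePct is the 0-1 position of 'date' between 'current' and 'next'.
    public static Reading Lerp(DateTime date, Reading current, Reading next, double timePct)
    {
        double kwh = current.Kwh + (next.Kwh - current.Kwh) * timePct;
        return new Reading(date, kwh);
    }
}
```

Assuming readings is an IEnumerable<Reading> sorted by time, it could be passed along the lines of readings.Resample(start, end, TimeSpan.FromHours(1), r => r.Timestamp, (d, a, b, pct) => InterpolatorSketch.Lerp(d, a, b, pct)). Before the first and after the last input point, Resample passes the same data point twice, so this interpolator simply holds that value constant.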