2016-12-28

Filling rows for missing timestamps with NaN - MATLAB

I have a dataset in which some timestamps are missing. So far I have written the following code:

x = table2dataset(Testing_data); 
T1 = x(:,1);    
C1 =dataset2cell(T1); 
formatIn = 'yyyy-mm-dd HH:MM:SS'; 
t1= datenum(C1,formatIn); 

% Creating 10 minutes of time interval; 
avg = 10/60/24;   
tnew = [t1(1):avg:t1(end)]'; 
indx = round((t1-t1(1))/avg) + 1; 
ynew = NaN(length(tnew),1); 
ynew(indx)=t1; 

% replacing missing time with NaN 
t = datetime(ynew,'ConvertFrom','datenum');     
formatIn = 'yyyy-mm-dd HH:MM:SS'; 
DateVector = datevec(ynew,formatIn); 
dt = datestr(ynew,'yyyy-mm-dd HH:MM:SS'); 
ds = string(dt); 

The test data is shown here with three columns:

 Time      x   y 
2009-04-10 02:00:00.000   1   0.1 
2009-04-10 02:10:00.000   2   0.2 
2009-04-10 02:30:00.000   3   0.3 
2009-04-10 02:50:00.000   4   0.4 

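As you can see, for a 10-minute interval there are missing timestamps (2:20 and 2:40), so I want to add those timestamps, and I want the x and y values there to be NaN. So my output would look like this: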

 Time      x   y 
2009-04-10 02:00:00.000   1   0.1 
2009-04-10 02:10:00.000   2   0.2 
2009-04-10 02:20:00.000   NaN  NaN 
2009-04-10 02:30:00.000   3   0.3  
2009-04-10 02:40:00.000   NaN  NaN 
2009-04-10 02:50:00.000   4   0.4 

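As you can see from my code, I am only able to add NaN for the timestamps; what I want now is to carry over the corresponding x and y values as shown above.

Please note that I have more than 3000 rows of data in the above format, and I want to perform the same operation on all of my values.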

Answers

0

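There seems to be a contradiction in your question: you say you are able to insert NaN in place of the missing time strings, but in the example of the expected output you wrote the time strings themselves.

Also, you refer to one missing timestamp (2:20), but with a 10-minute time step there is another missing timestamp in your sample data (2:40).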
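As a quick check of which timestamps are missing, here is a minimal sketch. It assumes the four sample times from the question have been typed in as a cell array of strings (without the milliseconds) and that the step is exactly 10 minutes; the variable names `present` and `missing_idx` are made up for illustration.

```matlab
% Sample times from the question (assumed to be stored as strings)
time_str = {'2009-04-10 02:00:00'; '2009-04-10 02:10:00'; ...
            '2009-04-10 02:30:00'; '2009-04-10 02:50:00'};
t1   = datenum(time_str, 'yyyy-mm-dd HH:MM:SS');
step = 10/60/24;                          % 10 minutes, expressed in days

% Number of slots on the regular 10-minute grid
n = round((t1(end) - t1(1))/step) + 1;

% Mark the slots that have a sample
present = false(n, 1);
present(round((t1 - t1(1))/step) + 1) = true;

% Slots without a sample, and the times they correspond to
missing_idx   = find(~present);
missing_times = datestr(t1(1) + (missing_idx - 1)*step, 'HH:MM');
```

With the sample data, `missing_idx` comes out as `[3; 5]`, that is the third and fifth 10-minute slots counted from 02:00.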

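Assuming that:

  • you actually want to insert the missing time strings
  • you want to handle all the missing timestamps

you can modify your code as follows:

  • the ynew time array is not needed
  • the tnew time array should be used in place of ynew
  • to insert NaN values in the x and y columns you have to:
    • extract them from the dataset
    • create two new arrays, initialized to NaN
    • insert the original x and y data at the locations identified by indx

Below you can find an updated version of your code.

  • the original x and y data are stored in the x_data and y_data arrays
  • at the end of the script, the new x and y data are stored in the x_data_new and y_data_new arrays

At the end, two tables are generated: the first with the time as a string (char array), the second with the time as a cellarray.

The comments in the code should identify the modifications.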

x = table2dataset(Testing_data); 
T1 = x(:,1); 
% Get X data from the table (double() turns the dataset column into a numeric array) 
x_data = double(x(:,2)); 
% Get Y data from the table 
y_data = double(x(:,3)); 

C1 = dataset2cell(T1); 

formatIn = 'yyyy-mm-dd HH:MM:SS'; 
% The first cell returned by dataset2cell holds the variable name, so skip it 
t1 = datenum(C1(2:end),formatIn); 

avg = 10/60/24;  % 10-minute time interval, in days 
tnew = [t1(1):avg:t1(end)]'; 
indx = round((t1-t1(1))/avg) + 1; 
% 
% Not Needed 
% 
% ynew = NaN(length(tnew),1); 
% ynew(indx)=t1; 
% 
% Create the new X and Y data: start from all-NaN arrays and 
% copy the original values into the slots identified by indx 
% 
x_data_new = nan(length(tnew),1); 
x_data_new(indx) = x_data; 
y_data_new = nan(length(tnew),1); 
y_data_new(indx) = y_data; 

% t = datetime(ynew,'ConvertFrom','datenum') % replacing missing time with NAN 
% 
% Use tnew instead of ynew 
% 
t = datetime(tnew,'ConvertFrom','datenum'); 
formatIn = 'yyyy-mm-dd HH:MM:SS'; 
% DateVector = datevec(y_data_new,formatIn) 
% dt = datestr(ynew,'yyyy-mm-dd HH:MM:SS') 
% 
% Use tnew instead of ynew 
% 
dt = datestr(tnew,'yyyy-mm-dd HH:MM:SS'); 
% ds = char(dt) 

% First table: time as a char array; second table: time as a cell array of strings 
new_table=table(dt,x_data_new,y_data_new) 
new_table_1=table(cellstr(dt),x_data_new,y_data_new) 

The output is:

new_table = 

     dt   x_data_new y_data_new 
    ___________ __________ __________ 

    [1x19 char]  1   0.1  
    [1x19 char]  2   0.2  
    [1x19 char] NaN   NaN  
    [1x19 char]  3   0.3  
    [1x19 char] NaN   NaN  
    [1x19 char]  4   0.4  


new_table_1 = 

      Var1    x_data_new y_data_new 
    _____________________ __________ __________ 

    '2009-04-10 02:00:00'  1   0.1  
    '2009-04-10 02:10:00'  2   0.2  
    '2009-04-10 02:20:00' NaN   NaN  
    '2009-04-10 02:30:00'  3   0.3  
    '2009-04-10 02:40:00' NaN   NaN  
    '2009-04-10 02:50:00'  4   0.4 

Hope this helps.

Qapla'

+0

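Thank you. I only gave an example. It works, but as I said earlier, my dataset consists of 6 parameters (e.g. x, y, z, a, b, c). Is there an easier way to do for all 6 parameters what was done for x and y? That is, whenever a timestamp is missing, add NaN at that time for the corresponding x, y, z, a, b, c ...?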

+0

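I don't understand the question: do you want to set 'NaN' only for the additional parameters ('z, a, b, c'), or for the timestamp as well? That is: 'NaN-NaN-NaN ... NaN NaN NaN' or '2009-04-10 02:40:00 NaN NaN NaN NaN NaN NaN'?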

+0

Only for the parameters. The same as in the code you just wrote: I want NaN for the parameters, not for the timestamp...
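One way to handle all parameter columns at once is to keep them in a single numeric matrix and index whole rows, rather than creating one array per column. The following is only a sketch under assumptions: the sample values, the third column z, and the names `values` and `values_new` are made up for illustration, and the data is assumed to be numeric with timestamps on an exact 10-minute grid.

```matlab
% Hypothetical sample data: time strings plus any number of numeric columns
time_str = {'2009-04-10 02:00:00'; '2009-04-10 02:10:00'; ...
            '2009-04-10 02:30:00'; '2009-04-10 02:50:00'};
values   = [1 0.1 10; 2 0.2 20; 3 0.3 30; 4 0.4 40];   % columns: x, y, z

t1   = datenum(time_str, 'yyyy-mm-dd HH:MM:SS');
step = 10/60/24;                               % 10 minutes, in days

% Regular time grid and the row of each original sample on that grid
n    = round((t1(end) - t1(1))/step) + 1;
tnew = t1(1) + (0:n-1)'*step;
indx = round((t1 - t1(1))/step) + 1;

% All-NaN matrix with one column per parameter; fill the known rows
values_new = NaN(n, size(values, 2));
values_new(indx, :) = values;

% Assemble a table: complete timestamps, NaN only in the parameter columns
time_tbl  = table(cellstr(datestr(tnew, 'yyyy-mm-dd HH:MM:SS')), ...
                  'VariableNames', {'Time'});
new_table = [time_tbl, array2table(values_new, 'VariableNames', {'x','y','z'})];
```

Because the NaN matrix is sized from `size(values, 2)`, the same lines should work unchanged for six columns; only the list of variable names passed to `array2table` needs to match.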

0

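This example is not much different from the accepted answer, but IMHO it is a bit easier on the eyes. It does, however, support gaps larger than 1 step, and it is a bit more generic because it makes fewer assumptions.

It works with plain cell arrays instead of the original table data, so the conversion is up to you (I'm on R2010a, so I can't test it).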

% Example data with intentional gaps of varying size 
old_data = {'2009-04-10 02:00:00.000' 1 0.1 
      '2009-04-10 02:10:00.000' 2 0.2 
      '2009-04-10 02:30:00.000' 3 0.3 
      '2009-04-10 02:50:00.000' 4 0.4 
      '2009-04-10 03:10:00.000' 5 0.5 
      '2009-04-10 03:20:00.000' 6 0.6 
      '2009-04-10 03:50:00.000' 7 0.7} 


% Convert textual dates to numbers we can work with more easily 
old_dates = datenum(old_data(:,1)); 

% Nominal step size is the minimum of all differences 
deltas = diff(old_dates); 
nominal_step = min(deltas); 

% Generate new date numbers with constant step 
new_dates = old_dates(1) : nominal_step : old_dates(end); 

% Determine where the gaps in the data are, and how big they are, 
% taking into account rounding error 
step_gaps = abs(deltas - nominal_step) > 10*eps; 
gap_sizes = round(deltas(step_gaps)/nominal_step - 1); 

% Create new data structure with constant-step time stamps, 
% initially with the data of interest all-NAN 
new_size = size(old_data,1) + sum(gap_sizes); 
new_data = [cellstr(datestr(new_dates, 'yyyy-mm-dd HH:MM:SS')),... 
      repmat({NaN}, new_size, 2)]; 

% Compute proper locations of the old data in the new data structure, 
% again, taking into account rounding error 
day = 86400; % (seconds in a day) 
new_datapoint = ismember(round(new_dates * day), ... 
         round(old_dates * day)); 

% Insert the old data at the right locations 
new_data(new_datapoint, 2:3) = old_data(:, 2:3) 

The output is:

old_data = 
    '2009-04-10 02:00:00.000' [1] [0.100000000000000] 
    '2009-04-10 02:10:00.000' [2] [0.200000000000000] 
    '2009-04-10 02:30:00.000' [3] [0.300000000000000] 
    '2009-04-10 02:50:00.000' [4] [0.400000000000000] 
    '2009-04-10 03:10:00.000' [5] [0.500000000000000] 
    '2009-04-10 03:20:00.000' [6] [0.600000000000000] 
    '2009-04-10 03:50:00.000' [7] [0.700000000000000] 

new_data = 
    '2009-04-10 02:00:00' [ 1] [0.100000000000000] 
    '2009-04-10 02:10:00' [ 2] [0.200000000000000] 
    '2009-04-10 02:20:00' [NaN] [    NaN] 
    '2009-04-10 02:30:00' [ 3] [0.300000000000000] 
    '2009-04-10 02:40:00' [NaN] [    NaN] 
    '2009-04-10 02:50:00' [ 4] [0.400000000000000] 
    '2009-04-10 03:00:00' [NaN] [    NaN] 
    '2009-04-10 03:10:00' [ 5] [0.500000000000000] 
    '2009-04-10 03:20:00' [ 6] [0.600000000000000] 
    '2009-04-10 03:30:00' [NaN] [    NaN] 
    '2009-04-10 03:40:00' [NaN] [    NaN] 
    '2009-04-10 03:50:00' [ 7] [0.700000000000000]
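For readers on a newer release, the same result can also be obtained with a timetable. This is a different technique from the two answers above, not a rewrite of them: it assumes R2016b or later (where `timetable` and `retime` are available), and the sample values below are simply the four rows from the question typed in by hand.

```matlab
% Sample rows from the question, built as a timetable (needs R2016b+)
Time = datetime({'2009-04-10 02:00:00'; '2009-04-10 02:10:00'; ...
                 '2009-04-10 02:30:00'; '2009-04-10 02:50:00'}, ...
                'InputFormat', 'yyyy-MM-dd HH:mm:ss');
x  = [1; 2; 3; 4];
y  = [0.1; 0.2; 0.3; 0.4];
TT = timetable(Time, x, y);

% Regular 10-minute grid spanning the data
new_times = (Time(1):minutes(10):Time(end))';

% Rows with no original sample get missing values (NaN for numeric data)
TT_filled = retime(TT, new_times, 'fillwithmissing');
```

Every variable in the timetable is filled in one call, so extra parameter columns need no additional code.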