2017-09-24

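Average every two consecutive index values (every 2 min) in a pandas DataFrame

I know similar questions have already been answered, but I can't figure out why none of those solutions work for me. My sample dataset: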

    TimeStamp       340       341       342
    10:27:00   1.953036  2.110234  1.981548
    10:28:00   1.973408  2.046361  1.806923
    10:29:00   0.000000  0.000000  0.014881
    10:30:00   2.567976  3.169928  3.479591

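I want to compute the mean of each column over every two minutes. df.groupby looks like it should give a tidy solution, but for some reason it makes my TimeStamp column disappear. Any help is much appreciated.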

Expected output:

TimeStamp  340   341   342  
10:27:30  1.963222  2.078298  1.894235    
10:29:30  1.283988  1.584964  1.747236 

Code I tried:

    import pandas as pd
    import numpy as np 

    path = '/Users/username/Desktop/Model/' 
    file1 = 'filename.csv' 

    df = pd.read_csv(path + file1, skipinitialspace = True) 

    df['TimeStamp'] = pd.to_timedelta(df['TimeStamp']) 
    df['TimeStamp'] = df['TimeStamp'].dt.floor('min') 
    df.set_index('TimeStamp') 
    rowF = len(df['TimeStamp']) 

    # Average every two min 
    newdf = df.groupby(np.arange(len(df.index))//2).mean() 
    print(newdf)   

Answer

Set the time as the index:

df.set_index(pd.to_timedelta(df.TimeStamp), inplace=True) 
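
To try this without the CSV, the sample rows from the question can be rebuilt in memory. This is a minimal sketch, not the original poster's file: the column labels are assumed to be strings, and the text `TimeStamp` column is dropped once it becomes the index, since recent pandas versions may refuse to average a leftover string column.

```python
import pandas as pd

# Sample rows copied from the question; column labels assumed to be strings.
df = pd.DataFrame({
    "TimeStamp": ["10:27:00", "10:28:00", "10:29:00", "10:30:00"],
    "340": [1.953036, 1.973408, 0.000000, 2.567976],
    "341": [2.110234, 2.046361, 0.000000, 3.169928],
    "342": [1.981548, 1.806923, 0.014881, 3.479591],
})

# Use the parsed timedeltas as the index and drop the text column,
# so that only numeric columns remain for aggregation.
df = df.set_index(pd.to_timedelta(df["TimeStamp"])).drop(columns="TimeStamp")
```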

Then use resample and aggregate over every two minutes:

df.resample("2min").mean().reset_index() 

#   TimeStamp       340       341       342
# 0  10:27:00  1.963222  2.078298  1.894235
# 1  10:29:00  1.283988  1.584964  1.747236
# 2  10:31:00       NaN       NaN       NaN

Drop the last observation with iloc:

df.resample("2min").mean().reset_index().iloc[:-1] 

#   TimeStamp       340       341       342
# 0  10:27:00  1.963222  2.078298  1.894235
# 1  10:29:00  1.283988  1.584964  1.747236
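
Note that `iloc[:-1]` assumes the last bin is always the empty one. Whether that trailing all-NaN row shows up may depend on the pandas version, so a less position-dependent sketch is to drop rows in which every value is missing. The one-column frame below is a cut-down stand-in for the question's data, not the full dataset:

```python
import pandas as pd

# Cut-down stand-in for the question's data: one value column only.
values = pd.DataFrame(
    {"340": [1.953036, 1.973408, 0.000000, 2.567976]},
    index=pd.to_timedelta(["10:27:00", "10:28:00", "10:29:00", "10:30:00"]),
)
values.index.name = "TimeStamp"

# Average in 2-minute bins, then discard any bin where every column is NaN.
two_min = values.resample("2min").mean().dropna(how="all").reset_index()
```

With `how="all"`, a bin is discarded only when every column is NaN, so bins with partial data are kept.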

If you would like the TimeStamp shifted by 30 seconds:

(df.resample("2min").mean().reset_index() 
    .assign(TimeStamp = lambda x: x.TimeStamp + pd.Timedelta('30 seconds')) 
    .iloc[:-1]) 

#   TimeStamp       340       341       342
# 0  10:27:30  1.963222  2.078298  1.894235
# 1  10:29:30  1.283988  1.584964  1.747236
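
As an alternative to resample, the positional groupby from the question can be kept if the timestamps are averaged along with the values. The TimeStamp column in the question likely disappears because the mean cannot be taken over that column in the pandas version used, so it is silently dropped. The sketch below is one possible workaround, not the method used in the answer above: it converts the timedeltas to seconds, averages each pair of consecutive rows, and converts back. The helper column name `seconds` and the string column labels are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; column labels assumed to be strings.
df = pd.DataFrame({
    "TimeStamp": pd.to_timedelta(["10:27:00", "10:28:00", "10:29:00", "10:30:00"]),
    "340": [1.953036, 1.973408, 0.000000, 2.567976],
    "341": [2.110234, 2.046361, 0.000000, 3.169928],
    "342": [1.981548, 1.806923, 0.014881, 3.479591],
})

# Represent the timestamps as plain seconds so they can be averaged like any number.
# "seconds" is a hypothetical helper column, not part of the original data.
numeric = df.drop(columns="TimeStamp").assign(
    seconds=df["TimeStamp"].dt.total_seconds()
)

# Rows 0-1 form group 0, rows 2-3 form group 1, and so on.
pair_id = np.arange(len(numeric)) // 2
out = numeric.groupby(pair_id).mean()

# Turn the averaged seconds back into a timedelta column placed first.
out.insert(0, "TimeStamp", pd.to_timedelta(out.pop("seconds"), unit="s"))
```

Here each output row carries the midpoint of its two source timestamps (10:27:30 and 10:29:30 for the sample), which matches the expected output in the question without a separate 30-second shift.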