2017-01-09 197 views
-3

Basically, what I need is averages computed from an existing table (see below), including averages over certain hours.

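The input file is originally at 15-minute granularity. The output should show the average for each day and, in a separate column, the average for each day from 8 am to 8 pm.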

Input:

DateTime          Value
09/01/2017 00:00  5055.414058
09/01/2017 00:15  5055.414058
09/01/2017 00:30  5055.414058
09/01/2017 00:45  5055.414058
09/01/2017 01:00  5986.204028
09/01/2017 01:15  5986.204028
09/01/2017 01:30  5986.204028
09/01/2017 01:45  5986.204028
09/01/2017 02:00  7199.824865
09/01/2017 02:15  7199.824865
09/01/2017 02:30  7199.824865
09/01/2017 02:45  7199.824865
09/01/2017 03:00  9185.008333
09/01/2017 03:15  9185.008333
09/01/2017 03:30  9185.008333
…
13/01/2017 22:00  94080.58174
13/01/2017 22:15  94080.58174
13/01/2017 22:30  94080.58174
13/01/2017 22:45  94080.58174
13/01/2017 23:00  93231.23486
13/01/2017 23:15  93231.23486
13/01/2017 23:30  93231.23486
13/01/2017 23:45  93231.23486
14/01/2017 00:00  91619.33743
14/01/2017 00:15  91619.33743
14/01/2017 00:30  91619.33743
14/01/2017 00:45  91619.33743
14/01/2017 01:00  89894.48751
14/01/2017 01:15  89894.48751
14/01/2017 01:30  89894.48751
…

Output:

Date        Average (entire day)  Average (8am - 8pm)
09/01/2017
10/01/2017
11/01/2017
12/01/2017
13/01/2017
14/01/2017
15/01/2017
16/01/2017
17/01/2017
18/01/2017
19/01/2017
20/01/2017
21/01/2017
22/01/2017
23/01/2017
import datetime
import pandas as pd

# Local path and name of the CSV file.
path = 'W:/myfolder/'
sheetname = "Forecast_" + datetime.datetime.today().strftime('%d.%m.%Y-%H')
filename = path + sheetname + ".csv"

# Create data frame of data; parse "DateTime" (day/month/year) so that .dt works
df = pd.read_csv(filename, delimiter=',', encoding='latin-1', index_col=False,
                 parse_dates=["DateTime"], dayfirst=True)
print(df)

# Mean of "Value" per (day of month, hour)
table = df.groupby([df["DateTime"].dt.day, df["DateTime"].dt.hour])["Value"].mean()

print(table)
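For reference, here is a minimal sketch of the requested output (one row per day, whole-day mean and 8am-8pm mean). It assumes the CSV has the "DateTime" and "Value" columns shown above; the data is a hypothetical stand-in built in memory, and "8am - 8pm" is read as 08:00 up to but not including 20:00 (an assumption, since the question doesn't say whether 20:00 counts).

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV: two days of 15-minute rows, day/month/year
# timestamps, and Value equal to the hour so the expected means are easy to check.
stamps = pd.date_range("2017-01-09", periods=2 * 96, freq="15min")
csv_text = "DateTime,Value\n" + "".join(
    f"{ts:%d/%m/%Y %H:%M},{ts.hour}\n" for ts in stamps)

# Parse "DateTime" while reading (dayfirst=True: 09/01/2017 is 9 January)
# and use it as the index, so resample/between_time can work on it.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["DateTime"],
                 dayfirst=True, index_col="DateTime")

# Average over each entire day.
daily = df["Value"].resample("D").mean()

# Average over 08:00-19:45 each day (between_time keeps both endpoints),
# i.e. 8am up to, but not including, 8pm.
daytime = df["Value"].between_time("08:00", "19:45").resample("D").mean()

result = pd.DataFrame({"Average (entire day)": daily,
                       "Average (8am - 8pm)": daytime})
result.index.name = "Date"
print(result)
```

With this sample every day gives 11.5 for the whole day and 13.5 for 8am-8pm. If 20:00 should be included, change the end time to "20:00".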
+0

What about df.mean()? Is that what you're looking for? –
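To illustrate the difference this comment hinges on: a plain mean collapses the whole column to one number, while a per-day mean needs grouping by day. A small sketch on hypothetical hourly data whose Value equals the day of the month:

```python
import pandas as pd

# Hypothetical two days of hourly data where Value equals the day of month.
idx = pd.date_range("2017-01-09", periods=48, freq="h")
df = pd.DataFrame({"Value": [float(ts.day) for ts in idx]}, index=idx)

overall = df["Value"].mean()                # one number for the whole table
per_day = df["Value"].resample("D").mean()  # one number per calendar day
print(overall)
print(per_day)
```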

+3

Does your code not work? – IanS

Answers

0

Simple: first create a column that contains only the date (the day), then do a groupby + aggregate.
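A minimal sketch of that idea, assuming "DateTime" has already been parsed to datetimes (the sample frame is hypothetical). dt.normalize() keeps only the calendar day while staying a datetime, so the groups sort chronologically rather than as "%d/%m/%Y" strings would.

```python
import pandas as pd

# Hypothetical sample with a parsed "DateTime" column and a "Value" column.
df = pd.DataFrame({
    "DateTime": pd.to_datetime(["2017-01-09 00:00", "2017-01-09 12:00",
                                "2017-01-10 06:00", "2017-01-10 18:00"]),
    "Value": [1.0, 3.0, 10.0, 20.0],
})

# Step 1: a column that holds only the calendar day.
df["Date"] = df["DateTime"].dt.normalize()

# Step 2: group by that column and aggregate.
per_day = df.groupby("Date")["Value"].agg(["mean", "count"])
print(per_day)
```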

I'll post the code soon :)

Edit: the code, as promised:

from datetime import datetime
import os
import pandas as pd

folderPath = "data/"

# Read every CSV file in the folder, one dataframe at a time
def folderIterator(folderPath):
    for item in os.listdir(folderPath):
        yield pd.read_csv(os.path.join(folderPath, item))

# Put all dataframes together (DataFrame.append was removed in pandas 2.0)
fullDataFrame = pd.concat(folderIterator(folderPath), ignore_index=True)

# Create date column: parse "DateTime" (day/month/year) and keep only the day
fullDataFrame["DayCol"] = fullDataFrame["DateTime"].map(
    lambda x: datetime.strptime(x, '%d/%m/%Y %H:%M').strftime("%d/%m/%Y"))

# Average of "Value" per day
finalDF = fullDataFrame.groupby("DayCol")["Value"].mean()

print(finalDF)
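The code above produces only the whole-day average. The 8am-8pm column the question also asks for can be added by filtering on the hour before grouping. A sketch on a hypothetical one-day sample, assuming "DateTime" is already parsed and reading "8am - 8pm" as hours 8 through 19 (20:00 excluded, an assumption):

```python
import pandas as pd

# Hypothetical one-day sample: hourly rows where Value equals the hour.
df = pd.DataFrame({"DateTime": pd.date_range("2017-01-09", periods=24, freq="h")})
df["Value"] = df["DateTime"].dt.hour.astype(float)
df["DayCol"] = df["DateTime"].dt.normalize()

# Whole-day mean, as in the answer's code.
whole_day = df.groupby("DayCol")["Value"].mean()

# 8am-8pm mean: keep rows with hour 8..19 (Series.between is inclusive).
in_window = df["DateTime"].dt.hour.between(8, 19)
window = df[in_window].groupby("DayCol")["Value"].mean()

# Both Series share the DayCol index, so they line up as two columns.
final = pd.DataFrame({"Average (entire day)": whole_day,
                      "Average (8am - 8pm)": window})
print(final)
```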

Ask if you have any questions about the code :)

0

Use pd.TimeGrouper (pd.Grouper(freq='D') in current pandas), query and pd.concat.

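import numpy as np
import pandas as pd

tidx = pd.date_range('2016-03-30', '2016-04-01', freq='2h')  # one row every 2 hours

df = pd.DataFrame(dict(value=np.random.rand(len(tidx))), tidx)

# Daily mean over the hours kept by the query; pd.Grouper(freq='D') replaces
# pd.TimeGrouper('D'), which was removed in pandas 1.0.
# Note: '8 >= hour < 9' keeps hours 0-8; for 8am-8pm use '8 <= hour < 20'.
from8to8 = df.assign(hour=df.index.hour).query('8 >= hour < 9') \
    .groupby(pd.Grouper(freq='D'))['value'].mean().rename('8to8')
daily = df.groupby(pd.Grouper(freq='D'))['value'].mean().rename('day')
# Put the raw values and both daily means side by side, forward-filling each day
print(pd.concat([df['value'], daily, from8to8], axis=1).ffill())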

                       value      day     8to8
2016-03-30 00:00:00 0.671287 0.565916 0.704173 
2016-03-30 02:00:00 0.997307 0.565916 0.704173 
2016-03-30 04:00:00 0.335283 0.565916 0.704173 
2016-03-30 06:00:00 0.722650 0.565916 0.704173 
2016-03-30 08:00:00 0.794335 0.565916 0.704173 
2016-03-30 10:00:00 0.992366 0.565916 0.704173 
2016-03-30 12:00:00 0.206157 0.565916 0.704173 
2016-03-30 14:00:00 0.480467 0.565916 0.704173 
2016-03-30 16:00:00 0.389169 0.565916 0.704173 
2016-03-30 18:00:00 0.326746 0.565916 0.704173 
2016-03-30 20:00:00 0.458807 0.565916 0.704173 
2016-03-30 22:00:00 0.416415 0.565916 0.704173 
2016-03-31 00:00:00 0.344517 0.487147 0.409475 
2016-03-31 02:00:00 0.095404 0.487147 0.409475 
2016-03-31 04:00:00 0.412321 0.487147 0.409475 
2016-03-31 06:00:00 0.384827 0.487147 0.409475 
2016-03-31 08:00:00 0.810305 0.487147 0.409475 
2016-03-31 10:00:00 0.052873 0.487147 0.409475 
2016-03-31 12:00:00 0.936284 0.487147 0.409475 
2016-03-31 14:00:00 0.303524 0.487147 0.409475 
2016-03-31 16:00:00 0.347630 0.487147 0.409475 
2016-03-31 18:00:00 0.787372 0.487147 0.409475 
2016-03-31 20:00:00 0.989716 0.487147 0.409475 
2016-03-31 22:00:00 0.380994 0.487147 0.409475 
2016-04-01 00:00:00 0.091029 0.091029 0.091029 
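The query in this answer, '8 >= hour < 9', keeps hours 0 through 8, which is what the 8to8 column above reflects. A sketch of the same approach with the window read as 08:00 <= hour < 20:00 (an assumption about what "8am - 8pm" means), pd.Grouper in place of the removed pd.TimeGrouper, and a seeded generator so runs repeat:

```python
import numpy as np
import pandas as pd

# Same shape as the answer's sample (one row every 2 hours), seeded for repeatability.
rng = np.random.default_rng(0)
tidx = pd.date_range("2016-03-30", "2016-04-01", freq="2h")
df = pd.DataFrame({"value": rng.random(len(tidx))}, index=tidx)

# 8am-8pm read as 08:00 <= hour < 20:00 (an assumption about the question).
from8to8 = (df.assign(hour=df.index.hour)
              .query("8 <= hour < 20")
              .groupby(pd.Grouper(freq="D"))["value"].mean()
              .rename("8to8"))
daily = df.groupby(pd.Grouper(freq="D"))["value"].mean().rename("day")

# Align both daily means with the raw rows; ffill copies each day's value down.
out = pd.concat([df["value"], daily, from8to8], axis=1).ffill()
print(out)
```

One side effect of ffill: 2016-04-01 has no rows between 8am and 8pm, so its 8to8 cell shows the previous day's value rather than NaN.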
+0

Sorry for the late reply! First of all, thanks for your answer. I just tried your solution and got this error message: peak = df.assign(hour=df.index.hour).query('8 >= hour < 9').groupby(pd.TimeGrouper('D')).value.mean().rename('Peak') AttributeError: 'RangeIndex' object has no attribute 'hour' –

+0

@O_Vizzle df = pd.DataFrame(dict(value=np.random.rand(len(tidx))), tidx) – piRSquared

+0

Did you put tidx at the end? – piRSquared
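The 'RangeIndex' error is what happens when the frame comes straight from read_csv: its index is the default 0, 1, 2, ... rather than timestamps, so df.index.hour does not exist. A sketch of the fix, assuming the question's "DateTime" column in day/month/year format (the CSV rows here are hypothetical) and again reading the window as 8 <= hour < 20:

```python
import io
import pandas as pd

# A few rows in the question's CSV layout (day/month/year timestamps).
csv_text = """DateTime,Value
09/01/2017 07:45,10.0
09/01/2017 08:00,20.0
09/01/2017 19:45,30.0
09/01/2017 20:00,40.0
"""

# Read as-is: the index is a plain RangeIndex, so df.index.hour does not exist.
plain = pd.read_csv(io.StringIO(csv_text))
print(type(plain.index).__name__)

# Make "DateTime" the index and parse it; now df.index.hour works.
df = pd.read_csv(io.StringIO(csv_text), index_col="DateTime")
df.index = pd.to_datetime(df.index, format="%d/%m/%Y %H:%M")
print(df.index.hour.tolist())

# The hour-based filter from the answer then runs.
window_mean = df.assign(hour=df.index.hour).query("8 <= hour < 20")["Value"].mean()
print(window_mean)
```

Here only the 08:00 and 19:45 rows fall inside the window, so window_mean is 25.0.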