2017-03-08 46 views
2

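I'm trying to plot how long certain programs take when they run overnight. The program durations are exported to a CSV file so I can analyze them later. (Something like this.) How can I plot the duration of a program in Python?

(image: example)

Here are my code and a CSV sample: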

CSV:

na,programName,totaal,na,startDate,endDate,Date 
?,"to/check.apl",54006,?,2017-02-27T20:04:07.233,2017-02-27T20:05:01.239,2017-02-27T00:00:00.000
?,"to/ibx.apl",143887,?,2017-02-27T20:07:55.627,2017-02-27T20:10:19.514,2017-02-27T00:00:00.000 
?,"to/checker.apl",2039600,?,2017-02-27T20:14:37.662,2017-02-27T20:48:37.262,2017-02-27T00:00:00.000 

Python code:

import matplotlib 
from pandas import * 
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 

matplotlib.style.use('ggplot') 

data = "miFile.csv" 
df = pd.DataFrame.from_csv(data) 
df = df.set_index('totaal') 

newDf = df[['programName','startDate','endDate']] 

So far I get a datetime error, so I tried to fix it like this (still no luck with the plot):

newDf['startDate'] = pd.to_datetime(newDf['startDate']) 
newDf['endDate'] = pd.to_datetime(newDf['endDate']) 

#pd.to_datetime(pd.Series(["2017-02-27T20:04:07.233"]) format= "%d, %m, %y, %H: %M: %S") 

newDf.plot('programName','startDate','endDate') 

plt.show() 

Answers

2

I think you need read_csv to create the df, then take the difference of the two columns, convert the timedelta to minutes, and plot:

import pandas as pd
import matplotlib.pyplot as plt
from io import StringIO

temp=u"""na,programName,totaal,na,startDate,endDate,Date
?,"to/check.apl",54006,?,2017-02-27T20:04:07.233,2017-02-27T20:05:01.239,2017-02-27T00:00:00.000
?,"to/ibx.apl",143887,?,2017-02-27T20:07:55.627,2017-02-27T20:10:19.514,2017-02-27T00:00:00.000
?,"to/checker.apl",2039600,?,2017-02-27T20:14:37.662,2017-02-27T20:48:37.262,2017-02-27T00:00:00.000"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), index_col=[2], parse_dates=[4,5,6])

print (df.dtypes) 
na      object 
programName   object 
na.1     object 
startDate  datetime64[ns] 
endDate  datetime64[ns] 
Date   datetime64[ns] 
dtype: object 
df['duration'] = (df['endDate'] - df['startDate']).astype('timedelta64[m]') 
newDf = df[['programName','duration']] 
print (newDf) 
      programName duration 
totaal       
54006  to/check.apl  0.0 
143887  to/ibx.apl  2.0 
2039600 to/checker.apl  33.0 

newDf.plot() 

plt.show() 
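
A hedged alternative to the `astype('timedelta64[m]')` step above: as far as I know, newer pandas releases no longer turn a timedelta column into float minutes this way, so a minimal sketch using `.dt.total_seconds()` might look like the following. The names `csv_text`, `duration_s` and `duration_min` are made up for illustration; the rows are the three sample rows from the question.

```python
from io import StringIO

import pandas as pd

# The three sample rows from the question's CSV.
csv_text = """na,programName,totaal,na,startDate,endDate,Date
?,"to/check.apl",54006,?,2017-02-27T20:04:07.233,2017-02-27T20:05:01.239,2017-02-27T00:00:00.000
?,"to/ibx.apl",143887,?,2017-02-27T20:07:55.627,2017-02-27T20:10:19.514,2017-02-27T00:00:00.000
?,"to/checker.apl",2039600,?,2017-02-27T20:14:37.662,2017-02-27T20:48:37.262,2017-02-27T00:00:00.000
"""

# Parse the two timestamp columns by name while reading.
df = pd.read_csv(StringIO(csv_text), parse_dates=["startDate", "endDate"])

elapsed = df["endDate"] - df["startDate"]      # Series of timedeltas
df["duration_s"] = elapsed.dt.total_seconds()  # float seconds
df["duration_min"] = df["duration_s"] // 60    # whole minutes (floored)

print(df[["programName", "duration_s", "duration_min"]])
```

The floored minutes come out as 0, 2 and 33 for the three rows, the same values as in the printed output above, while `duration_s` keeps the sub-minute detail.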
+0

Thanks, this works well. I used `newDf.plot('programName', 'duration')` to get it right, and I also used `astype('timedelta64[s]')` to get it in seconds. However, I only see 7; it should be more like 70 – H35am

+0

If you test `print(df)`, are there only 7 rows with 7 program names? – jezrael

+0

`print(df)` gives me this: `[70 rows x 6 columns]` – H35am
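
The thread never settles why only a few programs show up even though the frame has 70 rows. One guess (an assumption, not confirmed by anyone above) is that a line plot with a text column on the x-axis only labels some of the tick positions. A rough sketch of a horizontal bar chart, which draws one labelled bar per row, is below. The 70 program names and durations are invented stand-ins for the real file.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the real 70-row file: made-up names and durations.
names = ["to/prog{:02d}.apl".format(i) for i in range(70)]
durations = [10.0 + 5 * i for i in range(70)]
newDf = pd.DataFrame({"programName": names, "duration": durations})

# One horizontal bar per program; a tall figure keeps the labels readable.
ax = newDf.plot.barh(x="programName", y="duration", figsize=(8, 14), legend=False)
ax.set_xlabel("duration (s)")
plt.tight_layout()
# Call plt.show() here when running interactively.
```

With real data, `newDf` would be the two-column frame built in the answer above instead of the invented one.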

0

I suggest you use pandas.read_csv() instead of pandas.DataFrame.from_csv(). Then I would consider splitting the date from the time at the T.
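
A small sketch of what handling the T could look like, using two timestamps copied from the question's CSV. Option 1 splits the string at the T as suggested. Option 2 is an alternative, not something this answer proposes: it parses the whole string with a format that contains a literal T. The variable names `stamps`, `parts` and `parsed` are only for illustration.

```python
import pandas as pd

# Start and end time of the first program in the question's CSV.
stamps = pd.Series(["2017-02-27T20:04:07.233", "2017-02-27T20:05:01.239"])

# Option 1: split at the "T" into separate date and time strings.
parts = stamps.str.split("T", expand=True)
parts.columns = ["date", "time"]

# Option 2: parse the full string; the literal "T" is part of the format.
parsed = pd.to_datetime(stamps, format="%Y-%m-%dT%H:%M:%S.%f")

print(parts)
print(parsed.iloc[1] - parsed.iloc[0])  # elapsed time between the two stamps
```

Note that the format string in the question's commented-out `to_datetime` line (`"%d, %m, %y, %H: %M: %S"`) does not match the layout of these timestamps, which may be part of why that attempt did not help.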

0

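Thanks to jezrael, this is what my final solution looks like, and it works properly. I plot in seconds, because in minutes any program that runs for less than 1 minute would show up as 0, which is not accurate enough in my case.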

import matplotlib 
from pandas import * 
import pandas as pd 
import matplotlib.pyplot as plt 

matplotlib.style.use('ggplot') 

data = "miFile.csv" 
df = pd.read_csv(data,index_col=[2], parse_dates=[4,5,6]) 

df['duration'] = (df['endDate'] - df['startDate']).astype('timedelta64[s]') 
newDf = df[['programName','duration']] 

newDf.plot('programName','duration') 
plt.show() 
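
One observation on the sample data, offered as a sketch rather than a fact about the whole file: in the three rows shown in the question, the `totaal` column equals `endDate - startDate` in milliseconds. If that holds for every row (an assumption to verify on the real export), the duration in seconds could be taken from `totaal` directly. The frame below is rebuilt by hand from the sample rows, and `computed_ms`, `matches` and `duration_s` are illustrative names.

```python
import pandas as pd

# The three sample rows from the question, typed in by hand.
df = pd.DataFrame({
    "programName": ["to/check.apl", "to/ibx.apl", "to/checker.apl"],
    "totaal": [54006, 143887, 2039600],
    "startDate": pd.to_datetime(["2017-02-27T20:04:07.233",
                                 "2017-02-27T20:07:55.627",
                                 "2017-02-27T20:14:37.662"]),
    "endDate": pd.to_datetime(["2017-02-27T20:05:01.239",
                               "2017-02-27T20:10:19.514",
                               "2017-02-27T20:48:37.262"]),
})

# Elapsed time in milliseconds, computed from the two timestamps.
computed_ms = (df["endDate"] - df["startDate"]).dt.total_seconds() * 1000

# Compare with the exported column, allowing for float rounding.
matches = (computed_ms - df["totaal"]).abs() < 0.5
print(matches.all())

# If the check passes on the real file too, seconds follow from totaal.
df["duration_s"] = df["totaal"] / 1000.0
print(df[["programName", "duration_s"]])
```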