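Hourly histogram of datetime using pandas

Suppose I have a timestamp column `datetime` in a pandas.DataFrame. For the sake of example, the timestamps have one-second resolution. I would like to bucket/bin the events into 10-minute [1] buckets/bins. I understand that I can represent the datetime as an integer timestamp and then use a histogram. Is there a simpler approach? Something built into pandas?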

[1] 10 minutes is only an example. Ultimately, I would like to use different resolutions.
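
For reference, the integer-timestamp approach mentioned in the question could look roughly like the following. This is only a minimal sketch: the sample data, the seed, and the name `bin_width` are assumptions made for illustration, and the bins start at the earliest timestamp rather than on round clock boundaries.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 500 timestamps drawn from a 10000-second window.
rng = np.random.default_rng(0)
seconds = rng.integers(0, 10000, size=500)
df = pd.DataFrame({'datetime': pd.Timestamp('2015-01-14') + pd.to_timedelta(seconds, unit='s')})

bin_width = 10 * 60  # bin width in seconds (10 minutes); change this for other resolutions

# Whole seconds elapsed since the earliest timestamp, as plain integers.
elapsed = ((df['datetime'] - df['datetime'].min()) // pd.Timedelta(seconds=1)).to_numpy()

# Bin edges every `bin_width` seconds, extended far enough to cover the largest value.
edges = np.arange(0, elapsed.max() + bin_width + 1, bin_width)
counts, edges = np.histogram(elapsed, bins=edges)
```

Here `counts[i]` is the number of events whose offset falls between `edges[i]` and `edges[i + 1]` seconds after the first event.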

2016-01-15 Dror

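This may get you close: `df.groupby(pd.TimeGrouper(freq='10Min')).mean().plot(kind="bar")`. You could replace "bar" with "hist", but I'm not sure that makes much sense. I'm guessing the y-axis should be the frequency, but what should the x-axis be? Do you have an example of the raw data and an example of what the plot should look like (even if it is only a verbal description)? – johnchase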

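To use a custom frequency like "10Min", you have to use a TimeGrouper, as suggested by @johnchase, which operates on the index. (`pd.TimeGrouper` was removed in pandas 1.0; `pd.Grouper(freq=...)` is its replacement.)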

import numpy as np
import pandas as pd

# Generate 10000 one-second timestamps and randomly pick 500 of them
df = pd.DataFrame(np.random.choice(pd.date_range(start=pd.to_datetime('2015-01-14'), periods=10000, freq='s'), 500), columns=['date'])
# Set the date as the index, since the grouper works on the index; the date column is kept (drop=False) so there is something to count
df.set_index('date', drop=False, inplace=True)
# Plot the histogram; pd.Grouper(freq=...) replaces pd.TimeGrouper, which was removed in pandas 1.0
df.groupby(pd.Grouper(freq='10min')).count().plot(kind='bar')
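
If setting the index is inconvenient, the same binning can probably be done directly on the column. The sketch below assumes a recent pandas, where `pd.Grouper` accepts a `key` argument and `DataFrame.resample` accepts `on`; the seed and variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Illustrative sample: 500 timestamps picked from 10000 consecutive seconds.
rng = np.random.default_rng(0)
stamps = pd.date_range(start='2015-01-14', periods=10000, freq='s')
df = pd.DataFrame({'date': rng.choice(stamps, 500)})

# Bin on the 'date' column without touching the index.
counts = df.groupby(pd.Grouper(key='date', freq='10min')).size()

# Equivalent spelling with resample.
counts_resampled = df.resample('10min', on='date').size()

total = int(counts.sum())
```

Both results are Series indexed by the start of each 10-minute bin, and bins without events between the first and last bin show up with a count of 0. Changing `'10min'` to another frequency string changes the resolution.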

Using `to_period`

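It is also possible to use the `to_period` method, but as far as I know it does not work with custom periods like "10Min". This example uses an additional column to simulate the category of an item.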

import numpy as np
import pandas as pd

# Number of samples
nb_sample = 500
# Generate one-second timestamps, randomly pick nb_sample of them, and give each a random category
df = pd.DataFrame({'date': np.random.choice(pd.date_range(start=pd.to_datetime('2015-01-14'), periods=nb_sample*30, freq='s'), nb_sample),
        'type': np.random.choice(['foo','bar','xxx'], nb_sample)})

# Group per hour and type
df = df.groupby([df['date'].dt.to_period('h'), 'type']).count().unstack()
# Drop the unnecessary column level
df.columns = df.columns.droplevel()
df.plot(kind='bar')
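
One possible workaround for custom bin widths with a category column is to round each timestamp down with `Series.dt.floor` and group on the result. This is a sketch under the same assumptions as the example above (random sample data, a `type` column with three made-up categories), not part of the original answer.

```python
import numpy as np
import pandas as pd

# Illustrative sample with a random category per event.
rng = np.random.default_rng(0)
nb_sample = 500
stamps = pd.date_range(start='2015-01-14', periods=nb_sample * 30, freq='s')
df = pd.DataFrame({
    'date': rng.choice(stamps, nb_sample),
    'type': rng.choice(['foo', 'bar', 'xxx'], nb_sample),
})

# Round every timestamp down to the start of its 10-minute bin.
bins = df['date'].dt.floor('10min')

# Count events per (bin, type) and pivot the types into columns.
counts = df.groupby([bins, 'type']).size().unstack(fill_value=0)
```

`counts.plot(kind='bar')` would then draw one group of bars per bin. Unlike the grouper-based approach, a bin that contains no events at all does not appear in the result.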


2016-01-15 22:23:34 Romain

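This gets me much closer. Thanks. I still have two problems: 1) the x-axis ticks are unrelated to the datetime nature of the data, and 2) shouldn't the sum of the bars be 500? – Dror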

Shouldn't one use `.plot(kind='bar')` instead of `.hist()`, as @johnchase suggested? – Dror

Sorry, I made a big mistake in my first answer (going too fast is not a solution). I just edited it and I think it now solves your problem. The "sum" is now 500 :-) – Romain
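
The two concerns raised in the comments (the bar total and the x-axis labels) can be checked with a short sketch like the one below. The `'%H:%M'` label format and the variable names are assumptions chosen for illustration; the plotting call is left as a comment because it needs matplotlib.

```python
import numpy as np
import pandas as pd

# Illustrative sample: 500 timestamps picked from 10000 consecutive seconds.
rng = np.random.default_rng(0)
stamps = pd.date_range(start='2015-01-14', periods=10000, freq='s')
df = pd.DataFrame({'date': rng.choice(stamps, 500)})

counts = df.groupby(pd.Grouper(key='date', freq='10min')).size()

# Each row lands in exactly one bin, so the bars add up to the number of rows.
bars_total = int(counts.sum())

# A pandas bar plot uses the index entries as category labels,
# so short strings give more readable ticks than full timestamps.
labelled = counts.copy()
labelled.index = counts.index.strftime('%H:%M')

# labelled.plot(kind='bar')  # uncomment to draw the chart
```

With `.count()` or `.size()` the bar heights are event counts, which is why they sum to 500, whereas `.mean()` would not produce counts.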
