How do I count the rows per day in a MultiIndex DataFrame?

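I have a DataFrame with a two-level MultiIndex. The first level, date, is a DatetimeIndex; the second level, name, is just some strings. The data is at 10-minute intervals.

How do I group by date on the first level of the MultiIndex and count the number of rows per day?

I suspect that having the DatetimeIndex coupled into a MultiIndex is what causes my problem, because doing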

data.groupby(pd.TimeGrouper(freq='D')).count()

gives me

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'MultiIndex'

I also tried writing

data.groupby(data.index.levels[0].date).count()

which results in

ValueError: Grouper and axis must be same length

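How can I, for example, make the grouper longer (i.e. include the repeated index values; ignoring them, as it does now, makes it shorter than the axis)?

Thanks!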


2017-08-03 basse

Can you add a sample of your DataFrame to the question?

You can use the level keyword of pd.Grouper. (Note also that TimeGrouper is deprecated.) This parameter is the level of the target index.

Example DataFrame:

import numpy as np
import pandas as pd

dates = pd.date_range('2017-01', freq='10min', periods=1000)
strs = ['aa'] * 1000
df = pd.DataFrame(np.random.rand(1000, 2), index=pd.MultiIndex.from_arrays((dates, strs)))

Solution:

print(df.groupby(pd.Grouper(freq='D', level=0)).count()) 
              0    1
2017-01-01  144  144
2017-01-02  144  144
2017-01-03  144  144
2017-01-04  144  144
2017-01-05  144  144
2017-01-06  144  144
2017-01-07  136  136
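
For comparison, the second attempt in the question fails because index.levels[0] lists each distinct timestamp only once, while a grouping key needs one entry per row. The following is a minimal sketch of an alternative to Grouper, not part of the original answer: it builds a full-length key with get_level_values and normalize. The sample frame (two names per timestamp, three days of data) is a made-up stand-in for the asker's data. Only days that actually occur in the index show up in the result.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: 3 days of 10-minute timestamps, two names per timestamp.
dates = pd.date_range('2017-01-01', freq='10min', periods=432)
index = pd.MultiIndex.from_product([dates, ['aa', 'bb']], names=['date', 'name'])
df = pd.DataFrame(np.random.rand(len(index), 2), index=index)

# levels[0] holds each distinct timestamp once, so it is shorter than the frame.
print(len(df.index.levels[0]), len(df))

# get_level_values(0) returns one timestamp per row; normalize() sets the time to midnight.
days = df.index.get_level_values(0).normalize()

# Group on the full-length key and count the rows in each day.
rows_per_day = df.groupby(days).size()
print(rows_per_day)
```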

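Update: You noted in the comments that the resulting counts include zeros that you would like to drop. For example, suppose your DataFrame were actually missing some days: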

df = df.drop(df.index[140:400]) 
print(df.groupby(pd.Grouper(freq='D', level=0)).count()) 
              0    1
2017-01-01  140  140
2017-01-02    0    0
2017-01-03   32   32
2017-01-04  144  144
2017-01-05  144  144
2017-01-06  144  144
2017-01-07  136  136

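As far as I know, .count has no option to exclude zero counts. Instead, you can filter the zeros out of the result above.

The first solution (probably less preferable, because introducing np.nan converts the int result to float) would be: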

res = df.groupby(pd.Grouper(freq='D', level=0)).count() 
res = res.replace(0, np.nan).dropna()

A second and, in my opinion, better solution, from here:

res = res[(res.T != 0).any()] 
print(res) # notice - excludes 2017-01-02 
              0    1
2017-01-01  140  140
2017-01-03   32   32
2017-01-04  144  144
2017-01-05  144  144
2017-01-06  144  144
2017-01-07  136  136

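.any was ported to pandas from NumPy and returns True when any element along the requested axis is true. Because res is transposed first, the default axis checks each original row.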
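
If the filter should live in the same statement as the groupby, a callable passed to .loc can be chained after .count(). This is a minimal sketch, assuming the example frame with the gap from above (rebuilt here so the block runs on its own). It is one possible approach, not a dedicated Grouper option.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame and cut a gap so that 2017-01-02 has no rows.
dates = pd.date_range('2017-01-01', freq='10min', periods=1000)
df = pd.DataFrame(np.random.rand(1000, 2),
                  index=pd.MultiIndex.from_arrays((dates, ['aa'] * 1000)))
df = df.drop(df.index[140:400])

# Count per day, then keep only the days where at least one column is non-zero.
res = (df.groupby(pd.Grouper(freq='D', level=0))
         .count()
         .loc[lambda counts: (counts != 0).any(axis=1)])
print(res)
```

Since no NaN is introduced along the way, the counts stay integers.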


2017-08-03 15:47:59

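Thanks, Brad, you answered my question perfectly. As a learning opportunity: I noticed that I get rows with zero counts, and appending '.dropna()' to the '.groupby().count()' statement does not remove those rows. Is there any way to make 'Grouper' drop the zero counts directly in the same line? – basse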

Suppose the DataFrame looks like this:

import pandas as pd

d = (pd.DataFrame([['Mon', 'foo', 3], ['Tue', 'bar', 6], ['Wed', 'qux', 9]],
                  columns=['date', 'name', 'amount'])
       .set_index(['date', 'name']))

You can keep only the date level of the index for this grouping operation:

(d.reset_index('name', drop=True)
  .groupby('date')['amount']
  .count())
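
A variation, offered as a sketch rather than as part of the answer above: groupby also accepts level=..., which avoids the reset_index step. Note that size() counts every row, whereas count() skips NaN values in the selected column.

```python
import pandas as pd

# Same toy frame as above, indexed by (date, name).
d = (pd.DataFrame([['Mon', 'foo', 3], ['Tue', 'bar', 6], ['Wed', 'qux', 9]],
                  columns=['date', 'name', 'amount'])
       .set_index(['date', 'name']))

# Group directly on the named index level and count the rows in each group.
rows_per_date = d.groupby(level='date').size()
print(rows_per_date)
```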


2017-08-03 16:01:14
