OK, I would reindex your index using your time_series, then groupby, apply isnull, and call sum:
In [113]:
# load your data, you can ignore this step
import io
import pandas as pd
t="""time,ActivePowerkW,WindSpeedms,WindSpeedmsstd
2015-05-26 11:40:00,836.6328,8.234862,1.414558
2015-05-26 11:50:00,968.5992,8.761620,1.572579
2015-05-26 12:30:00,614.0503,7.267871,1.575504
2015-05-26 13:50:00,945.5604,8.709115,1.527079
2015-05-26 14:00:00,926.6531,8.538967,1.589221
2015-05-26 14:30:00,666.7984,7.590645,1.324495
2015-05-26 14:40:00,911.0134,8.466603,1.708189
2015-05-26 15:10:00,1256.1740,9.868224,1.636775
2015-05-26 15:30:00,1706.7070,11.225540,1.576277"""
df = pd.read_csv(io.StringIO(t), parse_dates=[0], index_col=[0])
df
Out[113]:
                     ActivePowerkW  WindSpeedms  WindSpeedmsstd
time
2015-05-26 11:40:00       836.6328     8.234862        1.414558
2015-05-26 11:50:00       968.5992     8.761620        1.572579
2015-05-26 12:30:00       614.0503     7.267871        1.575504
2015-05-26 13:50:00       945.5604     8.709115        1.527079
2015-05-26 14:00:00       926.6531     8.538967        1.589221
2015-05-26 14:30:00       666.7984     7.590645        1.324495
2015-05-26 14:40:00       911.0134     8.466603        1.708189
2015-05-26 15:10:00      1256.1740     9.868224        1.636775
2015-05-26 15:30:00      1706.7070    11.225540        1.576277
In [115]:
# create your timeseries
timeseries_comp = pd.date_range(df.index[0], df.index[len(df)-1], freq='10min')
timeseries_comp
Out[115]:
DatetimeIndex(['2015-05-26 11:40:00', '2015-05-26 11:50:00',
'2015-05-26 12:00:00', '2015-05-26 12:10:00',
'2015-05-26 12:20:00', '2015-05-26 12:30:00',
'2015-05-26 12:40:00', '2015-05-26 12:50:00',
'2015-05-26 13:00:00', '2015-05-26 13:10:00',
'2015-05-26 13:20:00', '2015-05-26 13:30:00',
'2015-05-26 13:40:00', '2015-05-26 13:50:00',
'2015-05-26 14:00:00', '2015-05-26 14:10:00',
'2015-05-26 14:20:00', '2015-05-26 14:30:00',
'2015-05-26 14:40:00', '2015-05-26 14:50:00',
'2015-05-26 15:00:00', '2015-05-26 15:10:00',
'2015-05-26 15:20:00', '2015-05-26 15:30:00'],
dtype='datetime64[ns]', freq='10T', tz=None)
In [120]:
# reindex
new_df = df.reindex(timeseries_comp)
# group on hour and minute, you can group on some other resolution
new_df.groupby([new_df.index.hour, new_df.index.minute]).apply(pd.isnull).sum()
Out[120]:
ActivePowerkW 15
WindSpeedms 15
WindSpeedmsstd 15
dtype: int64
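A self-contained sketch of the same steps follows, assuming a reasonably recent pandas. Note that in the session above the chained .sum() adds up over the whole reindexed frame, so the hour/minute grouping does not change the totals. To get counts per group, take isnull() first and then sum inside the groupby. Index.difference also lists the missing timestamps themselves. The variable names full_index, total_missing, per_hour and missing_times are illustrative only.

```python
import io

import pandas as pd

# Same sample data as in the session above
t = """time,ActivePowerkW,WindSpeedms,WindSpeedmsstd
2015-05-26 11:40:00,836.6328,8.234862,1.414558
2015-05-26 11:50:00,968.5992,8.761620,1.572579
2015-05-26 12:30:00,614.0503,7.267871,1.575504
2015-05-26 13:50:00,945.5604,8.709115,1.527079
2015-05-26 14:00:00,926.6531,8.538967,1.589221
2015-05-26 14:30:00,666.7984,7.590645,1.324495
2015-05-26 14:40:00,911.0134,8.466603,1.708189
2015-05-26 15:10:00,1256.1740,9.868224,1.636775
2015-05-26 15:30:00,1706.7070,11.225540,1.576277"""
df = pd.read_csv(io.StringIO(t), parse_dates=[0], index_col=[0])

# Complete 10-minute grid from the first to the last timestamp
full_index = pd.date_range(df.index[0], df.index[-1], freq="10min")

# Reindexing inserts an all-NaN row for every slot that has no reading
new_df = df.reindex(full_index)

# Total missing values per column (True counts as 1 when summed)
total_missing = new_df.isnull().sum()

# Missing values per hour: build the boolean mask first, then sum within each group
per_hour = new_df.isnull().groupby(new_df.index.hour).sum()

# The timestamps that are absent from the original data
missing_times = full_index.difference(df.index)

print(total_missing)
print(per_hour)
print(missing_times)
```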
Have you tried reindexing? i.e. 'df.reindex(timeseries_comp)' – EdChum
Thanks, reindex is exactly what I wanted. Now I need to do the count by month. I've tried 'Avail_Count = df.resample('M',how = {df.count():'count'})' and it seems to work, but I'm not sure about the results. – ardms
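You should be able to do 'new_df.groupby([new_df.index.year, new_df.index.month]).value_counts(dropna=False)', which should give you all the unique counts including NaN, or 'new_df.isnull().groupby([new_df.index.year, new_df.index.month]).sum()' for the NaN count per month. Here new_df = df.reindex(timeseries_comp); group on the reindexed frame's index, because df.index is shorter. – EdChum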
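For the per-month follow-up, here is a minimal sketch using hypothetical readings that cross a month boundary (the data is invented for illustration). The grouping keys come from the reindexed frame's index rather than df.index, since the two differ in length. resample('MS') on the boolean mask gives the same monthly counts, and its mean gives the share of missing slots; the availability variable derived from it is an assumption, not part of the original answer.

```python
import pandas as pd

# Hypothetical readings spanning a month boundary; some 10-minute slots are absent
times = pd.to_datetime([
    "2015-05-31 23:30:00",
    "2015-05-31 23:50:00",
    "2015-06-01 00:00:00",
    "2015-06-01 00:30:00",
])
df = pd.DataFrame({"ActivePowerkW": [800.0, 810.0, 820.0, 830.0]}, index=times)

# Expected 10-minute grid covering the whole period, then mark the gaps
full_index = pd.date_range(times[0], times[-1], freq="10min")
new_df = df.reindex(full_index)
missing = new_df.isnull()

# Option 1: group on (year, month) taken from the *reindexed* index
by_year_month = missing.groupby([new_df.index.year, new_df.index.month]).sum()

# Option 2: the same monthly counts via resample on the boolean mask
# ("MS" labels each bin with the first day of the month)
by_month = missing.resample("MS").sum()

# Hypothetical availability: fraction of expected slots that have a reading
availability = 1 - missing.resample("MS").mean()

print(by_year_month)
print(by_month)
print(availability)
```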