2016-09-27

pandas DatetimeIndex strange behavior

I am working with a DataFrame whose index consists of year-month strings, for example:

index = ['2007-01', '2007-03', ...] 

However, the index is incomplete; for example, 2007-02 is missing. What I want is to reindex the DataFrame with the full index.

I have tried:

In [60]: pd.DatetimeIndex(start='2007-01', end='2007-12', freq='M') 
Out[60]: 
DatetimeIndex(['2007-01-31', '2007-02-28', '2007-03-31', '2007-04-30', 
      '2007-05-31', '2007-06-30', '2007-07-31', '2007-08-31', 
      '2007-09-30', '2007-10-31', '2007-11-30'], 
      dtype='datetime64[ns]', freq='M') 

This index falls on the last day of each month.

In [64]: pd.DatetimeIndex(['2007-01', '2007-03', '2007-04', '2007-05']) 
Out[64]: DatetimeIndex(['2007-01-01', '2007-03-01', '2007-04-01', '2007-05-01'], dtype='datetime64[ns]', freq=None) 

This index falls on the first day of each month.

How can I deal with this?

The 'M' frequency is for month end; use 'MS' for month start. See the [docs](http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases) – EdChum

Answer

I think you need to add the parameter freq='MS' if you need the frequency to be the first day of the month:

print (pd.DatetimeIndex(start='2007-01', end='2007-12', freq='MS')) 
DatetimeIndex(['2007-01-01', '2007-02-01', '2007-03-01', '2007-04-01', 
       '2007-05-01', '2007-06-01', '2007-07-01', '2007-08-01', 
       '2007-09-01', '2007-10-01', '2007-11-01', '2007-12-01'], 
       dtype='datetime64[ns]', freq='MS') 

See Offset Aliases in the pandas documentation; thanks, EdChum.

Another solution is to use PeriodIndex to generate monthly periods:
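
Recent pandas releases no longer accept `start`/`end`/`freq` in the `DatetimeIndex` constructor; `pd.date_range` takes the same arguments and is the usual replacement. Here is a minimal sketch of the reindexing step the question asks for. The DataFrame `df` and its `value` column are made up for illustration.

```python
import pandas as pd

# Hypothetical data: year-month strings as the index, with 2007-02 missing.
labels = ["2007-01", "2007-03", "2007-04", "2007-05"]
df = pd.DataFrame({"value": [1.0, 3.0, 4.0, 5.0]}, index=labels)

# Parsing the strings yields month-start timestamps (2007-01-01, ...),
# so the target index must also use the month-start frequency 'MS'.
df.index = pd.to_datetime(labels, format="%Y-%m")

# Build the complete month-start index and reindex onto it.
full_index = pd.date_range(start="2007-01", end="2007-12", freq="MS")
df_full = df.reindex(full_index)  # months absent from df are filled with NaN

print(df_full)
```

Reindexing onto a month-end index built with freq='M' would match none of the existing rows, because the parsed labels all fall on the first day of the month.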

print (pd.PeriodIndex(start='2007-01', end='2007-12', freq='M')) 
PeriodIndex(['2007-01', '2007-02', '2007-03', '2007-04', '2007-05', '2007-06', 
      '2007-07', '2007-08', '2007-09', '2007-10', '2007-11', '2007-12'], 
      dtype='int64', freq='M')
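
Likewise, recent pandas releases expect `pd.period_range` rather than `start`/`end` passed to the `PeriodIndex` constructor. A sketch of the period-based variant, again with a made-up DataFrame `df` and `value` column:

```python
import pandas as pd

# Hypothetical data: year-month strings as the index, with 2007-02 missing.
labels = ["2007-01", "2007-03", "2007-04", "2007-05"]
df = pd.DataFrame({"value": [1.0, 3.0, 4.0, 5.0]}, index=labels)

# Monthly periods carry no day component, so the month-start versus
# month-end mismatch cannot occur.
df.index = pd.PeriodIndex(labels, freq="M")

# Build the complete run of monthly periods and reindex onto it.
full_periods = pd.period_range(start="2007-01", end="2007-12", freq="M")
df_full = df.reindex(full_periods)  # months absent from df are filled with NaN

print(df_full)
```

If timestamps are needed afterwards, `df_full.index.to_timestamp()` converts the periods back to a DatetimeIndex.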