Find the daily observation closest to a specific time for irregularly spaced data

I have a Python dataframe like this:

Out[110]: 
Time 
2014-09-19 21:59:14 55.975 
2014-09-19 21:56:08 55.925 
2014-09-19 21:53:05 55.950 
2014-09-19 21:50:29 55.950 
2014-09-19 21:50:03 55.925 
2014-09-19 21:47:00 56.150 
2014-09-19 21:53:57 56.225 
2014-09-19 21:40:51 56.225 
2014-09-19 21:37:50 56.300 
2014-09-19 21:34:46 56.300 
2014-09-19 21:31:41 56.350 
2014-09-19 21:30:08 56.500 
2014-09-19 21:28:39 56.375 
2014-09-19 21:25:34 56.350 
2014-09-19 21:22:32 56.400 
2014-09-19 21:19:27 56.325 
2014-09-19 21:16:25 56.325 
2014-09-19 21:13:21 56.350 
2014-09-19 21:10:18 56.425 
2014-09-19 21:07:13 56.475 
Name: Spread, dtype: float64

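It extends over a long period (months to years), so there are many observations for each day. For each day, I want to retrieve the observation closest to a specific time, say 16:00.

My approach so far has been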

eodsearch = pd.DataFrame(df['Date'] + datetime.timedelta(hours=16)) 

eod = df.iloc[df.index.get_loc(eodsearch['Date'] ,method='nearest')]

which currently gives me the error

"Cannot convert input [Time Date, dtype: datetime64[ns]] of type <class 'pandas.core.series.Series'> to Timestamp

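Apart from the error, I see that get_loc also accepts a tolerance as input, so it would be great if I could set a tolerance of, say, 30 minutes.

Any suggestions as to why my code fails or how to fix it?

Source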

2017-02-13 thevaluebay

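Please do not post data as images. I typed in the data manually, replaced the image, and formatted the code as code. See [Markdown help](http://stackoverflow.com/editing-help) for how to format code in your questions and answers. –

Preparing the data: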

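import pandas as pd
from pandas.tseries.offsets import Hour

df.sort_index(inplace=True)  # Sort the original DF's index if it is not already sorted
# Create a lookup dataframe holding 16:00 on each unique date in the index
# (normalize() truncates each timestamp to midnight before the 16-hour offset is added)
d = pd.DataFrame(dict(Time=df.index.normalize().unique() + Hour(16)))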

(i): Use reindex, which supports lookup in both directions (looks both backward and forward):

# Find values in original within +/- 30 minute interval of lookup 
df.reindex(d['Time'], method='nearest', tolerance=pd.Timedelta('30Min'))

(ii): Use merge_asof after identifying the unique dates in the original DF (looks backward only):

# Find values in original within 30 minute interval of lookup (backwards) 
pd.merge_asof(d, df.reset_index(), on='Time', tolerance=pd.Timedelta('30Min'))
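As a hedged aside: pandas versions from 0.20 onward give merge_asof a direction parameter, so a two-sided lookup can be sketched with it as well. The frames obs and lookup below are made-up illustrative data, not the asker's, and the snippet is only a minimal sketch of how the default (backward) and "nearest" directions differ.

```python
import pandas as pd

# Hypothetical observations (illustrative only), already sorted by time
obs = pd.DataFrame({
    "Time": pd.to_datetime([
        "2017-01-01 15:50:00",
        "2017-01-01 16:05:00",
        "2017-01-02 15:20:00",
    ]),
    "observation": [1.0, 2.0, 3.0],
})

# One lookup row per day at 16:00
lookup = pd.DataFrame({
    "Time": pd.to_datetime(["2017-01-01 16:00:00", "2017-01-02 16:00:00"]),
})

tol = pd.Timedelta("30min")

# Default direction is "backward": only rows at or before the lookup time match
backward = pd.merge_asof(lookup, obs, on="Time", tolerance=tol)

# "nearest" considers rows on both sides of the lookup time
nearest = pd.merge_asof(lookup, obs, on="Time", direction="nearest", tolerance=tol)
```

On the first day the backward match is the 15:50 row, while the nearest match is the 16:05 row; on the second day the only candidate is 40 minutes away, so both results hold NaN. Note that the Time column in the result is the lookup time, not the time of the matched observation.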

(iii): To get all rows within a +/- 30 minute band around the lookup time, by querying and reindexing:

Index.get_loc operates on a single input label, so an entire Series object cannot be passed to it directly.
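To illustrate the single-label restriction, here is a minimal sketch of the vectorised counterpart, Index.get_indexer, which accepts many target labels at once. The timestamps are hypothetical, and the index is assumed to be sorted and free of duplicates, which the "nearest" method requires.

```python
import pandas as pd

# Hypothetical, sorted index of irregular timestamps (illustrative only)
idx = pd.DatetimeIndex([
    "2014-09-19 15:40:00",
    "2014-09-19 15:58:00",
    "2014-09-19 16:07:00",
    "2014-09-20 13:00:00",
])

# One target per day, at 16:00
targets = pd.DatetimeIndex(["2014-09-19 16:00:00", "2014-09-20 16:00:00"])

# Positions of the nearest labels; -1 means nothing lies within the tolerance
pos = idx.get_indexer(targets, method="nearest", tolerance=pd.Timedelta("30min"))
```

Here the first target resolves to position 1 (15:58), and the second gets -1 because the only observation that day is three hours away. Rows with -1 need to be filtered out before indexing with iloc.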

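DatetimeIndex.indexer_between_time, which returns the positions of all rows whose time of day falls between the specified start_time and end_time on every day, is better suited to this purpose than get_loc. (Both endpoints are included.)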

# Tolerance of +/- 30 minutes from 16:00:00 
df.iloc[df.index.indexer_between_time("15:30:00", "16:30:00")]
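The window above returns every row inside the band, not just the closest one. One possible way to narrow it down to a single row per day is sketched below. The data is hypothetical, and the grouping step is an addition for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical irregular observations (illustrative only)
idx = pd.DatetimeIndex([
    "2017-01-01 15:35:00",
    "2017-01-01 16:10:00",
    "2017-01-01 18:00:00",
    "2017-01-02 15:55:00",
    "2017-01-02 16:20:00",
], name="Time")
df = pd.DataFrame({"observation": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)

# Rows within +/- 30 minutes of 16:00 on any day
window = df.iloc[df.index.indexer_between_time("15:30:00", "16:30:00")]

# Absolute distance, in seconds, of each row from 16:00 on its own day
target = window.index.normalize() + pd.Timedelta(hours=16)
distance = pd.Series((window.index - target).total_seconds(), index=window.index).abs()

# For each calendar day, keep the row with the smallest distance
closest = window.loc[distance.groupby(window.index.normalize()).idxmin()]
```

With this data, closest keeps the 16:10 row for the first day and the 15:55 row for the second. Days with no observation inside the window simply do not appear in the result.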

Data used to arrive at the results:

import numpy as np
import pandas as pd

idx = pd.date_range('1/1/2017', periods=200, freq='20min', name='Time')
np.random.seed(42)
df = pd.DataFrame(dict(observation=np.random.uniform(50,60,200)), idx)
# Shuffle indices
df = df.sample(frac=1., random_state=42)

Info:

df.info() 
<class 'pandas.core.frame.DataFrame'> 
DatetimeIndex: 200 entries, 2017-01-02 07:40:00 to 2017-01-02 10:00:00 
Data columns (total 1 columns): 
observation 200 non-null float64 
dtypes: float64(1) 
memory usage: 3.1 KB

Source

2017-02-13 17:46:29

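Thank you very much for your help! – thevaluebay

After checking the output, it seems merge_asof only looks at values before the specified point in time, so it is not +/- but only -? – thevaluebay

From the docs (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.merge_asof.html#pandas.merge_asof) I found: "For each row in the left DataFrame, we select the last row in the right DataFrame whose 'on' key is less than or equal to the left's key." – thevaluebay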
