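Pandas: cumulative time-series ranges in a DataFrame
I am looking to build an "expanding" date range based on the values in the starttime and endtime columns.
If any part of a record overlaps the previous records, I want to return a start time that is the minimum of the two records' start times and an end time that is the maximum of the two records' end times.
These will be grouped by Order ID.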
Order  starttime                endtime                  RollingStart             RollingEnd
1      2015-07-01 10:24:43.047  2015-07-01 10:24:43.150  2015-07-01 10:24:43.047  2015-07-01 10:24:43.150
1      2015-07-01 10:24:43.137  2015-07-01 10:24:43.200  2015-07-01 10:24:43.047  2015-07-01 10:24:43.200
1      2015-07-01 10:24:43.197  2015-07-01 10:24:57.257  2015-07-01 10:24:43.047  2015-07-01 10:24:57.257
1      2015-07-01 10:24:57.465  2015-07-01 10:25:13.470  2015-07-01 10:24:57.465  2015-07-01 10:25:13.470
1      2015-07-01 10:24:57.730  2015-07-01 10:25:13.485  2015-07-01 10:24:57.465  2015-07-01 10:25:13.485
2      2015-07-01 10:48:57.465  2015-07-01 10:48:13.485  2015-07-01 10:48:57.465  2015-07-01 10:48:13.485
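So, with the grouping applied, in the example above Order 1 has an initial running range from 2015-07-01 10:24:43.047 to 2015-07-01 10:24:57.257, then another from 2015-07-01 10:24:57.465 to 2015-07-01 10:25:13.485.
Note that although the start times are ordered, the end times are not necessarily ordered, due to the nature of the data (there are short events and long events).
In the end, I only want the last record for each combination of Order ID and rolling start (so in this case, the last two records).
I tried: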
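To make the intent concrete, the logic described above could be sketched roughly as follows. This is only a minimal sketch, not a definitive solution: it assumes rows can be sorted by starttime within each order, and the names `add_rolling_ranges` and `range_id` are hypothetical, introduced here for illustration.

```python
import pandas as pd

# Sample data from the question, built directly to avoid any parsing assumptions.
df = pd.DataFrame({
    "Order": [1, 1, 1, 1, 1, 2],
    "starttime": pd.to_datetime([
        "2015-07-01 10:24:43.047", "2015-07-01 10:24:43.137",
        "2015-07-01 10:24:43.197", "2015-07-01 10:24:57.465",
        "2015-07-01 10:24:57.730", "2015-07-01 10:48:57.465",
    ]),
    "endtime": pd.to_datetime([
        "2015-07-01 10:24:43.150", "2015-07-01 10:24:43.200",
        "2015-07-01 10:24:57.257", "2015-07-01 10:25:13.470",
        "2015-07-01 10:25:13.485", "2015-07-01 10:48:13.485",
    ]),
})


def add_rolling_ranges(frame):
    """Hypothetical helper: label overlapping runs per order and add rolling bounds."""
    out = frame.sort_values(["Order", "starttime"]).reset_index(drop=True)
    # Latest end time seen so far within each order (end times are not sorted).
    running_end = out.groupby("Order")["endtime"].cummax()
    # The same value as of the previous row of that order (NaT on an order's first row).
    prev_end = running_end.groupby(out["Order"]).shift()
    # A new range begins on the first row of an order, or when a row starts
    # after everything seen so far in that order has already ended.
    new_range = prev_end.isna() | (out["starttime"] > prev_end)
    out["range_id"] = new_range.cumsum()
    ranges = out.groupby("range_id")
    out["RollingStart"] = ranges["starttime"].cummin()
    out["RollingEnd"] = ranges["endtime"].cummax()
    return out


result = add_rolling_ranges(df)
# Last record of each overlapping range.
last_per_range = result.groupby("range_id").tail(1)
print(last_per_range)
```

The cumulative maximum is used rather than the previous row's end time because a long event can overlap rows that come well after it.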
df['RollingStart'] = np.where((df['endtime'] >= df['RollingStart'].shift()) & (df['RollingEnd'].shift()>= df['starttime']), min(df['starttime'],df['RollingStart']),df['starttime'])
(this obviously does not take the Order ID into account)
but the error I receive is
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Any ideas would be greatly appreciated.
Code to reproduce:
from io import StringIO
import numpy as np
import pandas as pd
text = """Order  starttime  endtime
1  2015-07-01 10:24:43.047  2015-07-01 10:24:43.150
1  2015-07-01 10:24:43.137  2015-07-01 10:24:43.200
1  2015-07-01 10:24:43.197  2015-07-01 10:24:57.257
1  2015-07-01 10:24:57.465  2015-07-01 10:25:13.470
1  2015-07-01 10:24:57.730  2015-07-01 10:25:13.485
2  2015-07-01 10:48:57.465  2015-07-01 10:48:13.485"""
df = pd.read_csv(StringIO(text), sep=r'\s{2,}', engine='python', parse_dates=[1, 2])
df['RollingStart']=df['starttime']
df['RollingEnd']=df['endtime']
df['RollingStart'] = np.where((df['endtime'] >= df['RollingStart'].shift()) & (df['RollingEnd'].shift()>= df['starttime']),min(df['starttime'],df['RollingStart']),df['starttime'])
The error is:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "C:\Anaconda3\lib\site-packages\pandas\core\generic.py", line 731, in __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
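As far as I can tell, the traceback comes from the built-in `min` rather than from `np.where`: `min` compares its two arguments and then needs a single True/False, which a multi-row Series cannot provide. The snippet below is a minimal sketch of that behaviour and of one element-wise alternative (`Series.where`); the variable names are made up for illustration, and it does not handle the Order ID grouping.

```python
import pandas as pd

a = pd.Series(pd.to_datetime(["2015-07-01 10:24:43.047", "2015-07-01 10:24:57.465"]))
b = pd.Series(pd.to_datetime(["2015-07-01 10:24:43.137", "2015-07-01 10:24:43.197"]))

# Built-in min() evaluates `b < a` and then asks for the truth value of the
# resulting boolean Series, which raises the ValueError from the question.
try:
    min(a, b)
    raised = False
except ValueError:
    raised = True

# Element-wise minimum: keep a where a <= b, otherwise take b.
elementwise_min = a.where(a <= b, b)
print(raised)
print(elementwise_min)
```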
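Thanks
I am trying to get the earliest start time (and then I will try to get the latest end time) for each overlapping series. I am not sure I follow what you are proposing, my apologies.
Full code to reproduce has been added inline. Thanks.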