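Determining the number of overlapping time periods in a DataFrame

How can I count, for each contract, the number of other contracts that overlap its period of use?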

import pandas as pd

df = pd.DataFrame({
    'contract': pd.Series(['A1', 'A2', 'A3', 'A4']), 
    'start': pd.Series(['01/01/2015', '03/02/2015', '15/01/2015', '10/01/2015']), 
    'end': pd.Series(['16/01/2015', '10/02/2015', '18/01/2015', '12/01/2015']) 
})

This gives:

  contract         end       start
0       A1  16/01/2015  01/01/2015
1       A2  10/02/2015  03/02/2015
2       A3  18/01/2015  15/01/2015
3       A4  12/01/2015  10/01/2015

A1 overlaps A3 and A4, so overlap = 2.
A2 overlaps no other contract, so overlap = 0.
A3 overlaps A1, so overlap = 1.
A4 overlaps A1, so overlap = 1.
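To make the expected counts concrete, here is a minimal brute-force sketch in plain Python. It assumes the periods are closed intervals, so two contracts that merely share a boundary day would count as overlapping; the names `periods_overlap`, `contracts` and `expected_counts` are illustrative and not part of the original question.

```python
from datetime import date


def periods_overlap(a_start, a_end, b_start, b_end):
    """Closed intervals overlap when each one starts on or before the other ends."""
    return a_start <= b_end and b_start <= a_end


# The four contracts from the question, as (start, end) pairs.
contracts = {
    'A1': (date(2015, 1, 1), date(2015, 1, 16)),
    'A2': (date(2015, 2, 3), date(2015, 2, 10)),
    'A3': (date(2015, 1, 15), date(2015, 1, 18)),
    'A4': (date(2015, 1, 10), date(2015, 1, 12)),
}

# Compare every contract with every other contract (quadratic in the number of contracts).
expected_counts = {
    name: sum(
        periods_overlap(start, end, *other_period)
        for other_name, other_period in contracts.items()
        if other_name != name
    )
    for name, (start, end) in contracts.items()
}
print(expected_counts)  # {'A1': 2, 'A2': 0, 'A3': 1, 'A4': 1}
```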

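I could compare every time span (start to end) against every other one, but that is O(n**2). Any better ideas?

I have a feeling this can be improved by sorting and then looping through the sorted ranges.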
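One possible way to act on the sorting idea, sketched under the assumption that periods are closed intervals with start <= end: sort the start dates and the end dates separately, then use binary search. For a given contract, every contract that started on or before its end date overlaps it, except those that already ended strictly before its start date. The helper names `count_overlaps` and `parse_day` are hypothetical, and this is not the approach used in the answer on this page.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime


def count_overlaps(starts, ends):
    """For each closed interval [start, end], count the other intervals it overlaps."""
    sorted_starts = sorted(starts)
    sorted_ends = sorted(ends)
    counts = []
    for start, end in zip(starts, ends):
        # Intervals (including this one) that begin on or before this one ends.
        began = bisect_right(sorted_starts, end)
        # Intervals that finished strictly before this one begins.
        finished = bisect_left(sorted_ends, start)
        # Subtract 1 so the interval does not count itself.
        counts.append(began - finished - 1)
    return counts


def parse_day(text):
    """Parse a day-first date string such as '16/01/2015'."""
    return datetime.strptime(text, '%d/%m/%Y')


starts = [parse_day(d) for d in ['01/01/2015', '03/02/2015', '15/01/2015', '10/01/2015']]
ends = [parse_day(d) for d in ['16/01/2015', '10/02/2015', '18/01/2015', '12/01/2015']]
print(count_overlaps(starts, ends))  # [2, 0, 1, 1]
```

The two sorts dominate the cost, so this runs in O(n log n) rather than O(n**2), and the counts come back in the original row order.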

Source

2015-05-04 NoIdeaHowToFixThis

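To clarify: you have a long list of contracts, and you want the output to be an n-long list of integers giving, for each input contract, how many other contracts it overlaps, right?

@AndrewJanke: Right. Thanks. – NoIdeaHowToFixThis

Is this a pandas DataFrame?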

Here is one way to do it:

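import pandas as pd

df = pd.DataFrame({
    'contract': pd.Series(['A1', 'A2', 'A3', 'A4']),
    'start': pd.Series(['01/01/2015', '03/02/2015', '15/01/2015', '10/01/2015']),
    'end': pd.Series(['16/01/2015', '10/02/2015', '18/01/2015', '12/01/2015'])
})
# Parse the day-first date strings into timestamps
df['start'] = pd.to_datetime(df.start, dayfirst=True)
df['end'] = pd.to_datetime(df.end, dayfirst=True)

# One daily DatetimeIndex per contract, wrapped in a 1-tuple so it stays a single cell
periods = df[['start', 'end']].apply(lambda x: (pd.date_range(x['start'], x['end']),), axis=1)
# Pairwise boolean matrix: True where two contracts share at least one day
overlap = periods.apply(lambda col: periods.apply(lambda col_: col[0].isin(col_[0]).any()))
# Count the True entries per row, minus 1 for each contract's overlap with itself
df['overlap_count'] = overlap[overlap].apply(lambda x: x.count() - 1, axis=1)
print(df)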

which yields:

  contract         end       start  overlap_count
0       A1  2015-01-16  2015-01-01              2
1       A2  2015-02-10  2015-02-03              0
2       A3  2015-01-18  2015-01-15              1
3       A4  2015-01-12  2015-01-10              1

I have updated the code to output the number of overlapping contracts rather than the overlap in days.

Source

2015-05-04 20:01:50 Primer
