Python - How to extract sentences that contain a citation marker?

import re

text = "Trondheim is a small city with a university and 140000 inhabitants. Its central bus systems has 42 bus lines, serving 590 stations, with 1900 (departures per) day in average. That gives approximately 60000 scheduled bus station passings per day, which is somehow represented in the route data base. The starting point is to automate the function (Garry Weber, 2005) of a route information agent."
print(re.findall(r"([^.]*?\(.+ [0-9]+\)[^.]*\.)", text))

I am using the code above to extract the sentences that contain a citation. As you can see, the last sentence contains the citation (Garry Weber, 2005).

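But I get this result: ['Its central bus systems has 42 bus lines, serving 590 stations, with 1900 (departures per) day in average. That gives approximately 60000 scheduled bus station passings per day, which is somehow represented in the route data base. ']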

The result should be only the sentence that contains the citation, like this: The starting point is to automate the function (Garry Weber, 2005) of a route information agent.

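I think the problem is caused by the text inside parentheses: as you can see, the second sentence contains (departures per). Is there any fix for my code?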


2017-08-13 Ivan

Instead of '\(.+' you may want to use '\([^)]+' here.
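The suggestion above can be sketched as follows. The greedy `.+` is free to run past the first closing parenthesis, so the match can start at "(departures per)" and end at "2005)". Swapping it for `[^)]+` keeps the match inside a single pair of parentheses. The variable names (`greedy`, `bounded`) and the `strip()` call are illustrative additions, not part of the original code.

```python
import re

text = (
    "Trondheim is a small city with a university and 140000 inhabitants. "
    "Its central bus systems has 42 bus lines, serving 590 stations, with 1900 "
    "(departures per) day in average. That gives approximately 60000 scheduled "
    "bus station passings per day, which is somehow represented in the route "
    "data base. The starting point is to automate the function "
    "(Garry Weber, 2005) of a route information agent."
)

# Pattern from the question: ".+" may cross ")" and "." characters,
# so one match can swallow several sentences.
greedy = r"([^.]*?\(.+ [0-9]+\)[^.]*\.)"

# Pattern from the comment: "[^)]+" stops at the first ")",
# so the digits must sit inside the same parentheses.
bounded = r"([^.]*?\([^)]+ [0-9]+\)[^.]*\.)"

greedy_matches = re.findall(greedy, text)

# strip() removes the space left over from the previous sentence boundary.
bounded_matches = [m.strip() for m in re.findall(bounded, text)]

print(bounded_matches)
```

With the bounded pattern, "(departures per)" no longer qualifies because it contains no number before the closing parenthesis.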

Oh my, thank you, you solved my problem – Ivan

My attempt. Live demo.

\b[^.]+\([^()]+\b(\d{2}|\d{4})\s*\)[^.]*\.

It captures exactly the sentence, and it is more specific about the year than your pattern.
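A minimal sketch of how this pattern might be applied in Python, assuming the same sample text as in the question. Because the pattern contains a capturing group for the year, `re.findall` would return only that group, so this sketch uses `re.finditer` and reads the full match with `group(0)`. The names `pattern`, `cited` and `years` are illustrative.

```python
import re

text = (
    "Trondheim is a small city with a university and 140000 inhabitants. "
    "Its central bus systems has 42 bus lines, serving 590 stations, with 1900 "
    "(departures per) day in average. That gives approximately 60000 scheduled "
    "bus station passings per day, which is somehow represented in the route "
    "data base. The starting point is to automate the function "
    "(Garry Weber, 2005) of a route information agent."
)

# The answer's pattern: a sentence containing "(... NN)" or "(... NNNN)".
pattern = re.compile(r"\b[^.]+\([^()]+\b(\d{2}|\d{4})\s*\)[^.]*\.")

# group(0) is the whole sentence; group(1) is the captured year.
cited = [m.group(0) for m in pattern.finditer(text)]
years = [m.group(1) for m in pattern.finditer(text)]

print(cited)
print(years)
```

The leading `\b` makes the match start on a word boundary, so the sentence comes back without a leading space.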


2017-08-13 14:58:19 linden2015
