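I want to extract the number of people wounded from several news articles. The problem is that news language conveys this information in different ways: the count can be written either as digits or as words. My idea is to combine the regex with a list of numbers written as words.
For example: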
`Security forces had *wounded two* gunmen inside the museum but that two or three accomplices might still be at large.`
`The suicide bomber has wounded *four men* last night.`
`*Dozens* were wounded in a terrorist attack.`
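I noticed that most of the time the numbers 1-10 are written as words rather than digits. How can I extract them without convoluted code, simply by listing the words for 1-10 in the regex?
Should I use a list? How would I include it in the pattern?
Here is the pattern I have used so far to extract the number of wounded people when it is written as digits: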
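A list works well: join its items with `|` and drop the result into the pattern as one non-capturing group. The sketch below assumes a hypothetical `NUMBER_WORDS` list covering only one to ten and shows just the `wounded N` phrasing; the other phrasings follow the same idea.

```python
import re

# Hypothetical list of the number words the question mentions (1-10 only)
NUMBER_WORDS = ["one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]

# Join the list into one alternation; re.escape keeps the pattern safe
# if a word ever contains regex metacharacters.
words_alt = "|".join(re.escape(w) for w in NUMBER_WORDS)

# \b word boundaries stop "one" from matching inside "someone";
# (?:...) groups the alternation without adding a capture group.
number_word = rf"\b(?:{words_alt})\b"

pattern = re.compile(rf"wounded ({number_word})", re.IGNORECASE)
print(pattern.findall("Security forces had wounded two gunmen inside the museum"))
# ['two']
```

Because the word list is ordinary Python data, extending it (for example to "eleven" or "twelve") does not require touching the regex itself.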
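    import re

    # Read the whole news file into one string
    with open("News") as text_open:
        text_read = text_open.read()

    # Each alternative captures the casualty count written as digits
    pattern = r"wounded (\d+)|(\d+) were wounded|(\d+) injured|(\d+) people were wounded|wounding (\d+)|wounding at least (\d+)"
    result = re.findall(pattern, text_read)
    print(result)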
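One possible way to cover digits and words together is a minimal sketch along these lines. The names `WORD_TO_INT`, `COUNT`, `PATTERN`, `to_int` and `extract_wounded` are hypothetical. Because the pattern above has six capture groups, `re.findall` returns a tuple of mostly empty strings per match. This sketch swaps that for two named groups (count before the verb, count after it) and converts word tokens to integers with a lookup table. Vague quantities such as "Dozens" are deliberately not matched, since they carry no exact number.

```python
import re

# Hypothetical word-to-number table covering only one..ten, as in the question
WORD_TO_INT = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

# A casualty count is either a run of digits or one of the words above
COUNT = r"(?:\d+|" + "|".join(WORD_TO_INT) + r")"

# Python's re forbids reusing a group name, so the count gets one named
# group when it precedes the verb and another when it follows it.
PATTERN = re.compile(
    rf"\b(?P<before>{COUNT}) (?:people )?(?:were wounded|injured)\b"
    rf"|\b(?:wounded|wounding)(?: at least)? (?P<after>{COUNT})\b",
    re.IGNORECASE,
)


def to_int(token):
    """Convert a token such as '12' or 'Four' to an int."""
    return int(token) if token.isdigit() else WORD_TO_INT[token.lower()]


def extract_wounded(text):
    """Return every wounded count found in text, in order of appearance."""
    counts = []
    for match in PATTERN.finditer(text):
        # Exactly one of the two named groups participates in each match
        token = match.group("before") or match.group("after")
        counts.append(to_int(token))
    return counts


print(extract_wounded("Security forces had wounded two gunmen inside the museum"))  # [2]
print(extract_wounded("The suicide bomber has wounded four men last night."))  # [4]
print(extract_wounded("At least 12 were wounded, wounding at least 3 more."))  # [12, 3]
```

With the file contents read as before, `extract_wounded(text_read)` would return plain integers. The combined pattern is slightly broader than the original six alternatives (it also accepts, for example, "wounded at least 5" and "5 people injured"), which may or may not be wanted.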