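Regex combined with a list of numbers written as words

I want to extract information about the number of people injured from several news articles. The problem is that news language conveys this information in different ways, because the count can be written in digits or in words.

For example: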

`Security forces had *wounded two* gunmen inside the museum but that two or three accomplices might still be at large.` 

`The suicide bomber has wounded *four men* last night.` 

`*Dozens* were wounded in a terrorist attack.`

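I noticed that, most of the time, the numbers from 1 to 10 are written as words rather than digits. I would like to know how to extract them without ending up with convoluted code, for instance with a regex that simply lists the words for 1-10.

Should I use a list? How would it be included in the pattern?

This is the pattern I have used so far to extract the number of injured people when it is written in digits: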

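import re

# Read the whole news file into one string.
with open("News") as text_open:
    text_read = text_open.read()

# Each alternative captures a count written in digits.
pattern = r"wounded (\d+)|(\d+) were wounded|(\d+) injured|(\d+) people were wounded|wounding (\d+)|wounding at least (\d+)"
result = re.findall(pattern, text_read)
print(result)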

Source

2016-12-02 M.Huntz

Try this:

import re 

# Either a word followed by "were", or a word of 3+ characters right after "wounded"/"injured".
regex = r"\w+\s(?=were)|(?<=wounded|injured)\s\w{3,}"

test_str = ("`Security forces had wounded two gunmen inside the museum but that two or three accomplices might still be at large.`\n\n"
    "`The suicide bomber has wounded four men last night.`\n\n"
    "`Dozens were wounded in a terrorist attack.`")

matches = re.finditer(regex, test_str)

for match in matches:
    # strip() drops the whitespace that the pattern matches next to the word
    print(match.group().strip())

Output:

two 
four 
Dozens

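`\w+\s(?=were)`: `?=` is a lookahead for `were`; the word in front of it is matched with `\w+`.

`|` means OR.

`(?<=wounded|injured)\s\w{3,}`: `?<=` is a lookbehind, so the word matches only if `wounded` or `injured` comes right before it. `{3,}` requires the word to be 3 or more characters long, just to avoid capturing words such as `in`. Every number word is at least 3 letters long, so this limit is safe to use. Note that this branch accepts any word of 3 or more letters after `wounded` or `injured`, not only number words.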
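
To address the "should I use a list?" part of the question more directly, here is a minimal sketch that builds the alternation from a Python list, so that only digits or listed words are accepted as a count. The names `NUMBER_WORDS`, `quantity` and `extract_counts`, the choice of words in the list, and the exact phrasings covered are assumptions made for illustration, not part of the answer above.

```python
import re

# Hypothetical word list (an assumption); extend it with whatever words your articles use.
NUMBER_WORDS = ["one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten",
                "dozens", "hundreds"]

# A quantity is either a run of digits or one of the listed words.
quantity = r"(?:\d+|" + "|".join(map(re.escape, NUMBER_WORDS)) + r")"

# Two phrasings: "<verb> [at least] <quantity>" and "<quantity> [people] [were] <verb>".
pattern = re.compile(
    rf"\b(?:wounded|wounding|injured)\s+(?:at\s+least\s+)?(?P<after>{quantity})\b"
    rf"|\b(?P<before>{quantity})\s+(?:people\s+)?(?:were\s+)?(?:wounded|injured)\b",
    re.IGNORECASE,
)

def extract_counts(text):
    """Return every matched quantity (digits or words) in order of appearance."""
    return [m.group("after") or m.group("before") for m in pattern.finditer(text)]

if __name__ == "__main__":
    sample = ("Security forces had wounded two gunmen inside the museum. "
              "Dozens were wounded in a terrorist attack.")
    print(extract_counts(sample))
```

Because the words are joined with `|` at run time, adding a new word only means appending it to the list. The trailing `\b` keeps a short word from matching as the prefix of a longer one.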

Source

2016-12-02 18:28:06
