Python regular expression findall

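I want to use a regular expression in Python 2.7.2 to extract all occurrences of tagged words from a string. Put simply, I want to extract every piece of text inside the [P][/P] tags. Here is my attempt: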

import re
regex = ur"[\u005B1P\u005D.+?\u005B\u002FP\u005D]+?"
line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."
person = re.findall(regex, line)

Printing person gives ['President [P]', '[/P]', '[P] Bill Gates [/P]']

What is the correct regular expression to get ['[P] Barack Obama [/P]', '[P] Bill Gates [/P]'] or ['Barack Obama', 'Bill Gates']?

Thanks. :)


2011-10-13 Ignatius

import re 
regex = ur"\[P\] (.+?) \[/P\]+?" 
line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday." 
person = re.findall(regex, line) 
print(person)

yields

['Barack Obama', 'Bill Gates']

The regex ur"[\u005B1P\u005D.+?\u005B\u002FP\u005D]+?" is exactly the same Unicode string as u'[[1P].+?[/P]]+?', only harder to read.
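A quick check of that equivalence, written as a sketch for Python 3 (an assumption: the question targets 2.7, and Python 3 rejects the ur prefix). A normal, non-raw string is used so that Python itself expands the \u escapes, as ur"" did in Python 2. It also reproduces the output reported in the question.

```python
import re
import warnings

line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."

# Non-raw string: Python expands \u005B -> "[", \u005D -> "]", \u002F -> "/".
escaped = "[\u005B1P\u005D.+?\u005B\u002FP\u005D]+?"
plain = "[[1P].+?[/P]]+?"
print(escaped == plain)  # True

with warnings.catch_warnings():
    # Python 3.7+ warns "Possible nested set" for a set starting with "[[".
    warnings.simplefilter("ignore", FutureWarning)
    person = re.findall(plain, line)

# [[1P] = one of "[", "1", "P"; [/P] = one of "/", "P"; "]+?" = literal "]".
print(person)  # ['President [P]', '[/P]', '[P] Bill Gates [/P]']
```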

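The first bracket group [[1P] tells re that any one character from the list ['[', '1', 'P'] should match, and the second bracket group [/P] likewise matches a single '/' or 'P' (the final ] is then just a literal bracket). That is not what you want at all. So: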

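Remove the outer square brackets. (Also remove the stray 1 in front of P.)
To protect the literal brackets in [P], escape them with backslashes: \[P\].
To return only the words inside the tags, put grouping parentheses around .+?.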
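Applied one at a time, the three fixes give the two result shapes the question asks for. A minimal sketch, assuming Python 3 (so plain r"" strings instead of ur""):

```python
import re

line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."

# Fixes 1 and 2: no outer [...] wrapper, no stray "1", and the literal
# brackets escaped, so "\[P\]" matches the text "[P]" itself.
whole_tags = re.findall(r"\[P\].+?\[/P\]", line)
print(whole_tags)  # ['[P] Barack Obama [/P]', '[P] Bill Gates [/P]']

# Fix 3: with one capturing group, findall returns only the group's text.
names = re.findall(r"\[P\] (.+?) \[/P\]", line)
print(names)  # ['Barack Obama', 'Bill Gates']
```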


2011-10-13 10:20:25 unutbu

Try this:

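import re
subject = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."
for match in re.finditer(r"\[P[^\]]*\](.*?)\[/P\]", subject):
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group(); text between the tags: match.group(1)
    print(match.start(), match.end(), match.group(1))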
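The opening-tag part \[P[^\]]*\] reads as "[P", then any run of characters other than "]", then "]", so it also accepts opening tags that carry extra text. A small sketch on a hypothetical input (the "[P id=2]" tag is invented for illustration):

```python
import re

# Hypothetical input: the second opening tag carries extra text before "]".
subject = "[P] Barack Obama [/P] met [P id=2] Bill Gates [/P]"

# "\[P[^\]]*\]" is "[P", then any run of non-"]" characters, then "]".
pattern = re.compile(r"\[P[^\]]*\](.*?)\[/P\]")

# group(1) keeps the spaces next to the tags, so strip them for display.
names = [m.group(1).strip() for m in pattern.finditer(subject)]
print(names)  # ['Barack Obama', 'Bill Gates']
```

The flip side of this looseness is that it also accepts tags such as [PX], which may or may not be wanted.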


2011-10-13 10:21:12 FailedDev

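I really like this answer. If you only want to process the matches, this approach doesn't require you to 1) save a list and 2) then process that list, as in: str = '… blah dishwasher' ## Here re.findall() returns a list of all the found email strings emails = re.findall(r'[\w\.-]+@[\w\.-]+', str) ## ['…', 'bob@abc.com'] for email in emails: # do something with each found email string print email – kkron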
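The contrast in that comment can be sketched side by side in Python 3 (an assumption), using the comment's email pattern on a hypothetical text with made-up addresses. findall builds the full list first; finditer hands back one match object at a time, and each one still knows where it was found.

```python
import re

# Hypothetical text; the address pattern is the one quoted in the comment.
text = "write to carol@example.com or dave@example.org today"
email_re = re.compile(r"[\w\.-]+@[\w\.-]+")

# findall builds the complete list of strings before you loop over it.
emails = email_re.findall(text)
print(emails)  # ['carol@example.com', 'dave@example.org']

# finditer yields match objects lazily, so no list is stored up front.
for match in email_re.finditer(text):
    print(match.start(), match.group())
```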

Your question isn't 100% clear, but I'm assuming you want to find every piece of text inside [P][/P] tags:

>>> import re 
>>> line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday." 
>>> re.findall(r'\[P\]\s?(.+?)\s?\[\/P\]', line)
['Barack Obama', 'Bill Gates']
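What \s? buys here: it makes a single space after [P] and before [/P] optional, so tags written with or without inner spaces both match. A sketch on a hypothetical input, assuming Python 3; "/" needs no escaping in a Python regex, so \/ is written as plain /.

```python
import re

# Hypothetical input: the first tag has no inner spaces, the second does.
line = "[P]Barack Obama[/P] and [P] Bill Gates [/P]"

# "\s?" makes one space after "[P]" and before "[/P]" optional.
optional_space = re.findall(r"\[P\]\s?(.+?)\s?\[/P\]", line)
print(optional_space)  # ['Barack Obama', 'Bill Gates']

# Literal spaces in the pattern are required, so the first tag is skipped.
required_space = re.findall(r"\[P\] (.+?) \[/P\]", line)
print(required_space)  # ['Bill Gates']
```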


2011-10-13 10:24:22 Blair

You can replace your pattern with

regex = ur"\[P\]([\w\s]+)\[\/P\]"
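Two side effects of [\w\s]+ are worth seeing, sketched here in Python 3 (an assumption; r"" replaces ur"", and \/ is written as plain / with the same meaning). The spaces just inside the tags are part of [\w\s]+, so they land in the captured group. Any character outside \w and \s, such as the "." in the hypothetical name "Barack H. Obama", stops the whole tag from matching.

```python
import re

pattern = re.compile(r"\[P\]([\w\s]+)\[/P\]")

line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."
# The inner spaces end up in the group, so strip them if needed.
raw = pattern.findall(line)
print(raw)  # [' Barack Obama ', ' Bill Gates ']
print([name.strip() for name in raw])  # ['Barack Obama', 'Bill Gates']

# Hypothetical input: "." is neither \w nor \s, so this tag is not matched.
dotted = pattern.findall("[P] Barack H. Obama [/P]")
print(dotted)  # []
```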


2011-10-13 10:31:59 pram

Mind your formatting; *use the preview area*. Because you didn't format it properly, the backslashes got mangled (Markdown is bad like that). –

Why would you use '[\w\s]+' instead of '.*?', which is what he used? To me '.*?' is more likely to be what he wants anyway. '[\w\s]' is terribly restrictive. –

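The restriction is deliberate. I used [\w\s]+ because the asker evidently wants to extract names, which rarely contain digits. Note also that the asker wants to extract words, not numbers. Just my opinion, though; cmiiw – pram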

Use this pattern:

pattern = r'\[P\].+?\[\/P\]'
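One caveat applies to this pattern and to the other .+? answers: by default "." does not match a newline, so a tagged name broken across lines is silently skipped. A sketch on a hypothetical two-line input, assuming Python 3, showing the re.DOTALL flag:

```python
import re

# Hypothetical input in which one tagged name spans two lines.
text = "[P] Barack\nObama [/P] met [P] Bill Gates [/P]"

pattern = r"\[P\].+?\[/P\]"

# Default: "." stops at "\n", so the first tag pair is not matched.
default = re.findall(pattern, text)
print(default)  # ['[P] Bill Gates [/P]']

# re.DOTALL lets "." match newlines as well.
dotall = re.findall(pattern, text, re.DOTALL)
print(dotall)  # ['[P] Barack\nObama [/P]', '[P] Bill Gates [/P]']
```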

Check here


2016-07-18 06:16:44 Sohn
