Python regex: capturing two kinds of comments

a = "bzzzzzz <!-- blabla --> blibli * bloblo * blublu"

I want to capture the first comment. A comment can be either

(<!-- .* -->) or (\* .* \*)

This works:

re.search("<!--(?P<comment> .*)-->",a).group(1)

So does this:

re.search("\*(?P<comment> .*)\*",a).group(1)

But if I want to match either kind of comment, I have tried something like:

re.search("(<!--(?P<comment> .*)-->|\*(?P<comment> .*)\*)",a).group(1)

But it doesn't work.

Thanks

Source

2011-09-23 pablo07

By the way, your regexes are greedy and will fail on something like '<!-- first comment --> real stuff <!-- second comment -->'.
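
To make the greediness point concrete, here is a minimal sketch. The input string is a hypothetical example modeled on the one in the comment above, and the variable names are illustrative only.

```python
import re

# Hypothetical input with two HTML-style comments on one line.
text = "<!-- first comment --> real stuff <!-- second comment -->"

# Greedy: .* runs as far as it can, so the match ends at the LAST "-->".
greedy = re.search(r"<!--(.*)-->", text).group(1)

# Non-greedy: .*? stops at the FIRST "-->" that lets the match succeed.
lazy = re.search(r"<!--(.*?)-->", text).group(1)

print(repr(greedy))  # ' first comment --> real stuff <!-- second comment '
print(repr(lazy))    # ' first comment '
```

With a single comment in the string both forms give the same result, which is why the problem does not show up with the sample string in the question.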

Try a conditional expression:

>>> for m in re.finditer(r"(?:(<!--)|(\*))(?P<comment> .*?)(?(1)-->)(?(2)\*)", a):
...     print(m.group('comment'))
...
 blabla
 bloblo

Source

2011-09-23 15:35:30 eph

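As Gurney pointed out, you have two captures with the same name. Since you aren't actually using the name, just leave it out.

Also, the r"" raw string notation is a good habit.

Oh, and a third thing: you're grabbing the wrong index. 0 is the whole match, 1 is the whole either-or block, and 2 is the inner capture of the <!-- ... --> branch, which is the one that matches here.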

re.search(r"(<!--(.*)-->|\*(.*)\*)",a).group(2)
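
A quick way to check the group numbering is to print what each group holds. This is only a sketch using the string from the question; the variable names and the start position passed to search() are chosen for illustration. Python numbers groups by their opening parenthesis, so the inner capture of the * ... * branch ends up in group 3.

```python
import re

a = "bzzzzzz <!-- blabla --> blibli * bloblo * blublu"
pattern = re.compile(r"(<!--(.*)-->|\*(.*)\*)")

# The pattern defines three capturing groups in total.
print(pattern.groups)  # 3

# search() returns the leftmost match, here the <!-- --> comment:
# group 2 is filled and group 3 is None.
html_match = pattern.search(a)
print(html_match.groups())  # ('<!-- blabla -->', ' blabla ', None)

# Starting the search just after the HTML comment reaches the * ... *
# comment: now group 3 is filled and group 2 is None.
star_match = pattern.search(a, a.index("-->") + 3)
print(star_match.groups())  # ('* bloblo *', None, ' bloblo ')
```

So group(2) is the right index only when the <!-- ... --> branch is the one that matched.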

Source

2011-09-23 15:22:59 Chriszuma

What about index 3? – sln

This regex will never have an index 3. – Chriszuma

The exception you get in the "doesn't work" case is fairly explicit about what is wrong:

sre_constants.error: redefinition of group name 'comment' as group 3; was group 2

Two groups have the same name: just rename the second one.

>>> re.search(r"(<!--(?P<comment> .*)-->|\*(?P<comment2> .*)\*)",a).group(1)
'<!-- blabla -->'
>>> re.search(r"(<!--(?P<comment> .*)-->|\*(?P<comment2> .*)\*)",a).groups()
('<!-- blabla -->', ' blabla ', None)
>>> re.findall(r"(<!--(?P<comment> .*)-->|\*(?P<comment2> .*)\*)",a)
[('<!-- blabla -->', ' blabla ', ''), ('* bloblo *', '', ' bloblo ')]
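
Once the two groups have distinct names, one way to pull out whichever one matched is Match.lastgroup. The following is a sketch, not part of the original answer: the helper name first_comment is hypothetical, the outer wrapping group has been dropped so that lastgroup refers to a named group, and the quantifiers are made non-greedy as an extra precaution.

```python
import re

a = "bzzzzzz <!-- blabla --> blibli * bloblo * blublu"

# Two alternatives, each with its own uniquely named group.
pattern = re.compile(r"<!--(?P<comment> .*?)-->|\*(?P<comment2> .*?)\*")

def first_comment(text):
    """Return the text of the first comment of either kind, or None."""
    m = pattern.search(text)
    if m is None:
        return None
    # Only one of the two named groups takes part in any given match,
    # and lastgroup holds the name of that group.
    return m.group(m.lastgroup)

print(repr(first_comment(a)))            # ' blabla '
print(repr(first_comment("x * y * z")))  # ' y '
print(first_comment("no comments"))      # None
```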

Source

2011-09-23 15:23:18

re.findall may be a better fit for this:

import re

# Keep your regex simple. You'll thank yourself a year from now. Note that
# this doesn't include the surrounding spaces. It also uses non-greedy matching
# so that you can embed multiple comments on the same line, and it doesn't
# break on strings like '<!-- first comment --> fragment -->'.
pattern = re.compile(r"(?:<!-- (.*?) -->|\* (.*?) \*)")

# Adjacent string literals inside parentheses are joined into one string.
inputstring = ('bzzzzzz <!-- blabla --> blibli * bloblo * blublu foo '
               '<!-- another comment --> goes here')

# Now use re.findall to search the string. Each match will return a tuple
# with two elements: one for each of the groups in the regex above. Pick the
# non-blank one. This works even when both groups are empty; you just get an
# empty string.
results = [first or second for first, second in pattern.findall(inputstring)]
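
For reference, this sketch wraps the same approach in a small function and shows what it returns. The function name extract_comments and the second test string are hypothetical additions for illustration.

```python
import re

# Same pattern as in the answer above.
pattern = re.compile(r"(?:<!-- (.*?) -->|\* (.*?) \*)")

def extract_comments(text):
    """Hypothetical helper: return the text of every comment, in order."""
    # findall yields one (first, second) tuple per match; the group that
    # did not take part in the match comes back as an empty string.
    return [first or second for first, second in pattern.findall(text)]

sample = ('bzzzzzz <!-- blabla --> blibli * bloblo * blublu foo '
          '<!-- another comment --> goes here')

print(extract_comments(sample))
# ['blabla', 'bloblo', 'another comment']

print(extract_comments('no comments here'))
# []
```

If only the first comment is wanted, as in the question, taking the first element of the returned list (after checking that it is non-empty) would do.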

Source

2011-09-23 16:06:34

You can go one of 2 ways (if Python supports them):

1: Branch reset (?|pattern|pattern|...)
(?|<!--(.*?)-->|\*(.*?)\*)
Capture group 1 always contains the comment text.

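2: Conditional expression (?(condition)yes-pattern|no-pattern)
(?:(<!--)|\*)(.*?)(?(1)-->|\*)
The condition here is whether we captured grp1.

Modifiers: sg (single-line and global)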
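
On the "if Python supports them" question, here is a minimal check against the standard library re module. It assumes the two patterns intended in this answer are (?|<!--(.*?)-->|\*(.*?)\*) and (?:(<!--)|\*)(.*?)(?(1)-->|\*); the variable names are illustrative. In this sketch re.S stands in for the s modifier and finditer for g.

```python
import re

a = "bzzzzzz <!-- blabla --> blibli * bloblo * blublu"

# Option 1: the standard library re module rejects the (?| ... ) syntax
# with a compile-time error.
try:
    re.compile(r"(?|<!--(.*?)-->|\*(.*?)\*)")
    branch_reset_supported = True
except re.error:
    branch_reset_supported = False
print(branch_reset_supported)  # False

# Option 2: conditionals are supported. The closing delimiter depends on
# whether group 1 (the "<!--" opener) took part in the match, and the
# comment text lands in group 2.
conditional = re.compile(r"(?:(<!--)|\*)(.*?)(?(1)-->|\*)", re.S)
comments = [m.group(2) for m in conditional.finditer(a)]
print(comments)  # [' blabla ', ' bloblo ']
```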

Source

2011-09-23 16:19:38 sln
