Regex: multiple matches on one line not working

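Each match starts with either # or | (the pipe symbol), followed by some text and a
"ticker" in parentheses, followed by all the text up to the next match.

Sample code: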

import re

text = """ 
#Test name 1 (ABCD) blah blah# some more text 1||Test name 2 (EFGH) blah blah some more text 2 
#Test name 3 (IJKL) blah blah# some more text 3 
|Test name 4 (MNOP) blah blah||some more text 4 
|Test name 5 (QRST) blah blah||some more text 5| 
"""
# Pattern as posted in the question (this is the one that misbehaves).
expr = r'(?P<alltext>(#|\|)[^<>]+\((?P<ticker>[A-Z]{1,10})\)(?P<bodytext>.*))'
compiled_expr = re.compile(expr, re.MULTILINE)
# Use the compiled pattern so the flags passed to re.compile() actually apply.
matches = compiled_expr.finditer(text)
for match in matches:
    d = match.groupdict()
    print(d['alltext'])

Sample output:

#Test name 1 (ABCD) blah blah# some more text 1||Test name 2 (EFGH) blah blah some more text 2 
#Test name 3 (IJKL) blah blah# some more text 3 
|Test name 4 (MNOP) blah blah||some more text 4 
|Test name 5 (QRST) blah blah||some more text 5|

This does not pick up both matches on the first line. I need it to detect "Test name 2 ..." as well.

So the output I want is:

#Test name 1 (ABCD) blah blah# some more text 1| 
|Test name 2 (EFGH) blah blah some more text 2 
#Test name 3 (IJKL) blah blah# some more text 3 
|Test name 4 (MNOP) blah blah||some more text 4 
|Test name 5 (QRST) blah blah||some more text 5|

Source

2014-10-01 zio

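By your criteria, why isn't '# some more text 1' a separate match? – thefourtheye 2014-10-01 18:18:49

@thefourtheye: Because '# some more text 1' contains no parentheses '()'. – 2014-10-01 18:20:42

You don't need the multiline modifier. The '[^<>]+' greedily matches every character in your string (all of it), because the string contains none of those characters. As a result, it matches from the first # up to the last set of parentheses, then the parentheses, then the rest. – sln 2014-10-01 18:47:27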
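The greedy behaviour described in the comment above can be reproduced with a small sketch. The shortened sample string and the variable names are made up for illustration; the pattern is the one from the question, minus the outer `alltext` group.

```python
import re

# Shortened, hypothetical sample in the same shape as the question's text.
text = (
    "#Test name 1 (ABCD) blah# more 1||Test name 2 (EFGH) blah 2\n"
    "#Test name 3 (IJKL) blah 3\n"
)

# [^<>]+ excludes only "<" and ">", so it also consumes "#", "|" and newlines.
greedy = re.compile(r'(#|\|)[^<>]+\((?P<ticker>[A-Z]{1,10})\)(?P<bodytext>.*)')

greedy_matches = list(greedy.finditer(text))
for m in greedy_matches:
    print(repr(m.group('ticker')), repr(m.group(0)))
```

Only one match comes back, and its `ticker` group holds the last ticker in the string rather than the first, because the greedy class backtracks from the end of the input.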

Use [#|][^#|]*?\(.*?\).*?(?=(?:[#|][^#|]*?\(.*?\))|$) with the single-line modifier (a.k.a. "dot matches all").

Demo.

Explanation:

[#|]     # match "#" or "|"
[^#|]*?  # any text except "#" or "|", up until the next...
\(       # ..."("
.*?      # any text enclosed in the parentheses
\)       # and a closing parenthesis
.*?      # finally, any text until the next match OR the end of the string.
(?= 
    (?: # this is the same pattern as before. 
     [#|] 
     [^#|]*? 
     \(
     .*? 
     \) 
    ) 
| 
    $ 
)
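The pattern explained above can be tried out in Python roughly as follows. This is a minimal sketch rather than the answerer's own code: the named group `ticker` and the variable names are additions for illustration, and the sample text is the question's with trailing spaces removed.

```python
import re

text = """
#Test name 1 (ABCD) blah blah# some more text 1||Test name 2 (EFGH) blah blah some more text 2
#Test name 3 (IJKL) blah blah# some more text 3
|Test name 4 (MNOP) blah blah||some more text 4
|Test name 5 (QRST) blah blah||some more text 5|
"""

# Same structure as the explained pattern; the (?P<ticker>...) group is an
# illustrative addition. re.DOTALL is the "dot matches all" modifier.
pattern = re.compile(
    r'[#|][^#|]*?\((?P<ticker>.*?)\).*?(?=[#|][^#|]*?\(.*?\)|$)',
    re.DOTALL,
)

# Collect (ticker, whole match) pairs; strip() drops the trailing newline
# that "dot matches all" lets the body text swallow.
results = [(m.group('ticker'), m.group(0).strip()) for m in pattern.finditer(text)]

for ticker, whole in results:
    print(ticker, '->', whole)
```

Each match stops right before the next "#" or "|" that is followed, without another "#" or "|" in between, by a parenthesised group, which is why "# some more text 1" stays inside the first match.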

Source

2014-10-01 18:15:23

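By 'single-line modifier', do you mean dot-all? Also, the '.*?' at the end of your lookahead isn't really needed. – sln 2014-10-01 18:58:18

Yes, dot-matches-all is what I meant. You're right about the '.*?' too. I'll remove it. – 2014-10-01 20:27:06

Just worth mentioning: by default, regex treats the anchors '^ $' as 'single-line', meaning BOS/EOS, as opposed to 'multi-line', where they also mean BOL/EOL. The confusion comes from legacy documentation that mislabeled the 's' modifier as 'single-line', when it does not affect the anchors '^ $' at all. Probably 50% of the time in regexes, 'sm' are used together, yet the result is not some 'single multi-line mode'. It is better form to say 'Dot-All' or 'dot matches newline', because that is precise and avoids the confusion. – sln 2014-10-02 18:18:03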
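The distinction drawn in the comment above can be checked with Python's `re` flags. This is a small sketch with a made-up two-line string; `re.DOTALL` corresponds to the 's' modifier and `re.MULTILINE` to 'm'.

```python
import re

sample = "ab\ncd"  # hypothetical two-line input

# re.DOTALL ('s') only changes what "." matches: it now includes newlines.
dot_default = re.findall(r'a.*d', sample)
dot_all = re.findall(r'a.*d', sample, re.DOTALL)

# re.MULTILINE ('m') only changes where "^" and "$" match: at line boundaries too.
anchors_default = re.findall(r'^\w+$', sample)
anchors_dotall = re.findall(r'^\w+$', sample, re.DOTALL)
anchors_multiline = re.findall(r'^\w+$', sample, re.MULTILINE)

print(dot_default, dot_all)
print(anchors_default, anchors_dotall, anchors_multiline)
```

Turning on `re.DOTALL` leaves the anchored pattern's result unchanged, which is the point of calling it "dot-all" rather than "single-line".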
