My regex in Python isn't recursing properly

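I want to capture everything inside a tag plus the lines that follow it, but the capture is supposed to stop the next time it meets a bracketed tag. What am I doing wrong?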

import re #regex 

regex = re.compile(r""" 
     ^     # Must start in a newline first 
     \[\b(.*)\b\]   # Get what's enclosed in brackets 
     \n     # only capture bracket if a newline is next 
     (\b(?:.|\s)*(?!\[)) # should read: anyword that doesn't precede a bracket 
     """, re.MULTILINE | re.VERBOSE) 

haystack = """
[tab1]
this is captured
but this is suppose to be captured too!
@[this should be taken though as this is in the content]

[tab2]
help me
write a better RE
"""
m = regex.findall(haystack)
print(m)

What I'm trying to get is:
[('tab1', 'this is captured\nbut this is suppose to be captured too!\n@[this should be taken though as this is in the content]\n', '[tab2]', 'help me\nwrite a better RE\n')]

Edit:

regex = re.compile(r""" 
      ^   # Must start in a newline first 
      \[(.*?)\] # Get what's enclosed in brackets 
      \n   # only capture bracket if a newline is next 
      ([^\[]*) # stop reading at opening bracket 
     """, re.MULTILINE | re.VERBOSE)

This seems to work, but it also cuts the content off at any bracket that appears inside it.
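To make the trimming concrete, here is a minimal sketch that runs the edited pattern on a shortened sample. The sample string is made up for illustration and is not the original haystack. The class `[^\[]*` cannot step over any `[`, so the content group ends right before the first bracket it meets, even one in the middle of a line:

```python
import re

regex = re.compile(r"""
    ^            # start of a line
    \[(.*?)\]    # tag name in brackets
    \n           # the tag must be followed by a newline
    ([^\[]*)     # content: everything up to the next opening bracket
    """, re.MULTILINE | re.VERBOSE)

# Hypothetical sample, shortened from the haystack in the question
sample = "[tab1]\nkept\n@[lost] also lost\n\n[tab2]\nhelp me\n"

matches = regex.findall(sample)
print(matches)  # [('tab1', 'kept\n@'), ('tab2', 'help me\n')]
```

Everything from `[lost]` up to the next tag is dropped from the first match, which is the trimming described above.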

Source

2009-06-05 cybervaldez

Python's regex doesn't support recursion, afaik.
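A quick way to check this claim for the standard `re` module is to try compiling a PCRE-style recursive pattern. This is only a sketch: the pattern below is an illustrative balanced-bracket expression, not something taken from the question.

```python
import re

# A PCRE-style recursive pattern for balanced brackets (illustrative only)
recursive_pattern = r"\[(?:[^\[\]]|(?R))*\]"

try:
    re.compile(recursive_pattern)
    supported = True
except re.error as exc:
    # The standard library rejects the (?R) construct at compile time
    supported = False
    print("re rejected the pattern:", exc)

print(supported)  # False
```

As far as I know, the third-party `regex` package does accept `(?R)`, but that is a separate install and not what the question uses.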

Edit: but in your case this would work:

regex = re.compile(r""" 
     ^   # Must start in a newline first 
     \[(.*?)\] # Get what's enclosed in brackets 
     \n   # only capture bracket if a newline is next 
     ([^\[]*) # stop reading at opening bracket 
    """, re.MULTILINE | re.VERBOSE)

Edit 2: yes, it doesn't work properly. Try this instead:

import re 

regex = re.compile(r""" 
    (?:^|\n)\[    # tag's opening bracket 
     ([^\]\n]*)   # 1. text between brackets 
    \]\n     # tag's closing bracket 
    (.*?)     # 2. text between the tags 
    (?=\n\[[^\]\n]*\]\n|$) # until tag or end of string but don't consume it 
    """, re.DOTALL | re.VERBOSE) 

haystack = """[tag1]
this is captured [not a tag[
but this is suppose to be captured too!
[another non-tag

[tag2]
help me
write a better RE[[[]
"""

print(regex.findall(haystack))

That said, I agree with viraptor. Regexes are cool, but you can't use them to check your file for errors. A hybrid, perhaps? :P

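# Find the tag lines with a regex, then slice the text between them by position.
tag_re = re.compile(r'^\[([^\]\n]*)\]$', re.MULTILINE)
tags = list(tag_re.finditer(haystack))

result = {}
# Each tag's content runs from the end of its line to the start of the next tag.
for mo1, mo2 in zip(tags[:-1], tags[1:]):
    result[mo1.group(1)] = haystack[mo1.end():mo2.start()].strip()
if tags:  # the last tag's content runs to the end of the string
    result[tags[-1].group(1)] = haystack[tags[-1].end():].strip()

print(result)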

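Edit 3: That's because the ^ character means negation only inside [^square brackets]. Everywhere else it means start of string (or start of line with re.MULTILINE). Regexes have no good way to negatively match a whole string, only individual characters.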
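The two meanings of `^` can be seen side by side in a small sketch; the sample text here is invented for the demonstration:

```python
import re

text = "abc\n[x]\ndef"

# Outside a character class, ^ is an anchor: start of the string by default ...
anchored = re.findall(r"^\w+", text)
# ... and start of every line once re.MULTILINE is set.
per_line = re.findall(r"^\w+", text, re.MULTILINE)
# Inside a character class, a leading ^ negates the set instead:
# here, runs of characters that are neither '[' nor a newline.
negated = re.findall(r"[^\[\n]+", text)

print(anchored)  # ['abc']
print(per_line)  # ['abc', 'def']
print(negated)   # ['abc', 'x]', 'def']
```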

Source

2009-06-05 09:24:39

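Thanks for the reply. I see. I did try recursion with (?R), but you're right, it doesn't really work in Python. Do you know of a way I can achieve what I'm trying to do? – cybervaldez 2009-06-05 09:29:40

I have a problem: it seems to stop when there is also a bracket inside the content. How do I make it stop only when it finds a [bracket] at the start of a line? [tab1] – cybervaldez 2009-06-06 11:40:19

Thanks, this question has been very informative, since a lot of details and options have come up. I'm quite surprised at how different this is from your first solution. I wonder why my solution doesn't work: (^[\n\[]*). If there is a [bracket] right after a newline, why doesn't it work? This is just food for thought; your answer is already perfect. – cybervaldez 2009-06-07 00:41:35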

Does this do what you want?

regex = re.compile(r""" 
     ^     # Must start in a newline first 
     \[\b(.*)\b\]   # Get what's enclosed in brackets 
     \n      # only capture bracket if a newline is next 
     ([^[]*) 
     """, re.MULTILINE | re.VERBOSE)

This gives a list of tuples (one 2-tuple per match). If you want a single flat tuple, you can write:

m = sum(regex.findall(haystack),())
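As a small illustration of the flattening, here is a sketch on hand-written data in the same shape that `findall` returns for a two-group pattern. The `itertools.chain` variant is an alternative I am adding for comparison, not part of the answer above:

```python
from itertools import chain

# Same shape as findall() output for a pattern with two groups (made-up values)
pairs = [("tab1", "first\n"), ("tab2", "second\n")]

# sum() with an empty tuple as the start value concatenates the 2-tuples
flat_sum = sum(pairs, ())
# chain.from_iterable does the same without building intermediate tuples
flat_chain = tuple(chain.from_iterable(pairs))

print(flat_sum)    # ('tab1', 'first\n', 'tab2', 'second\n')
print(flat_chain)  # ('tab1', 'first\n', 'tab2', 'second\n')
```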

Source

2009-06-05 09:32:38

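First of all, why use a regex if you're trying to parse? As you can see, you couldn't find the source of the problem yourself, because a regex gives you no feedback. You also don't have any recursion in that RE.

Make your life simple: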

def ini_parse(src):
    in_block = None   # name of the section currently being read
    contents = {}     # section name -> accumulated text
    for line in src.split("\n"):
        if line.startswith('[') and line.endswith(']'):
            # A whole-line [tag] opens a new section
            in_block = line[1:-1]
            contents[in_block] = ""
        elif in_block is not None:
            contents[in_block] += line + "\n"
        elif line.strip() != "":
            raise Exception("content out of block")
    return contents

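You get exceptions on errors and, as a bonus, the ability to step through the processing in a debugger. You also get a dictionary as the result, and you can deal with duplicate sections while processing. My result: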

{'tab2': 'help me\nwrite a better RE\n\n',
'tab1': 'this is captured\nbut this is suppose to be captured too!\n@[this should be taken though as this is in the content]\n\n'}
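The answer mentions handling duplicate sections but does not show it. One possible way, as a sketch only: the name `ini_parse_strict`, the choice of `ValueError`, and the line-number reporting are my own assumptions, not part of the original answer.

```python
def ini_parse_strict(src):
    """Hypothetical variant of ini_parse that rejects repeated section names."""
    in_block = None
    contents = {}
    for lineno, line in enumerate(src.split("\n"), start=1):
        if line.startswith("[") and line.endswith("]"):
            in_block = line[1:-1]
            if in_block in contents:
                # Assumed policy: a duplicate is an error, reported with its line
                raise ValueError("duplicate section %r on line %d" % (in_block, lineno))
            contents[in_block] = ""
        elif in_block is not None:
            contents[in_block] += line + "\n"
        elif line.strip() != "":
            raise ValueError("content outside of a block on line %d" % lineno)
    return contents


print(ini_parse_strict("[a]\nx\n[b]\ny"))  # {'a': 'x\n', 'b': 'y\n'}
```

Other policies, such as appending to the existing section or keeping only the last one, would be a one-line change at the same spot.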

REs are heavily overused these days...

Source

2009-06-06 12:15:02 viraptor
