Python multiline regex replacement

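I'm embarrassed to ask yet another regex question, but this one has been driving me crazy for the past week.

I want to use a regular expression in Python to replace some text that looks like this: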

text = """some stuff 
line with text 
other stuff 
[code language='cpp'] 
#include <cstdio> 

int main() { 
    printf("Hello"); 
} 
[/code] 
Maybe some 
other text"""

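What I want to do is capture the text inside the [code] tags, add a tab (\t) in front of each line, and then replace the whole [code]...[/code] section with these tab-prefixed lines. In other words, I want the result to look like this: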

"""some stuff 
line with text 
other stuff 

    #include <cstdio> 

    int main() { 
        printf("Hello");
    } 

Maybe some 
other text"""

I'm using the following code snippet.

import re


class CodeParser(object):
    """Parse a blog post and turn it into markdown."""

    def __init__(self):
        self.regex = re.compile(r'.*\[code.*?\](?P<code>.*)\[/code\].*',
                                re.DOTALL)

    def parse_code(self, text):
        """Parses code section from a wp post into markdown."""
        code = self.regex.match(text).group('code')
        code = ['\t%s' % s for s in code.split('\n')]
        code = '\n'.join(code)
        return self.regex.sub('\n%s\n' % code, text)

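The problem is that, because of the leading and trailing .*, everything before and after the code tags is matched as well, so all of it is removed when I do the replacement. If I remove the .*, the regex no longer matches anything.

I thought this might be a newline problem, so I tried replacing every '\n' with, say, '¬', doing the match, and then changing '¬' back to '\n', but I had no luck with that approach.

If anyone has a better way to accomplish what I'm trying to do, I'm open to suggestions.

Thanks.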

Source

2015-07-11 Andrés

You're on the right track. Instead of regex.match, use regex.search. That way you can get rid of the leading and trailing .*s.
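To illustrate the difference, here is a minimal sketch on a shortened, made-up input (the `text` value below is an assumption for the demo, not the asker's post). `match` only tries the pattern at the very start of the string, while `search` scans forward for the first place it fits:

```python
import re

# Shortened stand-in for the blog post (hypothetical sample text).
text = "intro\n[code language='cpp']\nint x;\n[/code]\noutro"
pattern = re.compile(r"\[code.*?\](?P<code>.*)\[/code\]", re.DOTALL)

# match() anchors at position 0, where the text is "intro", so it fails.
match_result = pattern.match(text)

# search() finds the pattern wherever it first occurs in the string.
search_result = pattern.search(text)

print(match_result)
print(repr(search_result.group("code")))
```

This is why the pattern without the surrounding .* appeared to match nothing when used with `match`.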

Try this:
    def __init__(self):
        self.regex = re.compile(r'\[code.*?\](?P<code>.*)\[/code\]',
                                re.DOTALL)


    def parse_code(self, text):
        """Parses code section from a wp post into markdown."""
        # Here we are using search which finds the pattern anywhere in the
        # string rather than just at the beginning
        code = self.regex.search(text).group('code')
        code = ['\t%s' % s for s in code.split('\n')]
        code = '\n'.join(code)

        return self.regex.sub('\n%s\n' % code, text)
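A possible variation, not part of the answer above: if a post may contain more than one [code] block, or the code contains backslashes (which `sub` would interpret as escapes in a replacement string), one option is a non-greedy pattern plus a replacement function. The names `CODE_BLOCK`, `indent_block`, and the `sample` text are made up for this sketch.

```python
import re

# Non-greedy body (.*?) so each [code]...[/code] pair is matched on its own.
CODE_BLOCK = re.compile(r"\[code.*?\](?P<code>.*?)\[/code\]", re.DOTALL)


def indent_block(match):
    """Return the captured code with a tab in front of every non-blank line."""
    lines = match.group("code").strip("\n").split("\n")
    indented = ["\t" + line if line.strip() else "" for line in lines]
    return "\n" + "\n".join(indented) + "\n"


def parse_code(text):
    """Replace every [code] block in text with its tab-indented contents."""
    # A function replacement is inserted literally, so backslashes in the
    # captured code are not treated as escape sequences.
    return CODE_BLOCK.sub(indent_block, text)


# Hypothetical sample with two blocks and a backslash inside the code.
sample = (
    "before\n[code language='cpp']\nint a;\n\nprintf(\"x\\n\");\n[/code]\n"
    "middle\n[code]\nint b;\n[/code]\nafter"
)
result = parse_code(sample)
print(result)
```

Because the replacement is computed per match, there is no need for a separate `search` call before `sub`.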

Source

2015-07-11 20:36:50 gymbrall

Thanks! I should have kept reading the docs a bit further down... it's [right there](https://docs.python.org/3/howto/regex.html#match-versus-search)
