Searching for basic comments in C++ using regular expressions

I am writing a Python program that uses regular expressions to search for comments in a C++ program. I wrote the following code:

import re 
regex = re.compile(r'(\/\/(.*?))\n|(\/\*(.|\n)*\*\/)') 
comments = [] 
text = "" 
while True: 
    try: 
     x= raw_input() 
     text = text + "\n"+ x 
    except EOFError: 
     break 
z = regex.finditer(text) 
for match in z: 
    print match.group(1)

This code should detect comments of the form //I'm comment and /*blah blah blah blah blah*/. I am getting the following output:

// my program in C++ 
None 
//use cout

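This is not what I expected. My thinking was that match.group(1) should capture the first parenthesized part, (\/\*(.|\n)*\*\/), but it does not. The C++ program I am testing with is: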

// my program in C++ 

#include <iostream> 
/** I love c++ 
    This is awesome **/ 
using namespace std; 

int main() 
{ 
    cout << "Hello World"; //use cout 
    return 0; 
}

Source

2014-11-14 Dheerendra

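Your alternation is in the wrong order: a single-line comment can appear inside a multi-line comment, so the pattern needs to start with the multi-line comment. For example: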

/\*[\s\S]*?\*/|//.*
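
As a minimal sketch of how this pattern could be used from Python 3 (the names CPP_SOURCE, COMMENT_RE and comments are only illustrative, and the sample text is the program from the question with trailing spaces omitted):

```python
import re

# Sample input: the C++ program from the question (trailing spaces omitted).
CPP_SOURCE = """// my program in C++

#include <iostream>
/** I love c++
    This is awesome **/
using namespace std;

int main()
{
    cout << "Hello World"; //use cout
    return 0;
}
"""

# Multi-line alternative first, then the single-line one.
COMMENT_RE = re.compile(r'/\*[\s\S]*?\*/|//.*')

# The pattern has no capture groups, so findall() returns each full match.
comments = COMMENT_RE.findall(CPP_SOURCE)

for comment in comments:
    print(comment)
```

Because `.` does not match a newline by default, the `//.*` branch stops at the end of the line and the newline itself is not part of the match.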

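Note that you can improve this pattern if you have long multi-line comments (this syntax emulates the atomic group feature, which the re module did not support at the time; native atomic groups were only added in Python 3.11):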

/\*(?:(?=([^*]+|\*(?!/)))\1)*\*/|//.*
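
A short sketch of the emulated atomic group in Python 3, written with balanced parentheses so that the lookahead is closed before the backreference \1. The names LONG_COMMENT_RE, sample and found are illustrative only:

```python
import re

# Emulated atomic group: the lookahead captures a chunk into group 1,
# then \1 consumes it, so the engine cannot backtrack into that chunk.
LONG_COMMENT_RE = re.compile(r'/\*(?:(?=([^*]+|\*(?!/)))\1)*\*/|//.*')

sample = "/* a * b ** c */ x = 1; // y\n"

# Group 1 is only a helper here, so read the whole match with group(0).
found = [m.group(0) for m in LONG_COMMENT_RE.finditer(sample)]
print(found)
```

Since the pattern now contains a capture group, findall() would return the contents of group 1 instead of the comments, which is why the sketch goes through finditer() and group(0).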

But note that there are other pitfalls, such as strings that contain /*...*/ or //.....

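So if you want to avoid these cases, for example when doing a replacement, you need to capture strings first and use a backreference in the replacement string, like this: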

(pattern for strings)|/\*[\s\S]*?\*/|//.*

Replacement: $1
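
A hedged sketch of this idea in Python 3, where the role of $1 is played by a replacement function that returns group 1. The string sub-pattern `"(?:\\.|[^"\\])*"` is only an assumption standing in for "(pattern for strings)"; it handles double-quoted literals with backslash escapes and nothing else (no character literals, no raw strings). The names STRIP_RE and strip_comments are hypothetical:

```python
import re

# Group 1: a double-quoted string literal (assumed pattern, see above).
# The other two branches are the comment patterns and capture nothing.
STRIP_RE = re.compile(r'("(?:\\.|[^"\\])*")|/\*[\s\S]*?\*/|//.*')

def strip_comments(source):
    """Remove comments but put string literals back unchanged."""
    # For a string match group 1 holds the literal; for a comment it is None.
    return STRIP_RE.sub(lambda m: m.group(1) or '', source)

code = 'puts("// not a comment"); /* gone */ x = 1; // gone too\n'
stripped = strip_comments(code)
print(stripped)
```

The string branch comes first so that a // or /* inside a literal is consumed as part of the string before the comment branches get a chance to match it.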

Source

2014-11-14 21:50:28

In my case, these pitfalls are not possible :) – Dheerendra 2014-11-14 21:56:39

@Dheerendra: So you only need the simple answer. – 2014-11-14 21:57:42

Use group(0). The content of the file 'txt' is your example:

import re 
regex = re.compile(r'(\/\/(.*?))\n|(\/\*(.|\n)*\*\/)') 
comments = [] 
text = "" 
for line in open('txt').readlines(): 
    text = text + line 
z = regex.finditer(text) 
for match in z: 
    print match.group(0).replace("\n","")

The output I got is:

// my program in C++ 
/** I love c++  This is awesome **/ 
//use cout

To help you understand:

import re 
regex = re.compile(r'((\/\/(.*?))\n|(\/\*(.|\n)*\*\/))') 
comments = [] 
text = "" 
for line in open('txt').readlines(): 
    text = text + line 
z = regex.finditer(text) 
for match in z: 
    print match.group(1)

This will output:

// my program in C++ 

/** I love c++ 
    This is awesome **/ 
//use cout

Source

2014-11-14 22:05:41

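I don't want the newline – Dheerendra 2014-11-14 22:08:17

For your better understanding: group(1) means the first parenthesized subgroup. In your case that is the "//" one, so it cannot find your "/* ... */" case – 2014-11-14 22:08:25

You can strip the "\n" before printing – 2014-11-14 22:09:06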

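Unfortunately, you have to parse quotes and non-comments at the same time, because
parts of the comment syntax can be embedded in them.

Here is an old Perl regex that does this. The matches of interest are the ones where capture group 1
contains a comment. So run a global search in a while loop and check whether group 1 matched.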

# (/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^/"'\\]*) 


    (        # (1 start), Comments 
     /\*        # Start /* .. */ comment 
     [^*]* \*+ 
     (?: [^/*] [^*]* \*+)* 
     /        # End /* .. */ comment 
     | 
     //        # Start // comment 
     (?: [^\\] | \\ \n?)*?   # Possible line-continuation 
     \n        # End // comment 
    )        # (1 end) 
| 
    (        # (2 start), Non - comments 
     " 
     (?: \\ [\S\s] | [^"\\])*  # Double quoted text 
     " 
     | ' 
     (?: \\ [\S\s] | [^'\\])*  # Single quoted text 
     ' 
     | [\S\s]       # Any other char 
     [^/"'\\]*      # Chars which don't start a comment, string, escape, 
              # or line continuation (escape + newline) 
    )        # (2 end)
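
A rough Python 3 port of the loop described above, as a sketch only: it compiles the one-line regex unchanged (split across two raw strings) and keeps the matches in which group 1 took part. The names PARSE_RE and find_comments are hypothetical:

```python
import re

# The one-line regex from this answer, split across two raw strings.
# Group 1 = comments, group 2 = strings and any other text.
PARSE_RE = re.compile(
    r'''(/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)'''
    r'''|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^/"'\\]*)'''
)

def find_comments(source):
    """Return group 1 of every match in which a comment was found."""
    return [m.group(1) for m in PARSE_RE.finditer(source)
            if m.group(1) is not None]

sample = 'const char *s = "// not a comment"; /* real */ int x; // also real\n'
comments = find_comments(sample)
print(comments)
```

The // branch of this regex consumes the terminating newline, so a single-line comment comes back with its trailing "\n"; strip it if you only want the comment text.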

Source

2014-11-14 22:06:55 sln

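Adding another answer.

(Note: the problem you have does not involve the alternation order of the
comment subexpressions.)

Yours is a simplified regex for getting C++ comments.
If you don't want the full-blown version, we can look at
why you are having problems.

First, your regex is almost correct. There is one problem,
with the /* ... */ comment subexpression: its contents must be
matched non-greedily.

Other than that, it works as it should.
But you should look at the capture groups more closely.
In your code you print only group 1 on each match, which is the // ...
comment. You can either check for a match in both groups 1 and 3, or
just print group 0 (the whole match).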

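Also, you don't need the lazy quantifier ? in group 2, and
the newline \n after it should not be there.
And consider making all the capture groups non-capturing, (?: .. ).

So, remove the ? quantifier and the \n from the // ... subexpression,
and add a ? quantifier to the /* ... */ subexpression.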

Here is your original regex, formatted (using RegexFormat 5, which adds the comments automatically):

# raw regex: (//(.*?))\n|(/\*(.|\n)*\*/) 

    (     # (1 start) 
     // 
     (.*?)    # (2) 
    )     # (1 end) 
    \n 
| 
    (     # (3 start) 
     /\* 
     (. | \n)*   # (4) 
     \*/ 
    )     # (3 end)

Here it is without capture groups and with the 2 minor quantifier changes.

# raw regex: //(?:.*)|/\*(?:.|\n)*?\*/ 

    // 
    (?: .*) 
| 
    /\* 
    (?: . | \n)*? 
    \*/

Output

** Grp 0 - (pos 0 , len 21) 
// my program in C++ 

--------------------------- 

** Grp 0 - (pos 43 , len 38) 
/** I love c++ 
    This is awesome **/ 

--------------------------- 

** Grp 0 - (pos 143 , len 10) 
//use cout
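
As a quick Python 3 check of why the /* ... */ part has to be non-greedy, the sketch below compares the corrected regex with a greedy variant on a line holding two block comments. The names FIXED_RE, GREEDY_RE, fixed and greedy are illustrative only:

```python
import re

# Corrected regex: no capture groups, lazy quantifier in the block branch.
FIXED_RE = re.compile(r'//(?:.*)|/\*(?:.|\n)*?\*/')

# Same regex with the greedy quantifier, for comparison.
GREEDY_RE = re.compile(r'//(?:.*)|/\*(?:.|\n)*\*/')

text = "/* first */ int a; /* second */ // trailing\n"

# No capture groups, so findall() returns each full match.
fixed = FIXED_RE.findall(text)
greedy = GREEDY_RE.findall(text)

print(fixed)
print(greedy)
```

The greedy variant runs from the first /* to the last */ and swallows the code in between, while the lazy one stops at the first */ it reaches.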

Source

2014-11-15 18:12:00 sln
