2016-11-17 81 views
1

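Python regex in commented code

I want to match open-source license types in the comment block at the top of most source files. However, I am having trouble with cases where the expected string (e.g. Lesser General Public License) spans two lines. For example, see the license in the code below.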

* Copyright (c) Codice Foundation 
* <p/> 
* This is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser 
* General Public License as published by the Free Software Foundation, either version 3 of the 
* License, or any later version. 
* <p/> 
* This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without 
* even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 
* Lesser General Public License for more details. A copy of the GNU Lesser General Public License 
* is distributed along with this program and can be found at 
* <http://www.gnu.org/licenses/lgpl.html>. 
*/ 

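Using a regex lookbehind is not possible, because the number of spaces in the commented code is unknown and the comment characters differ between languages. Examples of my current regular expressions are included below: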

# Raw strings keep escapes such as \s and \D intact for the regex engine.
self._cr_license_re['GNU'] = re.compile(r'\sGNU\D')
self._cr_license_re['MIT License'] = re.compile(r'MIT License|Licensed MIT|\sMIT\D')
self._cr_license_re['OpenSceneGraph Public License'] = re.compile(r'OpenSceneGraph Public License', re.IGNORECASE)
self._cr_license_re['Artistic License'] = re.compile(r'Artistic License', re.IGNORECASE)
self._cr_license_re['LGPL'] = re.compile(r'\sLGPL\s|Lesser General Public License', re.IGNORECASE)
self._cr_license_re['BSD'] = re.compile(r'\sBSD\D')
self._cr_license_re['Unspecified OS'] = re.compile(r'free of charge', re.IGNORECASE)
self._cr_license_re['GPL'] = re.compile(r'\sGPL\D|(?<!Lesser)\sGeneral Public License', re.IGNORECASE)
self._cr_license_re['Apache License'] = re.compile(r'Apache License', re.IGNORECASE)
self._cr_license_re['Creative Commons'] = re.compile(r'\sCC\D')

I welcome any suggestions on how to solve this problem using regular expressions in Python.

+0

"If only there were a way to glue the lines together into a single long string"? – usr2564301

+0

What is the question? Replace every literal space in your OpenSceneGraph Public License pattern (and everywhere else) with `\s+`, and that's it. –
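The comment's idea can be sketched roughly as follows. Note that `\s+` on its own would not bridge the line break in the sample license above, because the continuation line starts with a `*` comment marker, so this sketch widens the gap to a character class that also admits a few common comment characters. The helper name `phrase_to_regex`, the `GAP` pattern, and the choice of comment characters are assumptions made for illustration; they are not part of the original comment.

```python
import re

# Assumption: characters that may separate two words when a phrase wraps
# across comment lines (whitespace plus '*', '/', '#', ';' and '-').
GAP = r'[\s*/#;-]+'


def phrase_to_regex(phrase, flags=re.IGNORECASE):
    """Hypothetical helper: compile a phrase so each space matches GAP."""
    words = [re.escape(word) for word in phrase.split()]
    return re.compile(GAP.join(words), flags)


lgpl_re = phrase_to_regex('Lesser General Public License')

# The phrase wraps across a C-style comment line...
wrapped = "terms of the GNU Lesser \n * General Public License as published"
# ...or across a hash-style comment line.
hash_style = "# GNU Lesser\n# General Public License"

print(bool(lgpl_re.search(wrapped)))
print(bool(lgpl_re.search(hash_style)))
```

Because the gap requires at least one separator character, run-together text such as `LesserGeneral` would still not match. The trade-off is that the fixed-width lookbehind in the GPL pattern cannot be combined with a variable-width gap like this one.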

Answers

1

You can use this regex and replace each match with a space:

\s*\*\s*\/? 

This replacement should put the multi-line comment on a single line, after which you can search it for the license.
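A minimal sketch of this approach, assuming the file's text is already available as a string. The helper name `flatten_comment` and the sample text are illustrative only, and the extra whitespace-collapsing step is an addition not mentioned in the answer. The two license patterns are taken from the question.

```python
import re

# Pattern from the answer: optional whitespace, a literal '*',
# optional whitespace, and an optional '/' (covers ' * ' and ' */').
COMMENT_MARKER_RE = re.compile(r'\s*\*\s*/?')

# License patterns copied from the question.
LGPL_RE = re.compile(r'\sLGPL\s|Lesser General Public License', re.IGNORECASE)
GPL_RE = re.compile(r'\sGPL\D|(?<!Lesser)\sGeneral Public License', re.IGNORECASE)


def flatten_comment(text):
    """Hypothetical helper: join a C-style block comment into one line."""
    text = COMMENT_MARKER_RE.sub(' ', text)
    # Assumption: also collapse any remaining whitespace runs, newlines included.
    return re.sub(r'\s+', ' ', text).strip()


sample = (
    " * under the terms of the GNU Lesser \n"
    " * General Public License as published by the Free Software Foundation\n"
    " */\n"
)

flat = flatten_comment(sample)
print(flat)
print(bool(LGPL_RE.search(flat)))  # LGPL phrase now sits on one line
print(bool(GPL_RE.search(flat)))   # lookbehind sees 'Lesser' before the space
```

Once the text is reduced to single spaces, the fixed-width lookbehind `(?<!Lesser)` in the GPL pattern behaves as intended, since exactly one whitespace character separates `Lesser` from `General`. One caveat: the marker pattern also removes any `*` that appears elsewhere in the text, which may or may not matter for license detection.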

+0

Good suggestion. However, the regex above did not remove the newline (`\n`) characters. What finally worked was: `text = fid.read().replace('\n', '')` `fin_text = re.sub(r'\s*\*\s*\/?', ' ', text)` – lmum27