2013-04-23 95 views

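Python - Regexp - Find function definitions, but not function calls

OK, so I have a pile of C and C++ code that I need to sift through to find function definitions. I don't know the function's type or return value, and I don't know the number of parameters in the definition or in the calls.

So far, I have: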

import re, sys
from os.path import abspath
from os import walk

function = 'msg'
regexp = r"(" + function + ".*[^;]){"

found = False
for root, folders, files in walk('C:\\codepath\\'):
    for filename in files:
        with open(abspath(root + '/' + filename)) as fh:
            data = fh.read()
        result = re.findall(regexp, data)
        if result:
            # use the module-level 'function'; 'config' was never defined
            sys.stdout.write('\n Found function "' + function + '" in ' + filename + ':\n\t' + str(result))
            sys.stdout.flush()
    break  # stop after the top-level directory

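However, this produces some unwanted results. The regexp must be fault-tolerant for combinations such as these:

Find all variations of, say, the "msg" definition, but not the "msg()" calls: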

void 
shapex_msg (struct shaper *s) 
{ 
    msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second", 
     s->bytes_per_second); 
} 

void shapex_msg (struct shaper *s) 
{ 
    msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second", 
     s->bytes_per_second); 
} 

void shapex_msg (struct shaper *s) { 
    msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second", 
     s->bytes_per_second); 
} 

Answer

Maybe something like the following regex:

import re

def make_regex_for_function(name):
    return re.compile(r'\s*%s\s*\([^;)]*\)\s*\{' % re.escape(name))

Testing it on your examples:

>>> text = ''' 
... void 
... shapex_msg (struct shaper *s) 
... { 
... msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second", 
...  s->bytes_per_second); 
... } 
... 
... void shapex_msg (struct shaper *s) 
... { 
... msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second", 
...  s->bytes_per_second); 
... } 
... 
... void shapex_msg (struct shaper *s) { 
... msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second", 
...  s->bytes_per_second); 
... }''' 
>>> shapex_msg = make_regex_for_function('shapex_msg') 
>>> shapex_msg.findall(text) 
['\nshapex_msg (struct shaper *s)\n{', ' shapex_msg (struct shaper *s)\n{', ' shapex_msg (struct shaper *s) {'] 

It also works with multi-line definitions:

>>> shapex_msg.findall('''int 
     shapex_msg  (
int a, 
int b 
) 

     {''')
['\n \tshapex_msg \t(\nint a,\nint b\n) \n\n\t{'] 

Whereas, with a function call:

>>> shapex_msg.findall('shapex_msg(1,2,3);') 
[] 
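
A caveat worth checking against your own code base: the pattern has no word boundary, so searching for msg would also match the tail of shapex_msg. That may be exactly what is wanted when looking for every variation of a name. If only the exact name should match, a \b can be put in front of it. The sketch below assumes that stricter reading; make_exact_regex and the sample C text are made up for illustration.

```python
import re

def make_exact_regex(name):
    # Hypothetical variant of make_regex_for_function: the leading \b keeps
    # "msg" from matching inside a longer identifier such as "shapex_msg".
    return re.compile(r'\b%s\s*\([^;)]*\)\s*\{' % re.escape(name))

# Invented sample: a longer name, a call, and the exact definition.
text = '''void shapex_msg (struct shaper *s) {
    msg (M_INFO, "hello");
}

void msg (int flags, const char *format) {
}
'''

matches = make_exact_regex('msg').findall(text)
print(matches)
```

Only the header of the msg definition is reported; the shapex_msg definition and the msg call are both skipped.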

As a note, your regex doesn't work because .* is greedy, so it doesn't stop at the correct closing parenthesis.
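
To see the difference on a small case, the sketch below runs the question's pattern and the one from this answer over the same snippet. The C text and the check_msg helper in it are invented for the demonstration. Because .* accepts any character on the line, it can run past the closing parenthesis of a call and still reach a later {.

```python
import re

# Invented sample: one definition, one call inside an if, one plain call.
source = (
    'void shapex_msg (struct shaper *s)\n'
    '{\n'
    '    if (check_msg (s)) {\n'
    '        msg (M_INFO, "shaping enabled");\n'
    '    }\n'
    '}\n'
)

# Pattern from the question: ".*" may swallow ")" and anything else on the line.
question_pattern = re.compile(r"(msg.*[^;]){")

# Pattern from the answer: "[^;)]*" must stop at the first ")" or ";".
answer_pattern = re.compile(r"\s*msg\s*\([^;)]*\)\s*\{")

question_hits = question_pattern.findall(source)
answer_hits = answer_pattern.findall(source)

print(question_hits)  # includes the call inside the if condition
print(answer_hits)    # only the definition header
```

On this sample the question's pattern reports two hits, one of them the check_msg call inside the if, while the answer's pattern reports only the definition.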

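Your last edit gave me a working copy! Thanks! Indeed, the greedy quantifier was messing things up. I'd been trying so many combinations and couldn't wrap my head around it, so thank you! – Torxed 2013-04-23 14:55:05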

@Torxed Yes, sorry. I forgot to put a '*' when writing it down :s – Bakuriu 2013-04-23 14:56:03