Handling "++" in a Python regular expression

I have a list of words.
I am creating a list of regex objects based on those words.

import re
word = 'This is word of spy++'
wl = ['spy++', 'cry', 'fpp']
regobjs = [re.compile(r"\b%s\b" % w.lower()) for w in wl]

for regobj in regobjs:
    print(re.search(regobj, word).group())

But while building the list of regex objects I get an error (error: multiple repeat) because of the ++ symbols. How can I make the regex handle every word in the word list?

Requirements:

The regex should detect the exact word in the given text, even if the word contains non-alphanumeric characters such as ++. The code above detects the exact words, except those containing ++.

2011-11-28 Shashi

You need [re.escape()](http://docs.python.org/library/re.html#re.escape). –
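
As a minimal sketch of what that suggestion does (the variable names here are only illustrative): re.escape() backslash-escapes the "+" characters so they are matched literally. How the unescaped word is treated depends on the Python version. Older versions reject it with "multiple repeat", while Python 3.11 and later accept "++" as a possessive quantifier, so the sketch handles both outcomes instead of assuming one.

```python
import re

word = "spy++"

# re.escape() backslash-escapes regex metacharacters so "+" is matched literally.
escaped = re.escape(word)
print(escaped)

# The escaped pattern compiles and finds the literal text.
print(re.search(escaped, "This is word of spy++").group())

# Unescaped, "++" is read as regex syntax. Older Pythons reject it with
# "multiple repeat"; Python 3.11+ accepts it as a possessive quantifier,
# which means "one or more 'y'" rather than a literal "++".
try:
    re.compile(word)
    unescaped_status = "compiled, but not as a literal"
except re.error as exc:
    unescaped_status = "rejected: %s" % exc
print(unescaped_status)
```

Either way, the unescaped pattern does not look for the literal text "spy++", which is why escaping is needed.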

@SvenMarnach: He needs more than that... –

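@Sashi Nobody wants to get errors. Saying what you want "not to get" gives no information about what you do want "to get". Writing _"handle all cases"_ is extremely vague. – eyquem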

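In addition to re.escape(), you also need to drop the \b word boundary before or after a non-alphanumeric character, or the match will fail.

Something like this (not very elegant, but I hope it gets the point across):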

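import re
words = 'This is word of spy++'
wl = ['spy++', 'cry', 'fpp']
regobjs = []

for word in wl:
    # Escape regex metacharacters such as "+"
    eword = re.escape(word.lower())
    # Add \b only next to a word character (letter, digit or underscore);
    # test the unescaped word so escaping cannot affect the check
    if word[0].isalnum() or word[0] == "_":
        eword = r"\b" + eword
    if word[-1].isalnum() or word[-1] == "_":
        eword = eword + r"\b"
    regobjs.append(re.compile(eword))

for regobj in regobjs:
    match = regobj.search(words)
    # search() returns None for words that are absent ('cry', 'fpp')
    if match:
        print(match.group())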
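
To illustrate why the trailing \b has to go, here is a small sketch using the same sample sentence. \b only matches between a word character and a non-word character (or the string edge), so it can never match after the final "+" at the end of the string.

```python
import re

text = "This is word of spy++"
escaped = re.escape("spy++")

# \b needs a word character on exactly one side. After the final "+" there is
# only the end of the string, so no boundary exists there and the search fails.
with_trailing_b = re.search(r"\b" + escaped + r"\b", text)
print(with_trailing_b)  # None

# Keeping only the leading \b (before the "s") lets the literal "spy++" match.
without_trailing_b = re.search(r"\b" + escaped, text)
print(without_trailing_b.group())  # spy++
```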

2011-11-28 12:29:36

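Yes, it works after removing \b. Thanks. – Shashi

But will it work if I want to match the exact word in the given string? That is why I added \b. – Shashi

Isn't it the \b that produces the exact match? I need both \b and re.escape(). Or is there an alternative solution? – Shashi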

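You want to use \b when your word begins or ends with a letter, digit or underscore, and \B when it does not. This means you will not pick up spy++x, but you will pick up spy++. or even spy+++. If you want to avoid those last ones, things get more complicated.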

>>> def match_word(word):
    return re.compile("%s%s%s" % (
        "\\b" if word[0].isalnum() or word[0]=='_' else "\\B",
        re.escape(word.lower()),
        "\\b" if word[-1].isalnum() or word[-1]=='_' else "\\B"))

>>> text = 'This is word of spy++'
>>> wl = ['spy++','cry','fpp', 'word']
>>> for word in wl:
    match = re.search(match_word(word), text)
    if match:
        print(repr(match.group()))
    else:
        print("{} did not match".format(word))


'spy++' 
cry did not match 
fpp did not match 
'word'
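
The edge cases mentioned above (spy++x, spy++. and spy+++) can be checked with a short sketch. The pattern is written out by hand here and is the one match_word would build for 'spy++'; the sample strings are only illustrative.

```python
import re

# \b before "s" (a word character), \B after the final "+" (a non-word character).
pattern = re.compile(r"\bspy\+\+\B")

samples = ["spy++x", "spy++.", "spy+++", "spy++"]
results = {s: bool(pattern.search(s)) for s in samples}

for s in samples:
    print("%-8s %s" % (s, "match" if results[s] else "no match"))
```

Only spy++x is rejected: the "x" after the "+" creates a word boundary, so \B fails there. In the other samples the "+" is followed by another non-word character or by the end of the string, so \B succeeds.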

2011-11-28 13:28:59 Duncan

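Your code detects _spy++_ in _word of !spy++_, in _word of spy++!_, and in _word of spy+++++_. I am not sure that is what is wanted. In fact, his requirements are confused. – eyquem

@eyquem Yes, the requirements are confused. If he specified the exact rules for the boundary between words and non-words, it would probably be possible to match them. – Duncan

I have corrected the requirements; sorry for the confusion. – Shashi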

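Sashi,

Your question is poorly asked; it does not say exactly what you want. People then try to deduce what you want from your code, and that leads to confusion.

I suppose you want to find the occurrences of the words in the list wl when they stand purely isolated in a string, that is, with no non-whitespace character touching either side of the occurrence.

If so, I suggest the regex pattern in the following code: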

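import re

ss = 'spy++ This !spy++ is spy++! word of spy++'
print(ss)
print([mat.start() for mat in re.finditer('spy', ss)])
print()

# Raw strings keep \A and \Z as regex escapes rather than string escapes.
# Lookbehind: preceded by whitespace, or at the start of the string.
# Lookahead: followed by whitespace, or at the end of the string.
base = (r'(?:(?<=[ \f\n\r\t\v])|(?<=\A))'
        r'%s'
        r'(?=[ \f\n\r\t\v]|\Z)')

for x in ['spy++', 'cry', 'fpp']:
    print(x, [mat.start() for mat in re.finditer(base % re.escape(x), ss)])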

Result

spy++ This !spy++ is spy++! word of spy++ 
[0, 12, 21, 36] 

spy++ [0, 36] 
cry [] 
fpp []
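
A more compact variant of the same idea, offered only as a sketch: the negative lookarounds (?<!\S) and (?!\S) mean "not touching a non-whitespace character", which also covers the start and end of the string. The helper name isolated_pattern is hypothetical. One assumption to be aware of: for str patterns in Python 3, \S is Unicode-aware, so this treats more characters as whitespace than the explicit [ \f\n\r\t\v] class above.

```python
import re

def isolated_pattern(word):
    # Hypothetical helper: match `word` only when no non-whitespace
    # character touches it on either side.
    return re.compile(r"(?<!\S)%s(?!\S)" % re.escape(word))

ss = 'spy++ This !spy++ is spy++! word of spy++'

# Start offsets of the isolated occurrences of each word.
starts = {w: [m.start() for m in isolated_pattern(w).finditer(ss)]
          for w in ['spy++', 'cry', 'fpp']}

for w, positions in starts.items():
    print(w, positions)
```

On the sample string this finds the same offsets as the result above: 0 and 36 for spy++, and nothing for cry and fpp.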

2011-11-29 12:17:48 eyquem
