Expanding English contractions based on a dictionary of the most common contractions

2017-09-03 49 views -1 likes


I'm trying to expand contractions in Python using a dictionary of the most common English contractions, but I've run into a problem.

import re 
tweet = "I luv my <3 iphone & you're awsm apple. DisplayIsAwesome, sooo happppppy http://www.apple.com" 
contractions_dict = {"ain't": "am not", 
        "aren't": "are not", 
        "can't": "cannot", 
        "you're": "you are"}  

contractions_re = re.compile('(%s)' '|'.join(contractions_dict.keys())) 

def expand_contractions(s, contractions_dict=contractions_dict):
    def replace(match):
        return contractions_dict[match.group(0)]

    return contractions_re.sub(replace, s)

expand_contractions(tweet)

I tried adding a "/" to "you're", to no avail.

The output is just the original tweet, passed through unchanged. Where am I going wrong?

Thanks

Source

2017-09-03 Rushat Rai

Answer

Here's a clue:

>>> print('(%s)' '|'.join(contractions_dict.keys())) 
you're(%s)|aren't(%s)|ain't(%s)|can't

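Adjacent string literals are concatenated, so '(%s)' '|' is just the string '(%s)|', and that whole string is used as the separator for join. Since %s has no special meaning in a regular expression, it only matches itself. There is no percent sign in the input, so the match fails.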
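To see this concretely, here is a minimal Python 3 sketch. It assumes the keys are listed in the same order as in the output above (the names `keys`, `separator`, and `pattern` are illustrative, not from the question). The order matters: the last key is the only one not followed by "(%s)", so it alone could still match.

```python
import re

# Keys in the order shown in the output above (an assumption for this sketch;
# the order in which a dict yields its keys depends on the Python version).
keys = ["you're", "aren't", "ain't", "can't"]
tweet = "I luv my <3 iphone & you're awsm apple. DisplayIsAwesome, sooo happppppy http://www.apple.com"

# Adjacent string literals are joined into one string before anything else runs.
separator = '(%s)' '|'
print(separator == '(%s)|')

# The combined string is only a join separator; no % formatting takes place.
pattern = separator.join(keys)
print(pattern)

# "you're" must be followed by a literal "%s" to match, so the tweet does not match.
print(re.search(pattern, tweet))

# A string that really contains "%s" after the contraction does match.
print(re.search(pattern, "you're%s").group(0))
```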

I suspect you were looking for something like

>>> print('|'.join('(%s)' % k for k in contractions_dict.keys())) 
(you're)|(aren't)|(ain't)|(can't)

or perhaps

>>> print('(%s)' % '|'.join(contractions_dict.keys())) 
(you're|aren't|ain't|can't)
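Both forms match the same text and differ only in the capture groups they create. A minimal Python 3 sketch of that difference, using a hard-coded key list and a made-up input string (`keys`, `per_word`, `one_group`, and `text` are illustrative names, not from the question):

```python
import re

keys = ["you're", "aren't", "ain't", "can't"]

# One capture group per contraction.
per_word = re.compile('|'.join('(%s)' % k for k in keys))
# A single capture group around the whole alternation.
one_group = re.compile('(%s)' % '|'.join(keys))

text = "I can't go"  # hypothetical input for illustration
m1 = per_word.search(text)
m2 = one_group.search(text)

# group(0) is the entire match and is identical for both patterns.
print(m1.group(0), m2.group(0))

# groups() differs: four slots (only the matching one is filled) versus one slot.
print(m1.groups())
print(m2.groups())
```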

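But since you're using match.group(0) (i.e. the entire matched string), the captures are irrelevant, and there is no need to parenthesize the words in the alternation. So the simpler solution is fine: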

>>> contractions_re = re.compile('|'.join(contractions_dict.keys())) 
>>> expand_contractions(tweet) 
'I luv my <3 iphone & you are awsm apple. DisplayIsAwesome, sooo happppppy \xf0\x9f\x99\x82 http://www.apple.com'
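For reference, a self-contained Python 3 sketch of the corrected script follows. It goes slightly beyond the fix above and should be read as one possible hardening, not as part of the original answer: `re.escape` is applied to each key in case a key ever contains a regex metacharacter, and keys are sorted longest first so that a longer contraction is tried before any shorter key that is a prefix of it.

```python
import re

contractions_dict = {
    "ain't": "am not",
    "aren't": "are not",
    "can't": "cannot",
    "you're": "you are",
}

# Plain alternation of the keys; no capture groups are needed because
# the replacement callback only looks at match.group(0).
contractions_re = re.compile(
    "|".join(re.escape(k) for k in sorted(contractions_dict, key=len, reverse=True))
)


def expand_contractions(s, contractions_dict=contractions_dict):
    def replace(match):
        # Look up the full matched text in the dictionary.
        return contractions_dict[match.group(0)]

    return contractions_re.sub(replace, s)


tweet = "I luv my <3 iphone & you're awsm apple. DisplayIsAwesome, sooo happppppy http://www.apple.com"
print(expand_contractions(tweet))
```

Matching in this sketch is case-sensitive, so a capitalized "You're" is left untouched unless it is added to the dictionary or handled separately.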

Source

2017-09-04 03:12:43 rici

Pretty silly mistake, huh? :) Thanks!
