2016-10-30

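Parsing and replacing quotes

I want to parse a text file to compute some statistics in Python. To do that, I want to replace some punctuation with tokens. One example of such a token covers all punctuation that ends a sentence (. ! ? become <EndS>). I managed to do this with regular expressions. Now I am trying to parse quotation marks, and for that I think I need a way to distinguish opening quotes from closing quotes. I am reading the input file line by line, and I cannot guarantee that the quotes will be balanced.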

For example, this:

"Death to the traitors!" cried the exasperated burghers. 
"Go along with you," growled the officer, "you always cry the same thing over again. It is very tiresome." 

should become this:

[Open] Death to the traitors! [Close] cried the exasperated burghers. 
[Open] Go along with you, [Close] growled the officer, [Open] you always cry the same thing over again. It is very tiresome. [Close] 

Is it possible to do this with regular expressions? Is there an easier or better way to do it?

Answer

You can use the sub function from the re module, passing a function as the replacement:

import re

def replace_dbquote(match):
    # match.group(1) is the text between the quotes, without the quote characters
    return '[OPEN]' + match.group(1) + '[CLOSE]'

text = '"Death to the traitors!" cried the exasperated burghers. "Go along with you", growled the officer.'
# Only balanced pairs of double quotes match; a lone quote is left unchanged
result = re.sub(r'"([^"]*)"', replace_dbquote, text)

print(result)

https://docs.python.org/3.5/library/re.html#re.sub
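
Since the question mentions that quotes may be unbalanced and that the file is read line by line, here is a minimal sketch of an alternative that does not rely on matched pairs. It is not the method from the answer above: it simply alternates between an opening and a closing tag for each straight double quote on a line. The function name tag_quotes, its parameters, and the choice to reset the state on every line are assumptions made for illustration; it also assumes only ASCII double quotes are used.

```python
import re

def tag_quotes(line, open_tag="[Open]", close_tag="[Close]"):
    """Replace each straight double quote with an alternating open/close tag.

    Hypothetical helper: returns the tagged line and a flag that is True
    when the line ends with a quote still open (i.e. unbalanced quotes).
    """
    is_open = False  # assumption: every line starts outside a quotation

    def swap(match):
        nonlocal is_open
        is_open = not is_open
        # Odd-numbered quotes open a quotation, even-numbered ones close it
        return open_tag + " " if is_open else " " + close_tag

    tagged = re.sub(r'"', swap, line)
    return tagged, is_open


line = '"Go along with you," growled the officer, "you always cry the same thing over again. It is very tiresome."'
tagged, unclosed = tag_quotes(line)
print(tagged)
print("unclosed quote:", unclosed)
```

The returned flag lets the caller decide what to do with a line whose last quote is never closed, for example logging it or appending a closing tag. If quotations can span several lines, the state would have to be carried from one line to the next instead of being reset.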