Replace all occurrences in a string in Python

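I have a dataset that I want to preprocess. Each entity in it has a unique ID and is located by its start and end indices. I want to replace all occurrences (given by their start and end indices) with their unique IDs.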

Given a text string like:

s = "The hypotensive effect of 100 mg/kg alpha-methyldopa was also partially reversed by naloxone. Naloxone alone did not affect either blood pressure or heart rate. In brain membranes from spontaneously hypertensive rats clonidine, 10(-8) to 10(-5) M, did not influence stereoselective binding of [3H]-naloxone (8 nM), and naloxone, 10(-8) to 10(-4) M, did not influence naloxone-suppressible binding of [3H]-dihydroergocryptine (1 nM)."

and a list of dictionaries for each ID, like:

{
'D006973': [{'length': '12', 'offset': '199', 'text': ['hypertensive'], 'type': 'Disease'}],
'D008750': [{'length': '16', 'offset': '36', 'text': ['alpha-methyldopa'], 'type': 'Chemical'}],
'D007022': [{'length': '11', 'offset': '4', 'text': ['hypotensive'], 'type': 'Disease'}],
'D009270': [{'length': '8', 'offset': '84', 'text': ['naloxone'], 'type': 'Chemical'}, {'length': '8', 'offset': '94', 'text': ['Naloxone'], 'type': 'Chemical'}, {'length': '13', 'offset': '293', 'text': ['[3H]-naloxone'], 'type': 'Chemical'}]
}

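I want to replace every occurrence given by an offset with its respective ID. So for the last dictionary, I want all the mentions in its list to be replaced by 'D009270'.

Example 1: For the first dictionary, with key 'D006973', I want to replace 'hypertensive', which is at index 199 and has length 12, with 'D006973'.

Example 2: For the last dictionary, with key 'D009270', I want to replace the substrings at these indices (given as tuples):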

[(84, 92), (94, 102), (293, 306)]

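In the last sentence, naloxone also appears as part of 'naloxone-suppressible', but I don't want to replace that one. So I can't simply use str.replace().
I tried replacing the substring from its start index to its end index with its unique ID (e.g. 199 to 211 for 'hypertensive'). But this shifts the offsets of the other entities that have not been replaced yet. I can pad the result when the replacement text ('D006973') is shorter than the string being replaced ('hypertensive'), but that fails when the replacement text is longer.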
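The offset drift described above can be reproduced with a small sketch. The shortened text and the `spans` list here are made up for illustration; only the ID 'D009270' comes from the data above.

```python
# Hypothetical shortened text; each span is (start, end, entity_id).
s = "reversed by naloxone. Naloxone alone"
spans = [(12, 20, "D009270"), (22, 30, "D009270")]

# Naive left-to-right replacement using the ORIGINAL offsets.
text = s
for start, end, entity_id in spans:
    text = text[:start] + entity_id + text[end:]

# The ID is one character shorter than "naloxone", so after the first
# replacement the second span no longer lines up with "Naloxone".
print(text)
```

Because the first replacement shortens the string by one character, the second slice starts one character too late and the output is mangled.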

Source: 2017-06-05, NormalOne

Use 're', Python's regular expression module. Documentation [here](https://docs.python.org/2/howto/regex.html).

@Shiva I don't quite understand how we could use a regular expression in this case. Could you explain how it would work? – NormalOne

For the first example, use a line like: re.sub(r'(?<=\b|^)naloxone(?=\b|$)', 'D006973', your_string)
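A quick check suggests that a word-boundary regex alone would not meet the requirement in the question. The sketch below uses a plain `\b...\b` pattern (an assumption, not the exact expression from the comment) on a shortened excerpt of the text, and counts the matches with `re.subn`.

```python
import re

# Shortened excerpt of the question's text (an illustrative assumption).
s = ("binding of [3H]-naloxone (8 nM), and naloxone, 10(-8) M, "
     "did not influence naloxone-suppressible binding")

# \b matches between a word character and a non-word character,
# and "-" is a non-word character, so hyphenated forms match too.
replaced, count = re.subn(r"\bnaloxone\b", "D009270", s)

print(count)
print(replaced)
```

All three occurrences match, including the one inside 'naloxone-suppressible', so a pattern-based replacement cannot tell annotated mentions from unannotated ones. The offsets are needed for that.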

You can use string formatting with placeholders:

from operator import itemgetter 

s = "The hypotensive effect of 100 mg/kg alpha-methyldopa was also partially reversed by naloxone. Naloxone alone did not affect either blood pressure or heart rate. In brain membranes from spontaneously hypertensive rats clonidine, 10(-8) to 10(-5) M, did not influence stereoselective binding of [3H]-naloxone (8 nM), and naloxone, 10(-8) to 10(-4) M, did not influence naloxone-suppressible binding of [3H]-dihydroergocryptine (1 nM)." 

dictionary={ 
'D006973': [{'length': '12', 'offset': '199', 'text': ['hypertensive'], 'type': 'Disease'}], 
'D008750': [{'length': '16', 'offset': '36', 'text': ['alpha-methyldopa'], 'type': 'Chemical'}], 
'D007022': [{'length': '11', 'offset': '4', 'text': ['hypotensive'], 'type': 'Disease'}], 
'D009270': [{'length': '8', 'offset': '84', 'text': ['naloxone'], 'type': 'Chemical'}, {'length': '8', 'offset': '94', 'text': ['Naloxone'], 'type': 'Chemical'}, {'length': '13', 'offset': '293', 'text': ["[3H]-naloxone"], 'type': 'Chemical'}] 
} 

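# Collect (start, end, entity_id) for every annotated mention.
index_list = []
for key in dictionary:
    for dic in dictionary[key]:
        o = int(dic['offset'])
        index_list.append((o, o + int(dic['length']), key))

index_list.sort(key=itemgetter(0))

# Overwrite each mention in place with "{}" padded by "@", so the string
# length (and therefore every later offset) stays unchanged.
# Assumes every mention is at least 2 characters long and that the text
# contains no "@", "{" or "}" of its own.
format_list = []
lt = list(s)
for si, ei, key in index_list:
    lt[si:ei] = list("{}") + ["@"] * ((ei - si) - 2)
    format_list.append(key)

text = "".join(lt)
text = text.replace("@", "")      # drop the padding
text = text.format(*format_list)  # fill the placeholders in offset order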

Result: 'The D007022 effect of 100 mg/kg D008750 was also partially reversed by D009270. D009270 alone did not affect either blood pressure or heart rate. In brain membranes from spontaneously D006973 rats clonidine, 10(-8) to 10(-5) M, did not influence stereoselective binding of D009270 (8 nM), and naloxone, 10(-8) to 10(-4) M, did not influence naloxone-suppressible binding of [3H]-dihydroergocryptine (1 nM).'
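As an alternative sketch that avoids placeholder characters, the replacements can be applied from the highest offset to the lowest, so each edit only shifts text that has already been processed. This is a different technique from the one above. The function name `replace_spans` is hypothetical, the text is a shortened version of the question's string, and the spans are assumed not to overlap.

```python
def replace_spans(text, annotations):
    """Replace each annotated span with its ID, working right to left."""
    spans = [
        (int(m["offset"]), int(m["offset"]) + int(m["length"]), entity_id)
        for entity_id, mentions in annotations.items()
        for m in mentions
    ]
    # Highest start first: replacing a span never moves anything to its left,
    # so the remaining (smaller) offsets stay valid.
    for start, end, entity_id in sorted(spans, reverse=True):
        text = text[:start] + entity_id + text[end:]
    return text


# Shortened text; the offsets match the first four mentions in the question.
s = ("The hypotensive effect of 100 mg/kg alpha-methyldopa was also "
     "partially reversed by naloxone. Naloxone alone did not affect blood pressure.")
annotations = {
    "D007022": [{"length": "11", "offset": "4"}],
    "D008750": [{"length": "16", "offset": "36"}],
    "D009270": [{"length": "8", "offset": "84"}, {"length": "8", "offset": "94"}],
}

result = replace_spans(s, annotations)
print(result)
```

This version does not depend on the length of the ID relative to the mention, and it tolerates "@" or braces in the text.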

Source: 2017-06-05 17:31:00
