Python - formatting output
For the following binary file (which can be downloaded here):
*NEWRECORD
RECTYPE = D
MH = Calcimycin
AQ = AA AD AE AG AI AN BI BL CF CH CL CS CT EC HI IM IP ME PD PK PO RE SD ST TO TU UR
ENTRY = A-23187|T109|T195|LAB|NRW|NLM (1991)|900308|abbcdef
ENTRY = A23187|T109|T195|LAB|NRW|UNK (19XX)|741111|abbcdef
ENTRY = Antibiotic A23187|T109|T195|NON|NRW|NLM (1991)|900308|abbcdef
ENTRY = A 23187
ENTRY = A23187, Antibiotic
MN = D03.633.100.221.173
PA = Anti-Bacterial Agents
PA = Calcium Ionophores
MH_TH = FDA SRS (2014)
MH_TH = NLM (1975)
ST = T109
ST = T195
N1 = 4-Benzoxazolecarboxylic acid, 5-(methylamino)-2-((3,9,11-trimethyl-8-(1-methyl-2-oxo-2-(1H-pyrrol-2-yl)ethyl)-1,7-dioxaspiro(5.5)undec-2-yl)methyl)-, (6S-(6alpha(2S*,3S*),8beta(R*),9beta,11alpha))-
RN = 37H9VM9WZL
RR = 52665-69-7 (Calcimycin)
PI = Antibiotics (1973-1974)
PI = Carboxylic Acids (1973-1974)
MS = An ionophorous, polyether antibiotic from Streptomyces chartreusensis. It binds and transports CALCIUM and other divalent cations across membranes and uncouples oxidative phosphorylation while inhibiting ATPase of rat liver mitochondria. The substance is used mostly as a biochemical tool to study the role of divalent cations in various biological systems.
OL = use CALCIMYCIN to search A 23187 1975-90
PM = 91; was A 23187 1975-90 (see under ANTIBIOTICS 1975-83)
HN = 91(75); was A 23187 1975-90 (see under ANTIBIOTICS 1975-83)
MR = 20160527
DA = 19741119
DC = 1
DX = 19840101
UI = D000001
*NEWRECORD
RECTYPE = D
MH = Temefos
AQ = AA AD AE AG AI AN BL CF CH CL CS CT EC HI IM IP ME PD PK RE SD ST TO TU UR
ENTRY = Abate|T109|T131|TRD|NRW|NLM (1996)|941114|abbcdef
ENTRY = Difos|T109|T131|TRD|NRW|UNK (19XX)|861007|abbcdef
ENTRY = Temephos|T109|T131|TRD|EQV|NLM (1996)|941201|abbcdef
MN = D02.705.400.625.800
MN = D02.705.539.345.800
MN = D02.886.300.692.800
PA = Insecticides
MH_TH = FDA SRS (2014)
MH_TH = INN (19XX)
MH_TH = USAN (1974)
ST = T109
ST = T131
N1 = Phosphorothioic acid, O,O'-(thiodi-4,1-phenylene) O,O,O',O'-tetramethyl ester
RN = ONP3ME32DL
RR = 3383-96-8 (Temefos)
AN = for use to kill or control insects, use no qualifiers on the insecticide or the insect; appropriate qualifiers may be used when other aspects of the insecticide are discussed such as the effect on a physiologic process or behavioral aspect of the insect; for poisoning, coordinate with ORGANOPHOSPHATE POISONING
PI = Insecticides (1966-1971)
MS = An organothiophosphate insecticide.
PM = 96; was ABATE 1972-95 (see under INSECTICIDES, ORGANOTHIOPHOSPHATE 1972-90)
HN = 96; was ABATE 1972-95 (see under INSECTICIDES, ORGANOTHIOPHOSPHATE 1972-90)
MR = 20130708
DA = 19990101
DC = 1
DX = 19910101
UI = D000002
I have the following Python code:
import re
terms = {}
numbers = {}
meshFile = 'd2017.bin'
with open(meshFile, mode='rb') as file:
    mesh = file.readlines()
outputFile = open('mesh.txt', 'w')
for line in mesh:
    meshTerm = re.search(b'MH = (.+)$', line)
    if meshTerm:
        term = meshTerm.group(1)
    meshNumber = re.search(b'MN = (.+)$', line)
    if meshNumber:
        number = meshNumber.group(1)
        numbers[str(number)] = term
        if term in terms:
            terms[term] = terms[term] + ' ' + str(number)
        else:
            terms[term] = str(number)
cumlist = []
keylist = terms.keys()
for key in keylist:
    #print('THE ORIGIN FOR ', key, file=outputFile)
    item_list = terms[key].split(" ")
    for phrase in item_list:
        cumlist.append(phrase)
print(cumlist)
for item in cumlist:
    print(numbers[str(item)], '\n', item, file=outputFile)
Its output is as follows:
b'Calcimycin\r'
b'D03.633.100.221.173\r'
b'Temefos\r'
b'D02.705.400.625.800\r'
b'Temefos\r'
b'D02.705.539.345.800\r'
b'Temefos\r'
b'D02.886.300.692.800\r'
How can I reformat the output so that it looks like this:
Calcimycin
D03.633.100.221.173
Temefos
D02.705.400.625.800
D02.705.539.345.800
D02.886.300.692.800
Thanks.
Is there a reason you are using binary strings? – TidB
str.decode('utf-8').strip() – RaminNietzsche
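To illustrate the suggestion above, here is a minimal sketch of where the b'...' wrapper and the trailing \r come from. It assumes the file uses \r\n line endings (which the \r in the output suggests) and that the text is valid UTF-8. The sample line is taken from the first record, and the names raw, as_repr and as_text are only illustrative.

```python
import re

# A sample line as file.readlines() would return it in 'rb' mode
# (assumed to end in \r\n, as the trailing \r in the output suggests).
line = b'MH = Calcimycin\r\n'

match = re.search(b'MH = (.+)$', line)
raw = match.group(1)                    # bytes; '.' also matches the \r

# str() on a bytes object returns its repr, including the b prefix and quotes.
as_repr = str(raw)

# Decoding gives a real str, and strip() removes the leftover \r.
as_text = raw.decode('utf-8').strip()

print(as_repr)
print(as_text)
```

In the question's code this would mean decoding term and number right after group(1), instead of wrapping them in str().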
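@TidB If you are referring to the regular expressions here, and to using "b" instead of "r", that is because I am reading a binary file, which is a MeSH file. When I used "r", the regular expressions did not work. Did I answer your question? – Simplicity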
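A likely reason the "r" patterns did not work is that a str pattern cannot be applied to bytes lines (Python raises a TypeError), so both sides have to be the same type. The sketch below works on str lines and prints each heading once, followed by its tree numbers. It is only one possible approach: the helper names group_mesh_numbers and format_groups are made up for this example, the patterns are anchored at the start of the line (a small change from the original), and the sample list stands in for the real file. For d2017.bin the lines could come from open(meshFile, encoding='utf-8'), assuming the file really is UTF-8 text.

```python
import re

# str patterns for str lines; anchored so "MH = " only matches at line start.
MH_RE = re.compile(r'^MH = (.+)$')
MN_RE = re.compile(r'^MN = (.+)$')


def group_mesh_numbers(lines):
    """Map each MH heading to the list of MN tree numbers that follow it."""
    groups = {}
    term = None
    for line in lines:
        line = line.strip()              # drops \r and \n
        mh = MH_RE.match(line)
        if mh:
            term = mh.group(1)
            continue
        mn = MN_RE.match(line)
        if mn and term is not None:
            # Headings appear in the dict only once they have a tree number.
            groups.setdefault(term, []).append(mn.group(1))
    return groups


def format_groups(groups):
    """Return each heading once, followed by its tree numbers, one per line."""
    out = []
    for term, numbers in groups.items():
        out.append(term)
        out.extend(numbers)
    return '\n'.join(out)


# A few lines from the two records in the question, with \r\n endings.
sample = [
    '*NEWRECORD\r\n',
    'RECTYPE = D\r\n',
    'MH = Calcimycin\r\n',
    'MN = D03.633.100.221.173\r\n',
    'MH_TH = NLM (1975)\r\n',
    '*NEWRECORD\r\n',
    'MH = Temefos\r\n',
    'MN = D02.705.400.625.800\r\n',
    'MN = D02.705.539.345.800\r\n',
    'MN = D02.886.300.692.800\r\n',
]

result = format_groups(group_mesh_numbers(sample))
print(result)
```

Keeping the numbers in a list per heading avoids joining them into one space-separated string and splitting it again later, and the dict keeps the headings in the order they were first seen (Python 3.7+).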