Extract values enclosed in curly braces from a text file using Python

The lines in my text file look like this:

0044xx aaa, bbb <+> 01/01/0017:53 <&> { 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{}{7.2}

etc.

I am trying to extract the values like this:

AAA is 0044xx aaa, bbb
BBB is 01/01/0017:53
CCC is 3.01
DDD is 00001
EEE is xxx yyy
FFF is (4.0-10.5)
HHH is 7.2

I can't extract the values CCC through HHH from inside the curly braces.

My script looks like this:

import sys
import re
import csv

def separateCodes(code):
    # Capture the text inside each {...} group (non-greedy)
    values = re.compile(r'.*?\{(.*?)\}.*?')
    field = values.findall(code)

    for i in range(len(field)):
        print field[i]
    print "-------------------------"
    return field  # return the list so the caller can use the values

def handleError(self, record):
    raise

with open('/xxx.TXT') as ABCfp:
    PP = ABCfp.read()

PPwithNOrn = PP.replace('*\r', '').replace('\n', '')
PPByMsg = PPwithNOrn.split('<~>')
print len(PPByMsg)

for i in range(len(PPByMsg)):
    AAA = ""
    BBB = ""
    CCC = ""
    DDD = ""
    EEE = ""
    FFF = ""
    GGG = ""
    HHH = ""

    print i, "=>", PPByMsg[i]
    if PPByMsg[i].find("<L>") != -1:
        print "-----------------------"
        # AAA found
        AAA = PPByMsg[i].split('<L> <+>')[0]
        # BBB found (only valid when the '<L> <+>' marker is present)
        BBB = PPByMsg[i].split('<L> <+>')[1].split('<&>')[0]
        # REST found
        rest = separateCodes(PPByMsg[i].split('<L> <+>')[1].split('<&>')[1])

Since I'm new to Python, I can't get any further. Please suggest a way to extract these values.
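For reference, the brace matching in separateCodes only needs a non-greedy group; the surrounding `.*?` in the original pattern add nothing when used with `findall`. A minimal sketch (the name `brace_fields` is hypothetical, and the sample record follows the format above) that returns the stripped values instead of printing them:

```python
import re

# Non-greedy group: captures everything between a '{' and the next '}'.
BRACE_RE = re.compile(r'\{(.*?)\}')


def brace_fields(text):
    """Return the contents of every {...} group in text, in order, stripped."""
    return [value.strip() for value in BRACE_RE.findall(text)]


sample = ('0044xx aaa, bbb <+> 01/01/0017:53 <&> '
          '{ 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{7.2}')
print(brace_fields(sample))
# ['3.01', '00001', 'xxx yyy DIFF', '(4.0-10.5)', '7.2']
```

An empty `{}` yields an empty string, so the positions of the remaining values stay aligned with their fields.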

2013-12-20 Bullu

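Welcome to Stack Overflow. Please [format your code](http://stackoverflow.com/editing-help) so that everyone can read it. – SuperSaiyan

Is "EEE" correct as written in the values you want to extract? – Jerry

How about this instead: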

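import re

a, b, c = re.split('<[+&]>', line)  # line: one record string from the file
bits = re.split('{(.*?)}', c)[1:-1]

bits will then contain the tokens from the last part of your string: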

>>> bits 
[' 3.01', '', '00001 ', '', 'xxx yyy DIFF', '', '(4.0-10.5)', '', '7.2'] 
>>> a 
'0044xx aaa, bbb ' 
>>> b 
' 01/01/0017:53 '
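Because the pattern passed to re.split has a capturing group, the captured text is kept in the result, interleaved with whatever lies between the braces (here empty strings). Taking every second element therefore leaves just the values. A sketch of this answer wrapped in a function, assuming each line contains exactly one <+> and one <&> (`split_record` is a hypothetical name):

```python
import re


def split_record(line):
    """Split one record into [AAA, BBB, brace values...], whitespace-stripped.

    Assumes exactly one '<+>' and one '<&>' marker per line; otherwise the
    three-way unpacking raises ValueError.
    """
    a, b, c = re.split(r'<[+&]>', line)
    # The capturing group makes re.split return
    # [before, value1, between, value2, between, ..., valueN, after];
    # [1:-1] drops the outer pieces and [::2] keeps only the values.
    bits = re.split(r'\{(.*?)\}', c)[1:-1]
    return [a.strip(), b.strip()] + [bit.strip() for bit in bits[::2]]


line = ('0044xx aaa, bbb <+> 01/01/0017:53 <&> '
        '{ 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{7.2}')
print(split_record(line))
```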

2013-12-20 07:31:53

Using it like this: rest = separateCodes(PatientETLByMsg[i].split(' <+> ')[1].split('<&>')[1]) ORDER = rest.split('{(.*?)}', c)[1:-1] print ORDER – Bullu

ORDER = rest.split('{(.*?)}', c)[1:-1] AttributeError: 'NoneType' object has no attribute 'split' – Bullu

When I use it as MRN, DATETIME, rest = re.split('<[+&]>', i) bits = re.split('{(.*?)}', str(rest))[1:-1] I get the error: return _compile(pattern, 0).split(string, maxsplit) TypeError: expected string or buffer – Bullu
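A likely cause of that TypeError, assuming the i in the comment is still the integer loop index from the question's `for i in range(len(PPByMsg))`: re.split is then handed an int instead of a record string. The third piece returned by the first split is already a string, so wrapping it in str() is not needed. A minimal sketch that loops over the strings themselves; `records` and `parse_all` are hypothetical names, with `records` standing in for PPByMsg:

```python
import re

# Hypothetical stand-in for PPByMsg: a list of record strings.
records = [
    '0044xx aaa, bbb <+> 01/01/0017:53 <&> '
    '{ 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{7.2}',
]


def parse_all(records):
    parsed = []
    # Loop over the strings directly; range(len(records)) would give int
    # indexes, and re.split(pattern, <int>) raises TypeError.
    for record in records:
        mrn, date_time, rest = re.split(r'<[+&]>', record)
        # rest is already a str, so no str() conversion is needed here.
        bits = re.split(r'\{(.*?)\}', rest)[1:-1]
        parsed.append((mrn.strip(), date_time.strip(), bits[::2]))
    return parsed


print(parse_all(records))

# Reproducing the error: passing an index instead of the string.
try:
    re.split(r'<[+&]>', 0)
except TypeError as exc:
    print('TypeError: %s' % exc)
```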

You can do the whole thing with a single regular expression:

>>> t = '0044xx aaa, bbb <+> 01/01/0017:53 <&> { 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{7.2}' 
>>> re.search(r'(.*?)\s<\+>\s(.*?)\s<&>\s{(.*?)\}\{(.*?)\}\{(.*?) DIFF\}\{(.*?)\}\{(.*?)\}', t).groups() 
('0044xx aaa, bbb', '01/01/0017:53', ' 3.01', '00001 ', 'xxx yyy', '(4.0-10.5)', '7.2')

You can extend the regex by using (?P<name>.*?) instead of (.*?) to get named results:

>>> re.search(r'(?P<a>.*?)\s<\+>\s(?P<b>.*?)\s<&>\s{(?P<c>.*?)\}\{(?P<d>.*?)\}\{(?P<e>.*?) DIFF\}\{(?P<f>.*?)\}\{(?P<g>.*?)\}', t).groupdict() 
{'a': '0044xx aaa, bbb', 'c': ' 3.01', 'b': '01/01/0017:53', 'e': 'xxx yyy', 'd': '00001 ', 'g': '7.2', 'f': '(4.0-10.5)'}
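With named groups, the group names can simply be the labels from the question. A sketch of the same pattern compiled with re.VERBOSE so that each field sits on its own line (in verbose mode unescaped whitespace is ignored, hence the escaped space before DIFF). `parse_record` is a hypothetical helper, GGG is left out as in the question's list, and dict() over a generator is used instead of a dict comprehension so it also runs on Python 2.6:

```python
import re

RECORD_RE = re.compile(r'''
    (?P<AAA>.*?)    \s<\+>\s      # text before the <+> marker
    (?P<BBB>.*?)    \s<&>\s       # text between <+> and <&>
    \{(?P<CCC>.*?)\}
    \{(?P<DDD>.*?)\}
    \{(?P<EEE>.*?)\ DIFF\}        # escaped space: the literal " DIFF" is not captured
    \{(?P<FFF>.*?)\}
    \{(?P<HHH>.*?)\}
''', re.VERBOSE)


def parse_record(line):
    """Return a dict of stripped fields, or None if the line doesn't match."""
    match = RECORD_RE.search(line)
    if match is None:
        return None
    return dict((name, value.strip()) for name, value in match.groupdict().items())


t = ('0044xx aaa, bbb <+> 01/01/0017:53 <&> '
     '{ 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{7.2}')
print(parse_record(t)['EEE'])  # xxx yyy
```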

Alternatively, use zip or tuple unpacking, like this:

>>> results = re.search(...).groups() 
>>> resultdict = dict(zip('abcdefg', results)) 
>>> a, b, c, d, e, f, g = results

2013-12-20 07:33:25 aquavitae

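Getting the error TypeError: expected string or buffer – Bullu

It works for me (Python 2.7). If you give more information than just "getting an error", I may be able to help. Where do you get the error, and which version of Python are you running? Is the error from the code I posted, or do you get it when using the regex in your own code? – aquavitae

I'm using Python 2.6.6. I used it as results = re.search(rest).groups() and got TypeError: search() takes at least 2 arguments (1 given) – Bullu

I solved my problem with the following: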
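The TypeError in that comment comes from the call itself: re.search takes the pattern as its first argument and the string to search as its second, and it returns None when nothing matches, so .groups() should only be called after checking the result. A sketch that reuses the regex from the answer above; `search_record` is a hypothetical name:

```python
import re

# Regex from the answer: AAA, BBB, then five {...} fields (" DIFF" dropped from EEE).
PATTERN = (r'(.*?)\s<\+>\s(.*?)\s<&>\s'
           r'\{(.*?)\}\{(.*?)\}\{(.*?) DIFF\}\{(.*?)\}\{(.*?)\}')


def search_record(line):
    """Return the tuple of captured fields, or None if line doesn't match."""
    match = re.search(PATTERN, line)  # pattern first, then the string
    if match is None:
        return None
    return match.groups()


line = ('0044xx aaa, bbb <+> 01/01/0017:53 <&> '
        '{ 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{7.2}')
print(search_record(line))
```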

rest=separateCodes(PatientETLByMsg[i].split('<L> <+>')[1].split('<&>')[1]) 

CCC=PPByMsg[i].split('{')[1].split('}')[0] 
DDD=PPByMsg[i].split('}{')[1] 
EEE=PPByMsg[i].split('}{')[2] 
FFF=PPByMsg[i].split('}{')[3] 
GGG=PPByMsg[i].split('}{')[4] 
HHH=PPByMsg[i].split('}{')[5] 
KKK=PPByMsg[i].split('}{')[6].split('}')[0]
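Splitting on '}{' works when every record has the same number of fields and nothing between the braces, but the first and last pieces keep a stray '{' or '}', which is why CCC and KKK need an extra split. A sketch of the same end-to-end loop using findall instead, assuming (as in the question) that records are separated by <~>, that the markers are <+> and <&> as in the answers' sample line, and that line breaks can simply be removed. `parse_text` is a hypothetical name; it takes the file contents as a string, so with a real file it would be called as `parse_text(fp.read())`:

```python
import re

FIELD_RE = re.compile(r'\{(.*?)\}')


def parse_text(text):
    """Parse every <~>-separated record in text into a dict.

    Assumes each record looks like 'AAA <+> BBB <&> {..}{..}...'.
    """
    records = []
    cleaned = text.replace('\r', '').replace('\n', '')
    for msg in cleaned.split('<~>'):
        if '<+>' not in msg or '<&>' not in msg:
            continue  # skip empty or malformed fragments
        head, tail = msg.split('<&>', 1)
        aaa, bbb = head.split('<+>', 1)
        records.append({
            'AAA': aaa.strip(),
            'BBB': bbb.strip(),
            # Brace values in order; an empty {} stays as ''.
            'fields': [f.strip() for f in FIELD_RE.findall(tail)],
        })
    return records


sample = ('0044xx aaa, bbb <+> 01/01/0017:53 <&> '
          '{ 3.01}{00001 }{xxx yyy DIFF}{(4.0-10.5)}{7.2}')
for record in parse_text(sample + '<~>\r\n' + sample):
    print(record['fields'])
```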

2013-12-31 09:44:45 Bullu
