2015-02-11 87 views
1

Python: grabbing an entire string as a single element

My input looks like this:

MSG1   .STRINGZ "This is my sample string : " 
MEMORYSPACE .BLKW  9 
NEWLINE  .FILL  #10 
NEG48   .FILl  #-48 

     .END 

I currently have code that splits each line of my input file into words, like this:

['MSG1', '.STRINGZ', '"This', 'is', 'my', 'sample', 'string', ':', '"'] 
['MEMORYSPACE', '.BLKW', '9'] 
['NEWLINE', '.FILL', '#10'] 
['NEG48', '.FILl', '#-48'] 
[] 
['.END'] 

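The first line of my input file contains a string, and I want the whole string treated as a single element so that I can compute its length in my code. Is there a way to do this? Here is my code: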

f = open ('testLC31.txt', 'r') 
line_count = 0 

to_ignore = ["AND", "ADD", "LEA", "PUTS", "JSR", "LD", "JSRR" , "NOT", "LDI" , 
      "LDR", "ST", "STI", "STR", "BR" , "JMP", "TRAP" , "JMP", "RTI" , 
      "BR", "ST", "STI" , "STR" , "BRz", "BRn" , "HALT"] 

label = [] 
instructions = [] 

for line in f: 
    elem = line.split() if line.split() else [''] 
    if len(elem) > 1 and elem[0] not in to_ignore: 
        label.append(elem[0]) 
        instructions.append(elem[1]) 
        line_count += 1 
    elif elem[0] in to_ignore: 
        line_count += 1 

Are the delimiters tabs, runs of spaces, or a combination of both? – 2015-02-11 03:00:24

Answers

0

This can be done by assuming that a .STRINGZ directive and its string always sit on a single line.

Result:

"This is my sample string : "
len(stringz_): 32

text_ = """ 
MSG1   .STRINGZ "This is my sample string : " 
MEMORYSPACE .BLKW  9 
NEWLINE  .FILL  #10 
NEG48   .FILl  #-48 

     .END 
""" 

STRINGZ_ = '.STRINGZ' 
line_count_ = 0 

lines_ = text_.split('\n') 

to_ignore = ["AND", "ADD", "LEA", "PUTS", "JSR", "LD", "JSRR" , "NOT", "LDI" , 
      "LDR", "ST", "STI", "STR", "BR" , "JMP", "TRAP" , "JMP", "RTI" , 
      "BR", "ST", "STI" , "STR" , "BRz", "BRn" , "HALT"] 

label = [] 
instructions = [] 

for line in lines_: 
    if STRINGZ_ in line: 
        # Everything after the directive, including whitespace and the quotes.
        stringz_ = line.split(STRINGZ_)[1] 
        print(stringz_) 
        print('len(stringz_): ' + str(len(stringz_))) 
    elem = line.split() if line.split() else [''] 
    if len(elem) > 1 and elem[0] not in to_ignore: 
        label.append(elem[0]) 
        instructions.append(elem[1]) 
        line_count_ += 1 
    elif elem[0] in to_ignore: 
        line_count_ += 1 
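
The length printed above also counts the surrounding whitespace and the quotation marks. The following is a minimal sketch, assuming the operand is wrapped in exactly one pair of double quotes and contains no embedded quotes, that measures only the text between the quotes. The helper name `stringz_length` is hypothetical and not part of the original answer.

```python
def stringz_length(line, directive='.STRINGZ'):
    """Return the length of the quoted text after `directive`, or None."""
    if directive not in line:
        return None
    # Everything after the directive, without surrounding whitespace.
    rest = line.split(directive, 1)[1].strip()
    # Assumption: the operand is wrapped in one pair of double quotes.
    if len(rest) >= 2 and rest[0] == '"' and rest[-1] == '"':
        rest = rest[1:-1]
    return len(rest)


print(stringz_length('MSG1   .STRINGZ "This is my sample string : " '))
```

For the sample line this counts 27 characters. Note that .STRINGZ also stores a terminating zero, so the string would occupy one more memory word than that.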
0
with open("filename") as f: 
    rd = f.readlines() 
    print (rd[0].split("\n")[0].split()) 

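Split on \n and on whitespace, then print the first element of each list. readlines() returns a list, which is easier to manipulate. Using with open() is also the better approach.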

1

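The str.split method has an optional maxsplit argument, which limits the number of splits and therefore the number of elements in the resulting list: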

>>> 'MSG1   .STRINGZ "This is my sample string : "'.split(None, 2) 
['MSG1', '.STRINGZ', '"This is my sample string : "'] 

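If you need something more sophisticated than taking the first two words and keeping the rest intact, shlex.split may suit you. It uses shell-like syntax to split a string into its parts and treats a quoted string as a single element. You can fine-tune the format by creating a shlex object instance and changing its attributes. See the documentation for details.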

>>> import shlex 
>>> shlex.split('MSG1   .STRINGZ "This is my sample string : "') 
['MSG1', '.STRINGZ', 'This is my sample string : '] 
>>> shlex.split('MSG1   .STRINGZ "This is my sample string : "', posix=False) 
['MSG1', '.STRINGZ', '"This is my sample string : "'] 
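
Applied to the whole listing, this could look roughly as follows. It is only a sketch: the `tokenize` helper and the inline `source` string are illustrative names, and it assumes the file uses no `;` comments or escape sequences that would need special handling.

```python
import shlex

source = '''\
MSG1   .STRINGZ "This is my sample string : "
MEMORYSPACE .BLKW  9
NEWLINE  .FILL  #10
NEG48   .FILL  #-48

     .END
'''


def tokenize(text):
    """Split every non-blank line using shell-like quoting rules."""
    rows = []
    for line in text.splitlines():
        # comments=False (the default) keeps '#' literal, so '#10' survives.
        tokens = shlex.split(line, comments=False)
        if tokens:
            rows.append(tokens)
    return rows


rows = tokenize(source)
for tokens in rows:
    print(tokens)
```

With the default POSIX mode the quotes are stripped, so `len(rows[0][2])` gives the length of the string itself.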

If that is still not enough, the remaining option is to write a full parser for the format, for example with the pyparsing library.
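
Before reaching for a parsing library, a regular expression may already cover a format this small. The sketch below is a stand-in built on the standard `re` module, not pyparsing. It assumes every line has the shape "optional label, directive, optional operand" and that a quoted operand contains no embedded double quotes. `LINE_RE` and `parse_line` are hypothetical names.

```python
import re

LINE_RE = re.compile(r'''
    ^\s*
    (?:(?P<label>[A-Za-z]\w*)\s+)?      # optional label
    (?P<instruction>\.[A-Za-z0-9]+)     # directive such as .FILL
    (?:\s+(?P<arg>"[^"]*"|\S+))?        # optional operand; quoted text stays whole
    \s*$
''', re.VERBOSE)


def parse_line(line):
    """Return a dict with label, instruction and arg, or None if no match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


parsed = parse_line('MSG1   .STRINGZ "This is my sample string : "')
print(parsed)
```

Lines that do not fit the pattern, including blank lines, yield `None`, so the caller can decide whether to skip them or report an error.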

1

You could try joining the pieces back together by hand, with a rough approach like this:

tags = ['MSG1', '.STRINGZ', '"This', 'is', 'my', 'sample', 'string', ':', '"'] 
FirstOccurance = 0   # becomes 1 while we are inside the quoted string 
longtag = "" 
for tag in tags: 
    if FirstOccurance == 1: 
        if tag == "\"": 
            longtag += tag          # lone closing quote: append without a space 
        else: 
            longtag += " " + tag 
    if ("\"" in tag) and (FirstOccurance == 0): 
        longtag += tag              # token holding the opening quote 
        FirstOccurance = 1 
    elif ("\"" in tag) and (FirstOccurance == 1): 
        FirstOccurance = 0          # closing quote seen 

print(longtag) 

Hope this helps.

0

A simple assembler? Here is a rough pass using pyparsing:

code = """ 
MSG1   .STRINGZ "This is my sample string : " 
MEMORYSPACE .BLKW  9 
NEWLINE  .FILL  #10 
NEG48   .FILL  #-48 

     .END""" 

from pyparsing import Word, alphas, alphanums, Regex, Combine, quotedString, Optional 

identifier = Word(alphas, alphanums+'_') 
command = Word('.', alphanums) 

integer = Regex(r'[+-]?\d+') 
byte_literal = Combine('#' + integer) 
command_arg = quotedString | integer | byte_literal 
codeline = Optional(identifier)("label") + command("instruction") + Optional(command_arg("arg")) 

for line in code.splitlines(): 
    line = line.strip() 
    if not line: 
        continue 

    print(line) 
    assemline = codeline.parseString(line) 
    print(assemline.dump()) 
    print() 

Prints:

MSG1   .STRINGZ "This is my sample string : " 
['MSG1', '.STRINGZ', '"This is my sample string : "'] 
- arg: "This is my sample string : " 
- instruction: .STRINGZ 
- label: MSG1 

MEMORYSPACE .BLKW  9 
['MEMORYSPACE', '.BLKW', '9'] 
- arg: 9 
- instruction: .BLKW 
- label: MEMORYSPACE 

NEWLINE  .FILL  #10 
['NEWLINE', '.FILL', '#10'] 
- arg: #10 
- instruction: .FILL 
- label: NEWLINE 

NEG48   .FILL  #-48 
['NEG48', '.FILL', '#-48'] 
- arg: #-48 
- instruction: .FILL 
- label: NEG48 

.END 
['.END'] 
- instruction: .END 