2010-07-05 110 views
1

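Reading data from files with mixed strings and numbers in Python

I want to read several files in one directory, each with the following structure: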

# Mj = 1.60  ff = 7580.6 gg = 0.8325 

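I want to read every file and assign each number to a vector. Assuming I have 3 files, I will have 3 components for the vector Mj, and so on. How can I do this in Python?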

Thanks for your help.

+0

So the vector Mj = (1.60, 7580, 0.8325)? I'm not quite sure what you want; please provide more details. – 2010-07-05 18:30:07

Answers

1

I would use a regular expression to take the line apart:

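import re

# Matches a line such as "# Mj = 1.60  ff = 7580.6 gg = 0.8325".
# re.VERBOSE ignores the whitespace in the pattern, so '#' must be escaped.
lineRE = re.compile(r'''
    \#\s*
    Mj\s*=\s*(?P<Mj>[-+0-9eE.]+)\s*
    ff\s*=\s*(?P<ff>[-+0-9eE.]+)\s*
    gg\s*=\s*(?P<gg>[-+0-9eE.]+)
    ''', re.VERBOSE)

# filenames is assumed to be a list of paths defined elsewhere.
for filename in filenames:
    with open(filename, 'r') as f:
        for line in f:
            m = lineRE.match(line)
            if not m:
                continue
            # The groups are strings; wrap them in float() if you need numbers.
            Mj = m.group('Mj')
            ff = m.group('ff')
            gg = m.group('gg')
            # Put them in whatever lists you want here.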
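
As a minimal sketch of the "put them in whatever lists you want" step, the snippet below collects the values into one list per field, so three files with one matching line each give three components per vector. The function name `collect_vectors` and the dictionary layout are illustrative assumptions, not part of the original answer; the pattern is the same one used above.

```python
import re

# Same pattern idea as the answer above: one named group per field.
LINE_RE = re.compile(r'''
    \#\s*
    Mj\s*=\s*(?P<Mj>[-+0-9eE.]+)\s*
    ff\s*=\s*(?P<ff>[-+0-9eE.]+)\s*
    gg\s*=\s*(?P<gg>[-+0-9eE.]+)
    ''', re.VERBOSE)


def collect_vectors(filenames):
    """Return a dict with one list per field (hypothetical helper).

    Each matching line in each file appends one float to every list.
    """
    vectors = {'Mj': [], 'ff': [], 'gg': []}
    for filename in filenames:
        with open(filename, 'r') as f:
            for line in f:
                m = LINE_RE.match(line)
                if not m:
                    continue  # skip lines that do not match the pattern
                for name, value in m.groupdict().items():
                    vectors[name].append(float(value))
    return vectors
```

Calling `collect_vectors(paths)` with a list of file paths returns the three lists; `vectors['Mj']` then holds the Mj value from every file in the order the files were given. Note that `float()` raises `ValueError` if a captured string such as `"1.2.3"` is not a valid number, since the character class is deliberately loose.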
0

Here is a pyparsing solution, which may be easier to maintain than a regular-expression solution:

text = "# Mj = 1.60  ff = 7580.6 gg = 0.8325 " 

from pyparsing import Word, nums, Literal 

# subexpression for a real number, including conversion to float 
realnum = Word(nums+"-+.E").setParseAction(lambda t:float(t[0])) 

# overall expression for the full line of data 
linepatt = (Literal("#") + "Mj" + "=" + realnum("Mj") + 
      "ff" + "=" + realnum("ff") + 
      "gg" + "=" + realnum("gg")) 

# use '==' to test for matching line pattern 
if text == linepatt:
    res = linepatt.parseString(text)

    # dump the matched tokens and all named results
    print(res.dump())

    # access the Mj data field
    print(res.Mj)

    # use results names with string interpolation to print data fields
    print("%(Mj)f %(ff)f %(gg)f" % res)

Prints:

['#', 'Mj', '=', 1.6000000000000001, 'ff', '=', 7580.6000000000004, 'gg', '=', 0.83250000000000002] 
- Mj: 1.6 
- ff: 7580.6 
- gg: 0.8325 
1.6 
1.600000 7580.600000 0.832500 
+0

Interesting. Now try writing it without 'from pyparsing import *', which is [strongly discouraged](http://docs.python.org/howto/doanddont.html#from-module-import) – 2010-07-06 12:02:39

+1

I discourage it too (http://my.safaribooksonline.com/9780596514235/basic_form_of_a_pyparsing_program), but I guess I rushed through it for these simple parsers. – PaulMcG 2010-07-06 13:10:29