2016-10-02 106 views

Python - Extract strings from a log file and write them to another file

I have a log file that looks like this:

sw2 switch_has sw2_p3. 
sw1 transmits sw2_p2 
/* BUG: axiom too complex: SubClassOf(ObjectOneOf([NamedIndividual(#t_air_sens2)]),DataHasValue(DataProperty(#qos_type),^^(latency,http://www.xcx.org/1900/02/22-rdf-syntax-ns#PlainLiteral))) */ 
/* BUG: axiom too complex: SubClassOf(ObjectOneOf([NamedIndividual(#t_air_sens2)]),DataHasValue(DataProperty(#topic_type),^^(periodic,http://www.xcx.org/1901/11/22-rdf-syntax-ns#PlainLiteral))) */ 
... 

What I'm interested in is pulling specific words out of the /* BUG... lines and writing them to a separate file, something like this:

t_air_sens2 qos_type latency 
t_air_sens2 topic_type periodic 
... 

I can do this in the shell with awk and a regular expression, like this:

awk -F'#|\\^\\^\\(' '{for (i=2; i<NF; i++) printf "%s%s", gensub(/[^[:alnum:]_].*/,"",1,$i), (i<(NF-1) ? OFS : ORS) }' output.txt > ./LogErrors/Properties.txt 

How can I extract them with Python? (Should I use a regular expression again, or ..?)

Answer


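You can certainly use a regular expression. I would read the file line by line, pick out the lines that start with '/* BUG:', and then parse those lines as needed.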

import re 

target = '/* BUG:' 
# capture the three names without the leading '#'; 
# literal parentheses, brackets and carets are escaped with '\' 
pattern = re.compile(r'NamedIndividual\(#(\w+)\)\]\),DataHasValue\(DataProperty\(#(\w+)\),\^\^\((\w+),') 

with open('logfile.txt', 'r') as infile, open('output.txt', 'w') as outfile: 
    # loop through the log file line by line 
    for line in infile: 
        # only the '/* BUG:' lines are of interest 
        if line.startswith(target): 
            match = pattern.search(line) 
            # skip BUG lines that do not have the expected shape 
            if match: 
                # write the captured groups separated by spaces 
                outfile.write(' '.join(match.groups()) + '\n') 

Python has a very powerful regular expression module built in: the re module.

You can search for a given pattern, then print out the matched groups as needed.
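
As a rough illustration of searching for a pattern and reading its groups, here is a small sketch using named groups on one of the sample lines from the question. The helper name extract_fields and the group names are made up for this example, and the pattern assumes every BUG line has the NamedIndividual / DataProperty / ^^( layout shown in the question.

```python
import re

# Hypothetical pattern for illustration: the group names are not from the
# original answer, and the layout of the line is assumed from the samples.
PATTERN = re.compile(
    r'NamedIndividual\(#(?P<individual>\w+)\)'
    r'.*?DataProperty\(#(?P<prop>\w+)\)'
    r'.*?\^\^\((?P<value>\w+),'
)

def extract_fields(line):
    """Return (individual, prop, value) for a matching line, else None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return match.group('individual'), match.group('prop'), match.group('value')

sample = ('/* BUG: axiom too complex: SubClassOf(ObjectOneOf('
          '[NamedIndividual(#t_air_sens2)]),DataHasValue('
          'DataProperty(#qos_type),^^(latency,'
          'http://www.xcx.org/1900/02/22-rdf-syntax-ns#PlainLiteral))) */')

print(extract_fields(sample))
print(extract_fields('sw1 transmits sw2_p2'))
```

Named groups are optional here; match.groups() gives the same three values in order, but names make it harder to mix the fields up if the pattern changes later.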

Note: raw strings (r'xxxx') let you write backslashes without doubling them, which keeps regex patterns readable.
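
To make the point about raw strings concrete, this short sketch writes the same pattern fragment twice, once as a raw string and once as a normal string with doubled backslashes. The variable names are only for illustration.

```python
import re

# The same regex, spelled two ways.
raw = r'\^\^\((\w+),'          # raw string: backslashes are kept as typed
escaped = '\\^\\^\\((\\w+),'   # normal string: every backslash is doubled

print(raw == escaped)  # True

# Either spelling finds the value that follows '^^('.
m = re.search(raw, '^^(latency,http')
print(m.group(1))  # latency
```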
