2013-10-02 36 views

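Python - Formatting specific data in a text file

I have a number of log files from which I need to extract and format data. Some of these log files are very large, over 10,000 lines.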

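Can anyone recommend a code example that reads a text file, deletes the unwanted lines, and then edits the remaining lines into a specific format? I haven't been able to find any previous thread that covers what I'm after.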

Here is an example of the data I need to edit:

136: add student 50000000 35011/Y01T :Unknown id in field 3 - ignoring line
137: add student 50000000 5031/Y01S :Unknown id in field 3 - ignoring line
138: add student 50000000 881/Y01S :Unknown course idnumber in field 4 - ignoring line
139: add student 50000000 5732/Y01S :Unknown id in field 3 - ignoring line
134: add student 50000000 W250/Y02S :OK
135: add student 50000000 35033/Y01T :OK

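I need to search the files and delete any line that ends with the suffix :OK. Then I need to edit the remaining lines into a CSV format, such as: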

add,student,50000000,1234/abcd 

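Any tips, code snippets, etc. would be a great help and I would be very grateful. I would normally try this myself first, but I don't have time to teach myself Python file access and string formatting, so please forgive me for asking before attempting it.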
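The per-line transformation asked for above can be sketched as a small function. This assumes every line has the shape `<number>: <action> <role> <id> <course> :<status>`, as in the sample; the name `to_csv_row` is hypothetical. It returns the four CSV fields, or `None` for lines to drop (blank lines and lines ending in `:OK`).

```python
def to_csv_row(line):
    """Return [action, role, id, course] for a log line, or None to skip it."""
    line = line.strip()
    # drop blank lines and lines whose status is ":OK"
    if not line or line.endswith(':OK'):
        return None
    # "136: add student 50000000 35011/Y01T :Unknown id ..." -> "add student 50000000 35011/Y01T"
    body = line.partition(': ')[2].partition(' :')[0]
    fields = body.split()
    # assumption: the first four whitespace-separated fields are the wanted columns
    return fields[:4] if len(fields) >= 4 else None


sample = [
    "136: add student 50000000 35011/Y01T :Unknown id in field 3 - ignoring line ",
    "134: add student 50000000 W250/Y02S :OK ",
    "",
]
for raw in sample:
    row = to_csv_row(raw)
    if row is not None:
        print(','.join(row))  # prints: add,student,50000000,35011/Y01T
```

Keeping the parsing in one function makes it easy to test against a few sample lines before pointing it at a 10,000-line log.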

Answers


Apologies in advance; this might be one way to do it:

import sys

if len(sys.argv) != 2:
    print('Add an input file as parameter')
    sys.exit(1)

print('opening file: %s' % sys.argv[1])

# the output file is always called 'output' and is overwritten on every run
with open(sys.argv[1]) as infile, open('output', 'w') as outfile:
    for line in infile:
        line = line.strip()
        # skip blank lines and lines whose status is ":OK"
        if not line or line.endswith(':OK'):
            continue
        # "136: add student 50000000 35011/Y01T :Unknown id ..." splits into
        # ['136:', 'add', 'student', '50000000', '35011/Y01T', ...]
        fields = line.split()
        new_line = ','.join(fields[1:5])
        outfile.write(new_line + '\n')
        # just for checking purposes let's print the lines
        print(new_line)

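Watch out for the output file name! The script writes to a file called "output" and overwrites it on every run.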
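The warning above applies because the script always writes to a fixed file name. A minimal sketch, assuming Python 3.3+ (for the "x" file mode), that takes the output path as a second argument and refuses to overwrite an existing file; `main` and the argument names are hypothetical.

```python
import argparse
import sys


def main(argv=None):
    """Filter INFILE into CSV lines written to OUTFILE; returns an exit status."""
    parser = argparse.ArgumentParser(description="Filter a log file into CSV lines.")
    parser.add_argument("infile", help="log file to read")
    parser.add_argument("outfile", help="CSV file to create (must not already exist)")
    args = parser.parse_args(argv)

    with open(args.infile) as infile:
        try:
            # mode "x" creates the file and raises FileExistsError if it exists,
            # so a previous output file is never silently overwritten
            out = open(args.outfile, "x")
        except FileExistsError:
            print("refusing to overwrite %s" % args.outfile, file=sys.stderr)
            return 1
        with out:
            for line in infile:
                line = line.strip()
                # skip blank lines and lines whose status is ":OK"
                if not line or line.endswith(":OK"):
                    continue
                # "136: add student 50000000 35011/Y01T :Unknown ..." -> fields[1:5]
                # are the action, role, id and course columns
                fields = line.split()
                out.write(",".join(fields[1:5]) + "\n")
    return 0
```

To run it as a script, end the file with `sys.exit(main())` and call it as `python filter_log.py log.txt out.csv` (the script name is hypothetical). Returning an exit status from `main` instead of calling `sys.exit` inside it also makes the function easy to call from other code.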


I'll give the code a go and play around with it. Thanks very much for your reply. – Russ


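You can change the regular expression to suit your needs if your lines differ, and if you need a different delimiter you can also change the arguments to csv.writer: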

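import csv
import re

# groups: line number, action, role, id, course, status message
# (\s* before the colon keeps trailing spaces out of the course field)
regex = re.compile(r"(\d+)\s*:\s*(\w+)\s+(\w+)\s+(\w+)\s+([\w/ ]+?)\s*:\s*(.+)")
with open("out.csv", "w", newline="") as outfile:  # newline="" as the csv docs recommend (Python 3)
    writer = csv.writer(outfile, delimiter=',', quotechar='"')
    with open("log.txt") as f:
        for line in f:
            m = regex.match(line)
            # strip() so a status of "OK " with trailing whitespace is still skipped
            if m and m.group(6).strip() != "OK":
                # drop the line number (first group) and the status (last group)
                writer.writerow(m.groups()[1:-1])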
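A variant of the same regex idea using named groups, so the columns are picked by name rather than by position. This is a sketch, not the answerer's code: the group names and `filter_log` are hypothetical, and it writes to an in-memory `io.StringIO` so it can be tried without touching any files.

```python
import csv
import io
import re

# Named-group version of the pattern; \s* before the colon keeps the
# trailing space out of the course field.
LOG_LINE = re.compile(
    r"(?P<num>\d+)\s*:\s*(?P<action>\w+)\s+(?P<role>\w+)\s+(?P<id>\w+)\s+"
    r"(?P<course>[\w/ ]+?)\s*:\s*(?P<status>.+)"
)


def filter_log(lines, out):
    """Write action,role,id,course for every matching line whose status is not OK."""
    writer = csv.writer(out)
    for line in lines:
        m = LOG_LINE.match(line)
        # strip() so a status of "OK " (trailing space) still counts as OK
        if m and m.group("status").strip() != "OK":
            writer.writerow([m.group("action"), m.group("role"),
                             m.group("id"), m.group("course")])


buf = io.StringIO()
filter_log(["136: add student 50000000 35011/Y01T :Unknown id in field 3 - ignoring line \n",
            "134: add student 50000000 W250/Y02S :OK \n"], buf)
print(buf.getvalue(), end="")
```

Note that `csv.writer` ends rows with "\r\n" by default; pass `lineterminator="\n"` if plain newlines are wanted.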

Hi, thanks for your reply. These have been very helpful and I'm learning quickly. Many thanks for the help. – Russ


Thanks for the help, guys. As a newbie, the code I ended up with isn't very elegant, but it still gets the job done :).

# Open the file and create the CSV after filtering the input file.
def openFile(filename, keyword):  # the caller passes two arguments: the file name and the keyword to keep

    lines = []

    with open(filename, 'r') as f:  # open the file for reading; it is closed automatically
        for line in f:  # for each line in 'f'
            if keyword in line:  # check whether the keyword is in the line
                lines.append(line)  # add the line to the list

    print(lines)  # test

    output = ''
    for each in lines:  # filter and clean the info, format it as CSV
        chunk = each.partition(': ')[2]  # split to remove the prefix (line number), keep the rest
        chunk = chunk.partition(' :')[0]  # split to remove the suffix (status message), keep the rest
        strsplit = chunk.split(' ')  # split the string on spaces
        # first three fields comma-separated; any remaining parts stay joined by spaces
        # (indexing strsplit[4] and strsplit[5] directly fails on 4-field lines)
        line = ','.join(strsplit[:3]) + ',' + ' '.join(strsplit[3:]) + '\n'
        output += line  # concatenate each line into a single string

    print(output)  # test

    with open(keyword + '.csv', 'w') as f:  # open a file to write
        f.write(output)  # write the string to the file; closed automatically


openFile('russtest.txt', 'cat')
openFile('CRON ENROL LOG 200913.txt', 'field 4')

Thanks :).
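The keyword-filtering approach above collects every matching line and the whole CSV text in memory before writing. For logs of 10,000+ lines that is usually still fine, but a streaming version writes each row as it is read. A minimal sketch, assuming the same `<number>: ... :<status>` line shape; `filter_to_csv` and its `out_name` parameter are hypothetical. Unlike the code above it splits every field on whitespace, so it gives `add,student,50000000,1234/abcd`-style rows for the sample lines but differs if the course field itself contains spaces.

```python
import csv


def filter_to_csv(filename, keyword, out_name=None):
    """Stream `filename`, keep lines containing `keyword`, write them as CSV rows.

    Returns the number of rows written.
    """
    out_name = out_name or keyword + ".csv"
    count = 0
    with open(filename) as src, open(out_name, "w", newline="") as dst:
        writer = csv.writer(dst)
        for line in src:  # one line at a time; nothing is accumulated in memory
            if keyword not in line:
                continue
            # drop the "136: " prefix and the " :status" suffix, as in the code above
            body = line.partition(": ")[2].partition(" :")[0]
            writer.writerow(body.split())
            count += 1
    return count
```

For example, `filter_to_csv('CRON ENROL LOG 200913.txt', 'field 4')` would write `field 4.csv`, matching the output name used above.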