2011-08-24 54 views
0

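Text file parsing problem in Python

I'm new to Python, and I'm trying to delete lines from a text file whenever the word "Lett." appears in the line. Here is a sample of the text file I'm trying to parse: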

<A>Lamb</A> <W>Let. Moxon</W> 
<A>Lamb</A> <W>Danger Confound. Mor. w. Personal Deformity</W> 
<A>Lamb</A> <W>Gentle Giantess</W> 
<A>Lamb</A> <W>Lett., to Wordsw.</W> 
<A>Lamb</A> <W>Lett., to Procter</W> 
<A>Lamb</A> <W>Let. to Old Gentleman</W> 
<A>Lamb</A> <W>Elia Ser.</W> 
<A>Lamb</A> <W>Let. to T. Manning</W> 

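I know how to open the file, but I'm not sure how to find the matching text or how to delete the line once I've found it. Any help would be greatly appreciated.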

Answers

4
with open("myfile.txt", "r") as f:
    for line in f:
        if "Lett." not in line:
            # each line already ends with a newline, so suppress print's own
            print(line, end="")

Or, if you want to write the result back to the file:

# Read everything first, because opening the file for writing truncates it
with open("myfile.txt", "r") as f:
    lines = f.readlines()

with open("myfile.txt", "w") as f:
    for line in lines:
        if "Lett." not in line:
            f.write(line)
+1

Don't forget to add a newline to each line when writing it back to the file. – PyKing

+1

No, `readlines` keeps the newline at the end of each line. – jtbandes

+0

You're right. I must have confused it with splitlines(). – PyKing
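The point settled in these comments can be checked directly. This is a minimal sketch that uses an in-memory `io.StringIO` in place of a real file (an assumption made only to keep the example self-contained); the variable names are illustrative.

```python
import io

sample = (
    "<A>Lamb</A> <W>Let. Moxon</W>\n"
    "<A>Lamb</A> <W>Lett., to Procter</W>\n"
)

# readlines() keeps the trailing newline on every line, so the lines
# can be written back unchanged.
with_newlines = io.StringIO(sample).readlines()

# str.splitlines() drops the line endings, so a newline would have to
# be added again when writing.
without_newlines = sample.splitlines()

# Filtering the readlines() result leaves lines that are ready to write.
kept = [line for line in with_newlines if "Lett." not in line]
```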

1
# Open the input text and a file for the results
with open('in.txt', 'r') as text, open('out.txt', 'w') as out:
    # Go through the file line by line
    for line in text:
        if 'Lett.' not in line:  # This is the crucial line.
            # Write the line only if 'Lett.' is not in it
            out.write(line)
# Both files are closed, and the changes saved, when the with block ends
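If the goal is to change the original file rather than produce a separate `out.txt`, the standard library's `fileinput` module offers an in-place mode. The sketch below is one possible approach, not something the answers above use; `remove_in_place` and its parameters are hypothetical names.

```python
import fileinput


def remove_in_place(path, needle):
    """Rewrite the file at `path`, dropping every line that contains `needle`."""
    # With inplace=True, fileinput redirects standard output into the file,
    # so whatever is printed inside the loop becomes the new content.
    with fileinput.input(files=[path], inplace=True) as f:
        for line in f:
            if needle not in line:
                # The line already ends with a newline.
                print(line, end="")
```

It would be called as `remove_in_place("in.txt", "Lett.")`.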
1

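I have a general stream-editor framework for this kind of thing. I load the file into memory, apply changes to the in-memory list of lines, and write the file back out if anything changed.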

My boilerplate looks like this:

from sed_util import delete_range, insert_range, append_range, replace_range

def sed(filename):
    modified = 0

    # Load file into memory
    with open(filename) as f:
        lines = [line.rstrip() for line in f]

    # magic here...

    if modified:
        with open(filename, "w") as f:
            for line in lines:
                f.write(line + "\n")

In the `# magic here` section, I do one of two things:

  1. Modify individual lines, for example:

    lines[i] = change_line(lines[i])

  2. Call my utilities for deleting, inserting, appending, and replacing lines, for example:

    lines = delete_range(lines, some_range)
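For the question in this thread, the `# magic here` step could be a plain filter. This is a sketch under assumptions: `remove_matching` and its `needle` parameter are hypothetical names that are not part of the author's `sed_util`, and the file is rewritten only when a line was actually dropped.

```python
def remove_matching(filename, needle):
    """Delete every line containing `needle`; return how many were removed."""
    # Load the file into memory without line endings, as in the boilerplate.
    with open(filename) as f:
        lines = [line.rstrip("\n") for line in f]

    # The "magic" step: keep only the lines that do not contain the needle.
    kept = [line for line in lines if needle not in line]
    removed = len(lines) - len(kept)

    # Write the file back only if something changed.
    if removed:
        with open(filename, "w") as f:
            for line in kept:
                f.write(line + "\n")
    return removed
```

Returning the number of removed lines doubles as the `modified` flag: zero means the file was left untouched.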

The range utilities are built on primitives like these:

def delete_range(lines, r): 
    """ 
    >>> a = list(range(10)) 
    >>> b = delete_range(a, (1, 3)) 
    >>> b 
    [0, 4, 5, 6, 7, 8, 9] 
    """ 
    start, end = r 
    assert start <= end 
    return [line for i, line in enumerate(lines) if not (start <= i <= end)] 

def insert_range(lines, line_no, new_lines): 
    """ 
    >>> a = list(range(10)) 
    >>> b = list(range(11, 13)) 
    >>> c = insert_range(a, 3, b) 
    >>> c 
    [0, 1, 2, 11, 12, 3, 4, 5, 6, 7, 8, 9] 
    >>> c = insert_range(a, 0, b) 
    >>> c 
    [11, 12, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9] 
    >>> c = insert_range(a, 9, b) 
    >>> c 
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 9] 
    """ 
    assert 0 <= line_no < len(lines) 
    return lines[0:line_no] + new_lines + lines[line_no:] 

def append_range(lines, line_no, new_lines): 
    """ 
    >>> a = list(range(10)) 
    >>> b = list(range(11, 13)) 
    >>> c = append_range(a, 3, b) 
    >>> c 
    [0, 1, 2, 3, 11, 12, 4, 5, 6, 7, 8, 9] 
    >>> c = append_range(a, 0, b) 
    >>> c 
    [0, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9] 
    >>> c = append_range(a, 9, b) 
    >>> c 
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12] 
    """ 
    assert 0 <= line_no < len(lines) 
    return lines[0:line_no+1] + new_lines + lines[line_no+1:] 

def replace_range(lines, line_nos, new_lines): 
    """ 
    >>> a = list(range(10)) 
    >>> b = list(range(11, 13)) 
    >>> c = replace_range(a, (0, 2), b) 
    >>> c 
    [11, 12, 2, 3, 4, 5, 6, 7, 8, 9] 
    >>> c = replace_range(a, (8, 10), b) 
    >>> c 
    [0, 1, 2, 3, 4, 5, 6, 7, 11, 12] 
    >>> c = replace_range(a, (0, 10), b) 
    >>> c 
    [11, 12] 
    >>> c = replace_range(a, (0, 10), []) 
    >>> c 
    [] 
    >>> c = replace_range(a, (0, 9), []) 
    >>> c 
    [9] 
    """ 
    start, end = line_nos 
    return lines[:start] + new_lines + lines[end:] 

def find_line(lines, regex):
    for i, line in enumerate(lines):
        if regex.match(line):
            return i

if __name__ == '__main__': 
    import doctest 
    doctest.testmod() 

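The tests operate on lists of integers for clarity, but the same transformations work on lists of strings.

Typically, I scan the list of lines to identify the changes to apply, usually with regular expressions, and then apply the changes to the matching lines. Today, for example, I ended up making about 2000 line changes across 150 files.

This approach works better than sed when you need multi-line patterns or additional logic to decide whether a change applies.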
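As an illustration of a rule that depends on neighbouring lines, the sketch below keeps the first line of each run of matching lines and drops the rest. The rule, the function name `drop_repeated_matches`, and the `LETT` pattern are all hypothetical, chosen only to show the kind of logic that is easy to express on an in-memory list.

```python
import re


def drop_repeated_matches(lines, regex):
    """Keep the first line of each run of matching lines; drop the rest."""
    result = []
    previous_matched = False
    for line in lines:
        matched = regex.search(line) is not None
        # Drop the line only if it matches AND the line before it matched too.
        if not (matched and previous_matched):
            result.append(line)
        previous_matched = matched
    return result


# Hypothetical pattern for the sample data in the question.
LETT = re.compile(r"<W>Lett\.")
```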

0

return [l for l in open(fname) if "Lett." not in l]

0
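result = ''
with open('in.txt') as f:
    for line in f:
        # The match is case-sensitive, so test for 'Lett.' rather than 'lett'
        if 'Lett.' not in line:
            result += line
# 'w' replaces any earlier output; the original used 'a', which appends
with open('out.txt', 'w') as f:
    f.write(result)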