Modifying every line of a text file in Python

I have a big file, like the following example:

1 10161 10166 3 
1 10166 10172 2 
1 10172 10182 1 
1 10183 10192 1 
1 10193 10199 1 
1 10212 10248 1 
1 10260 10296 1 
1 11169 11205 1 
1 11336 11372 1 
2 11564 11586 2 
2 11586 11587 3 
2 11587 11600 4 
3 11600 11622 2

I want to add "chr" to the beginning of each line, for example:

chr1 10161 10166 3 
chr1 10166 10172 2 
chr1 10172 10182 1 
chr1 10183 10192 1 
chr1 10193 10199 1 
chr1 10212 10248 1 
chr1 10260 10296 1 
chr1 11169 11205 1 
chr1 11336 11372 1 
chr2 11564 11586 2 
chr2 11586 11587 3 
chr2 11587 11600 4 
chr3 11600 11622 2

I tried the following code in Python:

file = open("myfile.bg", "r")
for line in file:
    newline = "chr" + line
out = open("outfile.bg", "w")
for new in newline:
    out.write("n"+new)

But it does not return what I want. Do you know how to fix the code?

Source

2017-10-04 user7249622

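1) You have to concatenate the strings onto newline (e.g. with +=). 2) Please post the results, or any error. – Thecave3

The error isn't needed now, since the question has been answered, but it is usually very helpful if you can include the output you are seeing. – ryachza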

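The problem is that you iterate over the input and reassign the same variable (newline) for each line, then open the output file and iterate over newline. Since newline is a string, new will be each character in that string.

I think something like this should be what you're looking for: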

with open('myfile.bg', 'rb') as file:
    with open('outfile.bg', 'wb') as out:
        for line in file:
            # The files are opened in binary mode, so the prefix must be bytes too
            out.write(b'chr' + line)

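When iterating over a file, line already contains the trailing newline.

The with statement automatically cleans up the file handles when the block ends.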
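A small sketch of the two points above, assuming a throwaway file created in a temporary directory (the file name demo.bg is hypothetical): each line read from the file keeps its trailing newline, and the handle is closed once the with block exits.

```python
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "demo.bg")
with open(path, "w") as fp:
    fp.write("1 10161 10166 3\n2 11564 11586 2\n")

with open(path) as fp:
    lines = [line for line in fp]

print(repr(lines[0]))  # the newline is part of the string
print(fp.closed)       # the handle is closed after the with block
```

Because the newline is already there, writing 'chr' + line needs no extra "\n".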

Source

2017-10-04 18:09:44 ryachza

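@thebjorn What doesn't work? When I tested it, it looked perfect. What output are you seeing? – ryachza

The problem with your code is that you iterate over the input file without doing anything with the data you read: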

file = open("myfile.bg", "r") 
for line in file: 
    newline = "chr" + line

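The last line assigns each line of myfile.bg to the newline variable (a string, with 'chr' prepended), and each line overwrites the previous result.

Then you iterate over the string in newline (which will be the last line of the input file, with 'chr' prepended):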

out = open("outfile.bg", "w") 
for new in newline:  # <== this iterates over a string, so `new` will be individual characters 
    out.write("n"+new) # this only writes 'n' before each character in newline
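To see the effect concretely, here is a small sketch that replays the same logic on an in-memory list of lines instead of real files. The two sample lines and the io.StringIO object standing in for outfile.bg are assumptions made for illustration.

```python
import io

lines = ["1 10161 10166 3\n", "3 11600 11622 2\n"]

# Only the last line survives, because newline is overwritten on every pass.
for line in lines:
    newline = "chr" + line

out = io.StringIO()  # stands in for outfile.bg
for new in newline:  # iterates over the characters of the last line
    out.write("n" + new)

result = out.getvalue()
print(repr(result))
```

The result starts with ncnhnrn3, that is, the last input line with an 'n' in front of every character; the first line never reaches the output.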

If you're only doing this once, e.g. in the shell, you could use a one-liner:

open('outfile.bg', 'w').writelines(['chr' + line for line in open('myfile.bg').readlines()])

More correct (especially in a program, where you care about open file handles etc.) would be:

with open('myfile.bg') as infp: 
    lines = infp.readlines() 
with open('outfile.bg', 'w') as outfp: 
    outfp.writelines(['chr' + line for line in lines])

If the file is really big (close to the size of available memory), you need to process it incrementally:

with open('myfile.bg') as infp:
    with open('outfile.bg', 'w') as outfp:
        for line in infp:
            outfp.write('chr' + line)

(This is much slower than the first two versions, though..)

Source
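How large the speed difference is will depend on the Python version, the disk, and the file, so it may be worth measuring. Below is a rough sketch using timeit on a generated sample file; the function names prefix_all_at_once and prefix_streaming and the sample data are hypothetical, chosen only for this comparison.

```python
import os
import tempfile
import timeit

def prefix_all_at_once(src, dst):
    # Reads the whole file into memory, then writes it in one call.
    with open(src) as infp:
        lines = infp.readlines()
    with open(dst, "w") as outfp:
        outfp.writelines(["chr" + line for line in lines])

def prefix_streaming(src, dst):
    # Handles one line at a time, so memory use stays small.
    with open(src) as infp, open(dst, "w") as outfp:
        for line in infp:
            outfp.write("chr" + line)

# Generate a sample input file in a temporary directory.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "myfile.bg")
with open(src, "w") as fp:
    for i in range(20000):
        fp.write("1 %d %d 1\n" % (i, i + 10))

dst_a = os.path.join(tmpdir, "out_a.bg")
dst_b = os.path.join(tmpdir, "out_b.bg")
t_a = timeit.timeit(lambda: prefix_all_at_once(src, dst_a), number=3)
t_b = timeit.timeit(lambda: prefix_streaming(src, dst_b), number=3)
print("all at once: %.3fs, streaming: %.3fs" % (t_a, t_b))
```

Both functions produce the same output file; only the timings and memory use differ.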

2017-10-04 18:11:26 thebjorn

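The only issue I see here is memory usage, if the file is large. – ryachza

What problem is the temporary file trying to solve? My only thought is a hostile reader that could open the file while it is being written, but because of buffering that would be a problem at any file size. – ryachza

You can't open the same file for reading and writing, especially here: since you are writing more data than you are reading, you would end up reading the new data instead of the old. The problem probably wouldn't show up until the file size exceeds your stdio buffer, though.. – thebjorn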
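If the goal were to modify the file in place rather than write a separate output file, one common pattern is to write to a temporary file in the same directory and then swap it in for the original. The sketch below assumes that approach; the helper name prefix_lines_in_place and the sample data are hypothetical and do not come from the answers here.

```python
import os
import tempfile

def prefix_lines_in_place(path, prefix="chr"):
    """Prepend prefix to every line of path, going through a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as outfp, open(path) as infp:
            for line in infp:
                outfp.write(prefix + line)
        os.replace(tmp_path, path)  # swap the new file in for the original
    except BaseException:
        os.remove(tmp_path)  # do not leave the temporary file behind
        raise

# Usage on a throwaway sample file.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "myfile.bg")
with open(demo_path, "w") as fp:
    fp.write("1 10161 10166 3\n2 11564 11586 2\n")

prefix_lines_in_place(demo_path)
with open(demo_path) as fp:
    content = fp.read()
print(content)
```

The original file is never read and written at the same time, which avoids the problem described in the comment above. Note that the replaced file takes on the temporary file's permissions, which may differ from the original's.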

I totally agree with @rychaza; this is a version using your code:

file = open("myfile.bg", "r") 
out = open("outfile.bg", "w") 
for line in file: 
    out.write("chr" + line) 
out.close() 
file.close()

Source

2017-10-04 18:12:21 Thecave3

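You can't open the same file for both input and output (at least not when it is larger than the stdio buffer size). Also, you are leaking file handles. – thebjorn

@thebjorn The answer doesn't do that: the input and output files are different. – ryachza

Ah, sorry, my bad. – thebjorn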
