Python - Open and modify a large text file

I have a ~600 MB Roblox-type .mesh file, which reads like a text file in any text editor. I have the following code:

mesh = open("file.mesh", "r").read() 
mesh = mesh.replace("[", "{").replace("]", "}").replace("}{", "},{") 
mesh = "{"+mesh+"}" 
f = open("p2t.txt", "w") 
f.write(mesh)

It returns:

Traceback (most recent call last): 
    File "C:\TheDirectoryToMyFile\p2t2.py", line 2, in <module> 
    mesh = mesh.replace("[", "{").replace("]", "}").replace("}{", "},{") 
MemoryError

Here is a sample of my file:

[-0.00599, 0.001466, 0.006][0.16903, 0.84515, 0.50709][0.00000, 0.00000, 0][-0.00598, 0.001472, 0.00599][0.09943, 0.79220, 0.60211][0.00000, 0.00000, 0]

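What can I do?

Edit:

I don't know what the head, follow, and tail commands are in the other thread that this was marked as a duplicate of. I tried to use them but couldn't get them to work. The file is also one giant line; it is not split into lines.

Source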

2015-06-22 GShocked

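Try doing the replacements one at a time. Try reading some tutorials. – wwii

That didn't work – GShocked

Possible duplicate of [Read large text files in Python, line by line without loading it in to memory](http://stackoverflow.com/questions/6475328/read-large-text-files-in-python-line-by-line-without-loading-in-to-memory) –

You need to read one byte per iteration, process it, and then write it to another file or to sys.stdout. Try this code: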

mesh = open("file.mesh", "r")
mesh_out = open("file-1.mesh", "w")

# The first character is the leading "[" of the first group.
c = mesh.read(1)

if c:
    mesh_out.write("{")
else:
    exit(0)
while True:
    c = mesh.read(1)
    if c == "":
        break

    if c == "[":
        mesh_out.write(",{")
    elif c == "]":
        mesh_out.write("}")
    else:
        mesh_out.write(c)

mesh.close()
mesh_out.close()

UPD:

It is slow (thanks, jamylak). So I changed it:

import sys
import re


def process_file(fname):
    with open(fname, "r") as mesh:
        # Consume the first character (the leading "[") and open the first group.
        c = mesh.read(1)
        if c == '':
            return
        sys.stdout.write('{')

        while True:
            # Read in 8192-character chunks rather than one character at a time.
            c = mesh.read(8192)
            if c == '':
                return

            c = re.sub(r'\[', ',{', c)
            c = re.sub(r'\]', '}', c)
            sys.stdout.write(c)


if __name__ == '__main__':
    process_file(sys.argv[1])

So now it takes ~15 seconds for a 1.4 GB file. To run it:

$ python mesh.py file.mesh > file-1.mesh

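Source

2015-06-22 03:57:40

Nice. See also this question: http://stackoverflow.com/questions/2872381/how-to-read-a-file-byte-by-byte-in-python-and-how-to-print-a-bytelist-as-a-binar – maxymoo

It might be a good idea to work in a *context manager* using the `with` statement. `mesh_out` should be opened for *appending* – wwii

Reading `1` byte at a time is super slow, though. You should use a buffer size, e.g. the default `8192`, and run `.replace()` on each chunk – jamylak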
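The two suggestions in the comments above (a `with`-style stream interface and `.replace()` on each chunk) could be combined roughly as follows. This is only a sketch: the function name `convert_stream` and its parameters are hypothetical, and it assumes the input starts with `[`, as in the sample from the question.

```python
import io


def convert_stream(src, dst, chunk_size=8192):
    """Copy src to dst in chunks, turning "[a][b]" into "{a},{b}".

    Hypothetical helper; assumes the very first character of src is "[".
    """
    first = True
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # Each "[" becomes ",{" and each "]" becomes "}". Both patterns are a
        # single character, so a chunk boundary can never split a match.
        chunk = chunk.replace("[", ",{").replace("]", "}")
        if first:
            # Drop the comma that was put in front of the very first group.
            if chunk.startswith(","):
                chunk = chunk[1:]
            first = False
        dst.write(chunk)


# Small in-memory demonstration with a deliberately tiny chunk size.
src = io.StringIO("[1, 2, 3][4, 5, 6][7, 8, 9]")
dst = io.StringIO()
convert_stream(src, dst, chunk_size=4)
result = dst.getvalue()
print(result)
```

With real files, `src` and `dst` would be the objects returned by `open("file.mesh", "r")` and `open("p2t.txt", "w")` inside a `with` statement. The outer braces from the question's code are not added here; they could be written to `dst` before and after the call.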

You can do it line by line:

with open("file.mesh", "r") as mesh, open("p2t.txt", "w") as f:
    for line in mesh:
        line = line.replace("[", "{").replace("]", "}").replace("}{", "},{")
        line = "{" + line + "}"
        f.write(line)

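Source

2015-06-22 03:51:20 maxymoo

Still a memory error, maybe I need more memory? I had 8 GB, but one of my sticks failed and I only have 4 GB now – GShocked

Try it now, this should iterate over the lines – maxymoo

Still getting a memory error – GShocked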
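Line iteration does not help here because, as the question notes, the file is a single giant line, so the first "line" is the whole file. One possible workaround is to iterate over `]`-terminated records instead of newline-terminated lines. The sketch below is an illustration only; `iter_records` and its parameters are hypothetical names, not something from the thread.

```python
import io


def iter_records(stream, delimiter="]", chunk_size=8192):
    """Yield delimiter-terminated records from a text stream, chunk by chunk.

    Hypothetical helper: only one chunk plus one partial record is held in
    memory at a time, regardless of whether the file contains newlines.
    """
    pending = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last delimiter is complete; keep the rest.
        *complete, pending = pending.split(delimiter)
        for record in complete:
            yield record + delimiter
    if pending:
        # Trailing text with no closing delimiter (e.g. a final newline).
        yield pending


sample = io.StringIO("[-0.00599, 0.001466, 0.006][0.16903, 0.84515, 0.50709]")
records = list(iter_records(sample, chunk_size=16))
print(records)
```

Each yielded record can then be transformed and written out individually, much like a line would be in the loop above.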

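import os

# Binary mode: Python 3 does not allow end-relative seeks on text-mode files.
with open("file.mesh", "rb") as mesh, open("p2f.txt", "wb") as f:
    while True:
        c = mesh.read(1)
        if not c:
            # Drop the trailing comma written after the last "}".
            f.seek(-1, os.SEEK_END)
            f.truncate()
            break
        elif c == b"[":
            f.write(b"{")
        elif c == b"]":
            f.write(b"},")
        else:
            f.write(c)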

p2f.txt:

{-0.00599, 0.001466, 0.006},{0.16903, 0.84515, 0.50709},{0.00000, 0.00000, 0},{-0.00598, 0.001472, 0.00599},{0.09943, 0.79220, 0.60211},{0.00000, 0.00000, 0}

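Source

2015-06-22 03:51:56

As mentioned before, `1` byte at a time is super slow. You should read with a larger buffer size. If you don't believe me, look up what Linus Torvalds has said – jamylak

@jamylak I agree, but I was trying to avoid the MemoryError :) –

Nothing wrong with that, but memory can hold more than 1 byte at a time. – jamylak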

-1

def read(afilename):
    with open(afilename, "r") as file:
        # readlines() loads every line of the file into a list in memory.
        lines = file.readlines()
        lines = [line.replace("[", "{") for line in lines]
        # place rest of code here

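Source

2015-06-22 04:04:39 AuzPython

`lines = file.readlines()` already kills the memory – jamylak

It depends on the size of the file being read/written. On a small file, can you say you would notice? – AuzPython

On a small file you wouldn't notice. But the question is about "opening a large file" and says "600 MB file". Also, it's bad practice to use `.readlines()`; I never use it – jamylak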

BLOCK_SIZE = 1 << 15
with open(input_file, 'rb') as fin, open(output_file, 'wb') as fout:
    for block in iter(lambda: fin.read(BLOCK_SIZE), b''):
        # do your replace
        fout.write(block)

Source

2015-06-22 04:23:25 LittleQ
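The "do your replace" step in the block-wise loop above needs some care if the two-byte pattern `}{` from the question is replaced directly, because a match can straddle two blocks. A minimal sketch of one way to handle that is shown below; `convert_blocks` and `BRACKETS_TO_BRACES` are hypothetical names, and holding back a trailing `}` is just one possible approach.

```python
import io

BLOCK_SIZE = 1 << 15
# Byte-for-byte mapping: "[" -> "{" and "]" -> "}".
BRACKETS_TO_BRACES = bytes.maketrans(b"[]", b"{}")


def convert_blocks(fin, fout, block_size=BLOCK_SIZE):
    """Rewrite b"[a][b]" as b"{a},{b}" one block at a time (hypothetical helper)."""
    carry = b""
    for block in iter(lambda: fin.read(block_size), b""):
        block = carry + block.translate(BRACKETS_TO_BRACES)
        # "}{" is two bytes long, so it may straddle two blocks. Hold back a
        # trailing "}" and prepend it to the next block before replacing.
        if block.endswith(b"}"):
            block, carry = block[:-1], b"}"
        else:
            carry = b""
        fout.write(block.replace(b"}{", b"},{"))
    # Flush the held-back "}" once the input is exhausted.
    fout.write(carry)


# In-memory demonstration with a tiny block size to force boundary splits.
fin = io.BytesIO(b"[1, 2][3, 4][5, 6]")
fout = io.BytesIO()
convert_blocks(fin, fout, block_size=3)
converted = fout.getvalue()
print(converted)
```

With real files, `fin` and `fout` would be opened with `'rb'` and `'wb'` as in the snippet above. Working on bytes avoids decoding the file and assumes the brackets are single-byte characters, which holds for ASCII-compatible encodings.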
