Error downloading a large file in Python: Compressed file ended before the end-of-stream marker was reached

I am downloading a compressed file from the internet:

import lzma
import urllib.request

with lzma.open(urllib.request.urlopen(url)) as file:
    for line in file:
        ...

After downloading and processing a large part of the file, I eventually get this error:

File "/usr/lib/python3.4/lzma.py", line 225, in _fill_buffer
    raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached

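I suspect this is caused by the internet connection dropping or the server not responding for a while. If so, is there a way to make it keep retrying until the connection is re-established, instead of raising an exception? I don't think the file itself is the problem, because I have manually downloaded many files like this one from the same website and decompressed them by hand. I have also been able to download and decompress some smaller files with Python. The file I am trying to download is about 20 GB compressed.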

2015-04-01 ClickyButton.com

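How long does the download run before you get the error? Some firewalls/proxies seem to terminate connections after a fixed timeout (for example, 10 minutes). If it always fails after the same interval, that might be a clue... – DNA 2015-04-01 08:48:33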

Possible duplicate of [Python LZMA: Compressed data ended before the end-of-stream marker was reached](http://stackoverflow.com/questions/37400583/python-lzma-compressed-data-ended-end-of-stream-marker-was-reached) – kenorb 2016-05-23 22:51:50

I ran into the same problem when trying to process a very large file online using 'urllib.request.urlopen()' and 'gzip'. After about 12 hours, I got a similar traceback. – bmende 2016-06-29 20:21:02

From the urllib.urlopen docs:

One caveat: the read() method, if the size argument is omitted or negative, may not read until the end of the data stream; there is no good way to determine that the entire stream from a socket has been read in the general case.

Perhaps lzma.open trips over the huge size, a connection error, or a timeout for the reason described above.

2015-04-01 10:38:41 Pynchia
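
On the retry part of the question: one possible approach is to save the compressed file to disk first and, when the connection drops, ask the server for only the missing bytes with an HTTP Range header. The sketch below is a minimal illustration, not a tested production downloader. It assumes the server sends Content-Length and honours Range requests (answering with status 206). The function name download_with_resume and its parameters, including the opener argument that exists only so the network call can be swapped out, are hypothetical names chosen for this example.

```python
import http.client
import time
import urllib.request


def download_with_resume(url, dest_path, max_attempts=5, wait_seconds=5,
                         chunk_size=1 << 20, opener=urllib.request.urlopen):
    """Save url to dest_path, resuming with a Range request after a failure.

    Assumes the server sends Content-Length and supports Range requests.
    """
    received = 0   # bytes written so far
    total = None   # full size, learned from the first response
    with open(dest_path, "wb") as out:
        for _ in range(max_attempts):
            request = urllib.request.Request(url)
            if received:
                # Ask only for the part that is still missing.
                request.add_header("Range", "bytes=%d-" % received)
            try:
                with opener(request) as response:
                    if received and response.status != 206:
                        raise RuntimeError("server ignored the Range header")
                    if total is None:
                        total = int(response.headers["Content-Length"])
                    while received < total:
                        chunk = response.read(chunk_size)
                        if not chunk:
                            break  # connection closed early
                        out.write(chunk)
                        received += len(chunk)
            except (OSError, http.client.HTTPException):
                pass  # dropped connection or timeout: retry below
            if total is not None and received == total:
                return received
            time.sleep(wait_seconds)
    raise OSError("download incomplete after %d attempts" % max_attempts)
```

Once the function returns, the file on disk has the size the server announced, so it can be opened with lzma.open(dest_path) without depending on the network while decompressing.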

This could be a bug in liblzma. As a workaround, try adding:

lzma._BUFFER_SIZE = 1023

before calling lzma.open().

2015-09-08 21:40:35 kenorb
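
Whatever the root cause, the error message means the decompressor ran out of input before it saw the end-of-stream marker. A minimal sketch of how that condition can be checked explicitly, using the standard library's lzma.LZMADecompressor and its eof attribute; the helper name decompress_chunks is made up for this example, and the chunks could come from any source (a socket, a file, a list of bytes objects).

```python
import lzma


def decompress_chunks(chunks):
    """Yield decompressed bytes; raise EOFError if the input is truncated."""
    decompressor = lzma.LZMADecompressor()
    for chunk in chunks:
        if decompressor.eof:
            break  # end-of-stream marker already seen; ignore trailing data
        yield decompressor.decompress(chunk)
    if not decompressor.eof:
        # Same situation lzma.open() reports: the data stopped too early.
        raise EOFError("input ended before the end-of-stream marker")
```

Because the check is explicit, the caller can catch the EOFError and decide to fetch the remaining bytes rather than abandon the whole job.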

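Assuming you need to download a large file, it is better to open the output file in write-binary ("wb") mode when writing the content to disk with Python.

You can also try the Python requests module instead of the urllib module:

See the working code below: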

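import requests

url = "http://www.google.com"
# Open the output file in binary write mode and save the response body.
# Note: .content holds the whole response in memory before it is written.
with open("myoutputfile.ext", "wb") as f:
    f.write(requests.get(url).content)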

Please test the code and reply if it does not solve your problem.

Regards

2016-07-06 11:14:45

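Have you tried the requests library? I believe it provides an abstraction over urllib.

The following solution should work for you, but it uses the requests library instead of urllib (but requests > urllib!). Let me know if you would prefer to keep using urllib.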

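import os
import requests

def download(url, chunk_s=1024, fname=None):
    # Default to the last path component of the URL as the file name.
    if not fname:
        fname = url.split('/')[-1]
    # stream=True avoids loading the whole body into memory at once.
    req = requests.get(url, stream=True)
    with open(fname, 'wb') as fh:
        for chunk in req.iter_content(chunk_size=chunk_s):
            if chunk:  # skip empty keep-alive chunks
                fh.write(chunk)
    return os.path.join(os.getcwd(), fname)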

2016-07-06 16:26:49
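
Once the compressed file is on disk (for example, saved by the download() function above), the line-by-line processing from the question can run against the local copy instead of the live connection. A small sketch, assuming the file is xz/LZMA-compressed UTF-8 text; iter_xz_lines is a hypothetical helper name, not part of any library.

```python
import lzma


def iter_xz_lines(path, encoding="utf-8"):
    """Yield text lines from a local .xz file, decompressing as it reads."""
    with lzma.open(path, "rt", encoding=encoding) as fh:
        for line in fh:
            yield line
```

Reading from disk keeps memory use low, since lzma.open decompresses incrementally, and a network hiccup can no longer interrupt the processing loop.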
