2016-09-23

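Download a gzip file, md5 checksum it, and then save the extracted data if it matches

I'm currently trying to download two files with Python: a gzip file and its checksum.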

I want to verify that the contents of the gzipped file match the MD5 checksum, and then save those contents to a target directory.

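I found out how to download the files here, and I learned how to calculate the checksum here. I load the URLs from a JSON config file, and I learned how to parse the JSON values here.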
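Because the checksum is compared as a plain string, it can help to normalise the downloaded checksum text before comparing it. A minimal Python 3 sketch, assuming config.json holds the `fileUrl` and `checksumUrl` keys used in the script further down, and that the checksum file contains either a bare hex digest or an md5sum-style `<digest>  <filename>` line; `load_config` and `parse_md5_checksum` are hypothetical helper names:

```python
import json


def load_config(text):
    """Parse the JSON config text and return (fileUrl, checksumUrl).

    Assumes the same keys the question's config.json uses."""
    data = json.loads(text)
    return data['fileUrl'], data['checksumUrl']


def parse_md5_checksum(text):
    """Extract the hex digest from a checksum file's contents.

    Accepts a bare digest or md5sum-style "<digest>  <filename>" output;
    surrounding whitespace and the trailing newline are ignored, and the
    digest is lower-cased to match hashlib's hexdigest()."""
    fields = text.split()
    if not fields:
        raise ValueError("checksum file is empty")
    return fields[0].lower()
```

In the script this would be used as `load_config(open('./config.json').read())` and `parse_md5_checksum(checksumFile.read())`.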

I've put it all together in the script below, but I'm stuck on how to store the verified contents of the gzip file.

import json
import gzip
import urllib
import hashlib

# Function for creating an md5 checksum of a gzip file's decompressed contents
def md5Gzip(fname):
    hash_md5 = hashlib.md5()

    with gzip.open(fname, 'rb') as f:
        # Make an iterable of the file and divide into 4096 byte chunks
        # The iteration ends when we hit an empty byte string (b"")
        for chunk in iter(lambda: f.read(4096), b""):
            # Update the MD5 hash with the chunk
            hash_md5.update(chunk)

    return hash_md5.hexdigest()

# Open the configuration file in the current directory
with open('./config.json') as configFile:
    data = json.load(configFile)

# Open the downloaded checksum file
with open(urllib.urlretrieve(data['checksumUrl'])[0]) as checksumFile:
    # Strip the trailing newline so the string comparison below can match
    md5Checksum = checksumFile.read().strip()

# Open the downloaded db file and get its md5 checksum via gzip.open
fileMd5 = md5Gzip(urllib.urlretrieve(data['fileUrl'])[0])

if fileMd5 == md5Checksum:
    print 'Downloaded Correct File'
    # save correct file
else:
    print 'Downloaded Incorrect File'
    # do some error handling
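One way to fill in the `# save correct file` branch without holding the whole decompressed file in memory is to hash and write each chunk in the same loop, then keep the output only if the digest matches. A minimal Python 3 sketch, assuming the checksum covers the decompressed data (as `md5Gzip` above computes it) and that `expected_md5` is a bare hex digest; `extract_if_md5_matches` and `dest_path` are hypothetical names:

```python
import gzip
import hashlib
import os
import tempfile


def extract_if_md5_matches(gz_path, expected_md5, dest_path, chunk_size=4096):
    """Decompress gz_path to dest_path only if the MD5 of the decompressed
    data equals expected_md5. Returns True if the file was saved."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    os.makedirs(dest_dir, exist_ok=True)
    hash_md5 = hashlib.md5()
    # Temp file in the destination directory, so os.replace is a
    # same-filesystem rename and a half-written file never sits at dest_path.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    saved = False
    try:
        # Wrap fd first so it is closed even if gzip.open fails.
        with os.fdopen(fd, 'wb') as out, gzip.open(gz_path, 'rb') as src:
            for chunk in iter(lambda: src.read(chunk_size), b""):
                hash_md5.update(chunk)  # hash and save in a single pass
                out.write(chunk)
        if hash_md5.hexdigest() == expected_md5.strip().lower():
            os.replace(tmp_path, dest_path)
            saved = True
    finally:
        # On a mismatch or any error, discard the temporary output.
        if not saved and os.path.exists(tmp_path):
            os.remove(tmp_path)
    return saved
```

The trade-off is that a mismatch is only detected after the data has been written, which is why the output goes to a temporary file first.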

In your 'md5Gzip', return a 'tuple' instead of just the hash, i.e. 'return hash_md5.hexdigest(), file_content'

Answer


In your md5Gzip, return a tuple instead of just the hash.

def md5Gzip(fname):
    hash_md5 = hashlib.md5()
    file_content = None

    with gzip.open(fname, 'rb') as f:
        # Make an iterable of the file and divide into 4096 byte chunks
        # The iteration ends when we hit an empty byte string (b"")
        for chunk in iter(lambda: f.read(4096), b""):
            # Update the MD5 hash with the chunk
            hash_md5.update(chunk)
        # Rewind and read the whole decompressed content
        # (this decompresses the file a second time)
        f.seek(0)
        file_content = f.read()

    return hash_md5.hexdigest(), file_content
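Since the function above ends up keeping the full content in memory anyway, the chunks can be collected during the hashing loop instead of rewinding and decompressing a second time. A minimal Python 3 sketch of that variant; `md5_gzip` is a hypothetical rename and, like the original, it holds the whole decompressed file in memory:

```python
import gzip
import hashlib


def md5_gzip(fname, chunk_size=4096):
    """Return (md5 hex digest, decompressed bytes) for a gzip file,
    decompressing it only once."""
    hash_md5 = hashlib.md5()
    chunks = []
    with gzip.open(fname, 'rb') as f:
        # Read until gzip returns an empty byte string (end of data)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hash_md5.update(chunk)
            chunks.append(chunk)  # keep each chunk for the returned content
    return hash_md5.hexdigest(), b"".join(chunks)
```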

Then, in your code:

fileMd5, file_content = md5Gzip(urllib.urlretrieve(data['fileUrl'])[0])
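With the tuple unpacked, the `# save correct file` branch only needs to write `file_content` out. A minimal Python 3 sketch, assuming `expected_md5` is the bare hex digest read from the checksum file and `dest_path` is wherever the target directory file should go; `save_if_match` is a hypothetical helper:

```python
import os


def save_if_match(file_md5, file_content, expected_md5, dest_path):
    """Write file_content to dest_path if file_md5 matches expected_md5.

    expected_md5 is stripped and lower-cased because a downloaded checksum
    file usually ends with a newline. Returns True if the file was saved."""
    if file_md5 != expected_md5.strip().lower():
        return False
    # Create the target directory if it does not exist yet
    os.makedirs(os.path.dirname(os.path.abspath(dest_path)), exist_ok=True)
    with open(dest_path, 'wb') as out:
        out.write(file_content)
    return True
```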