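pickle.dump not dumping when appending to a file

The user may pass a bunch of URLs as command-line arguments. All URLs given in the past are serialized with pickle. The script checks every URL it is given, and the ones that are unique should be serialized and appended to the file. At least, that is what should happen. Nothing is being appended. However, when I open the file in write mode, the new, unique URLs are written. So what gives? The code is: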
def get_new_urls():
    if(len(urls.URLs) != 0): # check if empty
        with open(urlFile, 'rb') as f:
            try:
                cereal = pickle.load(f)
                print(cereal)
                toDump = []
                for arg in urls.URLs:
                    if (arg in cereal):
                        print("Duplicate URL {0} given, ignoring it.".format(arg))
                    else:
                        toDump.append(arg)
            except Exception as e:
                print("Holy bleep something went wrong: {0}".format(e))
        return(toDump)

urlsToDump = get_new_urls()
print(urlsToDump)
# TODO: append new URLs
if(urlsToDump):
    with open(urlFile, 'ab') as f:
        pickle.dump(urlsToDump, f)

# TODO check HTML of each page against the serialized copy
with open(urlFile, 'rb') as f:
    try:
        cereal = pickle.load(f)
        print(cereal)
    except EOFError: # your URL file is empty, bruh
        pass
While originality is good, please remember that this is a kid-friendly site ;-( –
"Ain't dumpin' nothin'" is just **wrong** – mentalita
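
A likely explanation for the behavior in the question: each call to pickle.dump writes one complete pickled object, and each call to pickle.load reads exactly one object back. Opening the file in 'ab' mode and dumping again therefore does add data, but a single pickle.load only ever returns the first list that was written, so the file looks unchanged. The sketch below illustrates this and shows one way to read every appended object by calling pickle.load until EOFError. The helper name load_all, the temporary demo file, and the example URLs are assumptions made for illustration; they are not part of the original script.

```python
import os
import pickle
import tempfile


def load_all(path):
    """Return every object pickled into `path`, in the order written."""
    items = []
    with open(path, 'rb') as f:
        while True:
            try:
                items.append(pickle.load(f))  # reads one pickled object per call
            except EOFError:  # no more pickled objects in the file
                break
    return items


# Hypothetical demo file standing in for urlFile from the question.
fd, demo_path = tempfile.mkstemp()
os.close(fd)

# Simulate two separate runs, each appending its own list of new URLs.
for batch in (["http://a.example"], ["http://b.example", "http://c.example"]):
    with open(demo_path, 'ab') as f:
        pickle.dump(batch, f)

# A single load returns only the first appended list.
with open(demo_path, 'rb') as f:
    first_only = pickle.load(f)

# Loading until EOF recovers every batch; flatten them into one list.
all_urls = [url for batch in load_all(demo_path) for url in batch]

print(first_only)
print(all_urls)

os.remove(demo_path)
```

With a reader like this, the duplicate check could compare each argument against the flattened list rather than against the first pickled object only. An alternative design is to load the existing list, extend it, and rewrite the whole file in 'wb' mode, which keeps the file as a single pickled object.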