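Python curl write function not working on second call

I have written a simple script in Python.

It parses the hyperlinks from a web page and then retrieves those links to parse some information.

I have similar scripts that run and reuse the write function without any problems; this one fails for some reason and I don't understand why.

General curl init: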

storage = StringIO.StringIO() 
c = pycurl.Curl() 
c.setopt(pycurl.USERAGENT, USER_AGENT) 
c.setopt(pycurl.COOKIEFILE, "") 
c.setopt(pycurl.POST, 0) 
c.setopt(pycurl.FOLLOWLOCATION, 1) 
#Similar scripts are working this way, why this script not? 
c.setopt(c.WRITEFUNCTION, storage.write)

First call, to retrieve the links:

URL = "http://whatever" 
REFERER = URL 

c.setopt(pycurl.URL, URL) 
c.setopt(pycurl.REFERER, REFERER) 
c.perform() 

#Write page to file 
content = storage.getvalue() 
f = open("updates.html", "w") 
f.writelines(content) 
f.close() 
... Here the magic happens and links are extracted ...

Now loop over those links:

for i, member in enumerate(urls): 
    URL = urls[i] 
    print "url:", URL 
    c.setopt(pycurl.URL, URL) 
    c.perform() 

    #Write page to file 
    #Still the data from previous! 
    content = storage.getvalue() 
    f = open("update.html", "w") 
    f.writelines(content) 
    f.close() 
    #print content 
    ... Gather some information ... 
    ... Close objects etc ...

Source: honda4life, 2013-05-05

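You could try 'c.setopt(c.WRITEFUNCTION, f.write)' inside the loop to avoid appending data to the same object. 'Curl()' is reusable, so that may be enough. – jfs 2013-05-05 22:55:46

No, that doesn't work, I tried it before; I think it is just passed by reference. Could it be that the string from the first page is too long (the web page is very large compared to the other content I retrieve with Curl and Python)? – honda4life 2013-05-06 17:18:20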

If you want to download the URLs to different files sequentially (no concurrent connections):

for i, url in enumerate(urls):
    c.setopt(pycurl.URL, url)
    # Binary mode: required on Python 3, harmless on Python 2
    with open("output%d.html" % i, "wb") as f:
        c.setopt(c.WRITEDATA, f)  # c.setopt(c.WRITEFUNCTION, f.write) also works
        c.perform()

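Notes:

storage.getvalue() returns everything that has been written to storage since the moment it was created. In your case, you should find the output of multiple URLs in it.
open(filename, "w") overwrites the file (the previous content is gone), i.e. update.html contains whatever content holds on the last iteration of the loop.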
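
The accumulation described in the notes can be reproduced without any network access. The following is only a minimal sketch, assuming Python 3 and io.BytesIO in place of the StringIO.StringIO used in the question; fake_perform, fetch_separately and fetch_with_reset are hypothetical helpers invented for illustration, with fake_perform standing in for c.perform() by handing a response body to the write callback.

```python
import io


def fake_perform(write, body):
    """Hypothetical stand-in for c.perform(): passes the body to the write callback."""
    write(body)


# One shared buffer: every "request" appends to what is already there.
storage = io.BytesIO()
fake_perform(storage.write, b"<first page>")
fake_perform(storage.write, b"<second page>")
accumulated = storage.getvalue()  # contains both bodies, back to back


def fetch_separately(bodies):
    """Option 1: use a fresh buffer for each request."""
    pages = []
    for body in bodies:
        buf = io.BytesIO()
        fake_perform(buf.write, body)
        pages.append(buf.getvalue())
    return pages


def fetch_with_reset(bodies):
    """Option 2: keep one buffer, but empty it before each request."""
    buf = io.BytesIO()
    pages = []
    for body in bodies:
        buf.seek(0)      # move the write position back to the start
        buf.truncate(0)  # discard the previous contents
        fake_perform(buf.write, body)
        pages.append(buf.getvalue())
    return pages
```

With a real pycurl handle, option 2 would probably be the smaller change to the question's loop, since the callback registered once with WRITEFUNCTION stays bound to the same buffer object; that has not been tested here against pycurl itself.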

Source: jfs, 2013-05-06 19:18:35

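"storage.getvalue() returns everything that has been written to storage since the moment it was created." That is what I wanted to hear. I probably didn't notice it in my other scripts: it may go unnoticed when the output is opened in a browser, but be visible when opened in a text editor, or something like that. – honda4life 2013-05-06 19:55:32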
