I have written a simple script in Python, but the pycurl write function does not work on the second call.
The script parses a web page for hyperlinks, then retrieves those links to parse some information.
I have similar scripts that run and reuse the WRITEFUNCTION without any problem; for some reason this one fails, and I don't understand why.
The general curl init:
import StringIO
import pycurl

storage = StringIO.StringIO()
c = pycurl.Curl()
c.setopt(pycurl.USERAGENT, USER_AGENT)   # USER_AGENT is defined elsewhere
c.setopt(pycurl.COOKIEFILE, "")          # enable the in-memory cookie engine
c.setopt(pycurl.POST, 0)
c.setopt(pycurl.FOLLOWLOCATION, 1)
# Similar scripts are working this way, why does this one not?
c.setopt(c.WRITEFUNCTION, storage.write)
The first call, which retrieves the links:
URL = "http://whatever"
REFERER = URL
c.setopt(pycurl.URL, URL)
c.setopt(pycurl.REFERER, REFERER)
c.perform()
#Write page to file
content = storage.getvalue()
f = open("updates.html", "w")
f.write(content)
f.close()
... Here the magic happens and links are extracted ...
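(Purely for illustration, since the extraction code is elided above: a minimal sketch of what that step might look like, assuming the links are pulled out of the fetched page with a simple regular expression; content and urls refer to the variables used in the surrounding snippets.)

import re

# Hypothetical sketch of the elided step: collect absolute href
# targets from the page that was just saved to updates.html.
urls = re.findall(r'href="(http://[^"]+)"', content)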
Now loop over those links:
for i, member in enumerate(urls):
    URL = urls[i]
    print "url:", URL
    c.setopt(pycurl.URL, URL)
    c.perform()
    # Write page to file
    # Still the data from previous!
    content = storage.getvalue()
    f = open("update.html", "w")
    f.write(content)
    f.close()
    #print content
    ... Gather some information ...
... Close objects etc ...
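(A minimal sketch of the elided cleanup, assuming only the two objects created in the init block need closing.)

# Release the curl handle and the in-memory buffer when done.
c.close()
storage.close()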
You could try 'c.setopt(c.WRITEFUNCTION, f.write)' inside the loop to avoid appending data to the same object. 'Curl()' is reusable, so that might be enough. – jfs 2013-05-05 22:55:46
No, that does not work, I tried it before; I think it is just passed by reference. Is it possible that the string from the first page is too long? (The web page is very large compared to other things I have retrieved with Curl and Python.) – honda4life 2013-05-06 17:18:20
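(A note on what these comments are circling: a StringIO buffer keeps everything ever written to it, so after the second perform() the call storage.getvalue() returns the first page plus the second, which matches the "still the data from previous" symptom above. A minimal sketch of a per-iteration reset, reusing the same c handle as in the question:)

for URL in urls:
    # Give each request a fresh buffer so getvalue() returns
    # only this response, not everything fetched so far.
    storage = StringIO.StringIO()
    c.setopt(c.WRITEFUNCTION, storage.write)
    c.setopt(pycurl.URL, URL)
    c.perform()
    content = storage.getvalue()

Alternatively, a single buffer can be kept and emptied with storage.truncate(0) after each read; either way, the point is to clear the buffer between perform() calls.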