Python: fetch only the headers using urllib2

I have to implement a function that fetches only the headers (without doing a GET or POST) using urllib2. Here is my function:

def getheadersonly(url, redirections = True):
    if not redirections:
        class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
            def http_error_302(self, req, fp, code, msg, headers):
                return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)
            http_error_301 = http_error_303 = http_error_307 = http_error_302
        cookieprocessor = urllib2.HTTPCookieProcessor()
        opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
        urllib2.install_opener(opener)

    class HeadRequest(urllib2.Request):
        def get_method(self):
            return "HEAD"

    info = {}
    info['headers'] = dict(urllib2.urlopen(HeadRequest(url)).info())
    info['finalurl'] = urllib2.urlopen(HeadRequest(url)).geturl()
    return info

I used code from this answer and this one. However, it follows redirects even when the flag is False. I tested the code with:

print getheadersonly("http://ms.com", redirections = False)['finalurl'] 
print getheadersonly("http://ms.com")['finalurl']

It gives morganstanley.com in both cases. What is wrong here?

Source

2012-03-27 jerrymouse

Possible duplicate of [How do I prevent Python's urllib(2) from following a redirect](http://stackoverflow.com/questions/554446/how-do-i-prevent-pythons-urllib2-from-following-a-redirect) – bernie 2012-03-27 16:59:43

First of all, your code contains several bugs:

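1. On each call of getheadersonly you install a new global opener, which is then used by every subsequent call of urllib2.urlopen.
2. You make two HTTP requests to read two different attributes of one response.
3. The implementation of urllib2.HTTPRedirectHandler.http_error_302 is not that trivial, and I do not see how calling it could prevent redirections in the first place.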
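
The first point in the list can be checked without touching the network. This is only a rough sketch: the Python 3 fallback import, the two helper functions, and the peek at the private module attribute `_opener` are my own assumptions for illustration, not part of the code in the question.

```python
try:
    import urllib2  # Python 2, as used in the question
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 fallback


def local_only():
    # Hypothetical helper: builds an opener and hands it back.
    # Nothing outside this function changes.
    return urllib2.build_opener()


def installs_globally():
    # Hypothetical helper that mirrors the question's code:
    # the opener becomes the default used by urllib2.urlopen().
    opener = urllib2.build_opener()
    urllib2.install_opener(opener)
    return opener


# `_opener` is a private, undocumented module attribute; it is read here
# only to make the global side effect visible.
before = urllib2._opener
local_only()
unchanged = urllib2._opener is before

installed = installs_globally()
replaced = urllib2._opener is installed

print("global opener untouched by a local opener: %s" % unchanged)
print("global opener replaced by install_opener: %s" % replaced)
```

In the question this means that a single call with redirections = False also changes what later calls with redirections = True will use.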

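Basically, you should understand that each handler is installed in an opener to handle a certain kind of response. urllib2.HTTPRedirectHandler is there to turn certain HTTP status codes into redirections. If you do not want redirections, do not add a redirect handler to the opener. If you do not want to open FTP links, do not add FTPHandler, and so on.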
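
To see which handlers an opener actually carries, something like the following may help. It is a sketch under assumptions: the Python 3 fallback import, the helper `handler_names`, and the class `PassThroughRedirectHandler` are hypothetical names introduced here, and `handlers` is an undocumented attribute of the opener that I read only for inspection.

```python
try:
    import urllib2  # Python 2, as used in the answer
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 fallback


def handler_names(opener):
    # `handlers` is an undocumented attribute; read for inspection only.
    return sorted(h.__class__.__name__ for h in opener.handlers)


class PassThroughRedirectHandler(urllib2.HTTPRedirectHandler):
    # Hypothetical stand-in for the subclass in the question: it changes
    # nothing, so the inherited redirect-following behaviour remains.
    pass


# build_opener() adds a set of default handlers, the redirect handler included.
default_opener = urllib2.build_opener()

# Passing a subclass swaps out the default redirect handler for the subclass;
# it does not remove redirect handling.
custom_opener = urllib2.build_opener(PassThroughRedirectHandler)

# A bare OpenerDirector carries only what is added explicitly.
bare_opener = urllib2.OpenerDirector()
bare_opener.add_handler(urllib2.HTTPHandler())

for label, opener in [("default", default_opener),
                      ("custom", custom_opener),
                      ("bare", bare_opener)]:
    print("%s: %s" % (label, ", ".join(handler_names(opener))))
```

The opener built by build_opener always has some redirect handler in it, which is why starting from an empty OpenerDirector is the simpler way to leave redirects out.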

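So all you need to do is create a new opener, add urllib2.HTTPHandler() to it, customize the request to be a 'HEAD' request, pass an instance of that request to the opener, read the attributes, and close the response.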

import urllib2

class HeadRequest(urllib2.Request):
    # Make urllib2 send HEAD instead of GET.
    def get_method(self):
        return 'HEAD'

def getheadersonly(url, redirections=True):
    # Use a private opener that has only the handlers that are needed.
    opener = urllib2.OpenerDirector()
    opener.add_handler(urllib2.HTTPHandler())
    opener.add_handler(urllib2.HTTPDefaultErrorHandler())
    if redirections:
        # HTTPErrorProcessor makes HTTPRedirectHandler work
        opener.add_handler(urllib2.HTTPErrorProcessor())
        opener.add_handler(urllib2.HTTPRedirectHandler())
    try:
        res = opener.open(HeadRequest(url))
    except urllib2.HTTPError as err:
        # An HTTPError is also a response object (40x and 50x answers).
        res = err
    res.close()
    return dict(code=res.code, headers=res.info(), finalurl=res.geturl())


Source

2012-03-27 15:04:19 newtover

+1 That worked .. Thanks @newtover – jerrymouse 2012-03-27 15:08:51

@jerrymouse, I have slightly updated the code so that it handles 40x and 50x errors correctly. – newtover 2012-03-28 11:22:57

Thanks for the edit :) – jerrymouse 2012-03-28 11:59:58

You can send a HEAD request using httplib. A HEAD request is the same as a GET request, except that the server does not send the message body.
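
A minimal sketch of what that could look like. The function name `head`, its parameters, and the Python 3 fallback import are assumptions of mine, not something given in this answer.

```python
try:
    import httplib  # Python 2
except ImportError:
    import http.client as httplib  # assumption: Python 3 fallback


def head(host, path="/", port=80, timeout=10):
    # Hypothetical helper: send one HEAD request and return the status code
    # together with the response headers as a dict.
    conn = httplib.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        res = conn.getresponse()
        return res.status, dict(res.getheaders())
    finally:
        conn.close()


# Example call (needs network access, so it is left commented out):
# status, headers = head("ms.com")
```

httplib does not follow redirects on its own, so a 301 or 302 comes back as-is and the target is in the Location header (lowercased to location on Python 2).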

Source

2012-03-27 13:51:48 bigblind

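I have already done that; check the fifth line from the end. The problem, however, is with redirects. Sometimes I do not want to follow them, yet it always redirects. – jerrymouse 2012-03-27 13:54:49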

And I have already mentioned this link in my question. – jerrymouse 2012-03-27 13:55:56
