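I'm trying to fetch the HTML of a page whose URL contains diacritics (í, č, ...). The problem is that urllib2.quote
doesn't seem to work the way I expect.
As far as I understand, quote should convert a URL containing diacritics into a valid URL.
Here is an example: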
url = 'http://www.example.com/vydavatelství/'
print urllib2.quote(url)
>> http%3A//www.example.com/vydavatelstv%C3%AD/
The problem is that, for some reason, it changes the http:// part of the string
to http%3A//. urllib2.urlopen(req) then
returns an error:
    response = urllib2.urlopen(req)
  File "C:\Python27\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 437, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 475, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 400: Bad Request
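
For what it's worth, the http%3A in the output above is consistent with quote's default safe characters, which include '/' but not ':', so quoting the whole URL also escapes the colon after the scheme. Below is a minimal sketch of one possible workaround, assuming the goal is to percent-encode only the path and leave the scheme and host alone. quote_url_path is a hypothetical helper name (not part of urllib2), the imports are written to run on both Python 2.7 and Python 3, and non-ASCII host names are not handled here.

```python
# -*- coding: utf-8 -*-
try:
    # Python 3: the functions live in urllib.parse
    from urllib.parse import quote, urlsplit, urlunsplit
except ImportError:
    # Python 2.7: quote is in urllib, the splitters are in urlparse
    from urllib import quote
    from urlparse import urlsplit, urlunsplit


def quote_url_path(url):
    """Percent-encode only the path of a URL (hypothetical helper).

    The scheme, host, query and fragment are passed through unchanged,
    so the ':' after 'http' is never escaped.
    """
    parts = urlsplit(url)
    path = parts.path
    # quote works on UTF-8 bytes; encode text paths first so that the
    # same code behaves identically on Python 2 and Python 3.
    if not isinstance(path, bytes):
        path = path.encode('utf-8')
    quoted_path = quote(path, safe='/')
    return urlunsplit(
        (parts.scheme, parts.netloc, quoted_path, parts.query, parts.fragment)
    )


if __name__ == '__main__':
    print(quote_url_path('http://www.example.com/vydavatelství/'))
```

For the example URL this should give http://www.example.com/vydavatelstv%C3%AD/, which can then be passed to urlopen. A shorter alternative under the same assumption is to call quote(url, safe=':/') on the whole string, although that would also escape characters such as '?' and '=' if the URL had a query string.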
Have you tried putting # -*- coding: utf-8 -*- at the top of your script? – thefragileomen