urllib2.quote does not work properly

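I am trying to fetch the HTML of a page whose URL contains diacritics (í, č, ...). The problem is that urllib2.quote does not seem to work the way I expect.

As far as I understand, quote should convert a URL containing diacritics into a valid URL.

Here is an example: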

url = 'http://www.example.com/vydavatelství/' 

print urllib2.quote(url) 

>> http%3A//www.example.com/vydavatelstv%C3%AD/

The problem is that, for some reason, it also changes the http:// part of the string. urllib2.urlopen(req) then fails with an error:

response = urllib2.urlopen(req)
  File "C:\Python27\lib\urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 437, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 475, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 400: Bad Request

Source

2015-04-12 Milano Slesarik

Have you tried putting # -*- coding: utf-8 -*- at the top of your script? – thefragileomen

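- TL;DR -

Two things. First, make sure you include the encoding declaration (# -*- coding: utf-8 -*-) at the top of your Python script. It tells Python how the text in the file is encoded. Second, you need to specify the safe characters, that is, the characters the quote method should leave unconverted. By default only / is treated as safe. That means the : is being converted, and that is what breaks your URL.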

url = 'http://www.example.com/vydavatelství/' 
urllib2.quote(url,':/') 
>>> http://www.example.com/vydavatelstv%C3%AD/
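The effect of the safe argument can be checked in isolation. The following is a minimal sketch, assuming Python 3, where the function lives at urllib.parse.quote rather than urllib2.quote. The helper quote_path_only is a hypothetical name introduced here for illustration; it shows one alternative, which is to quote only the path component so the scheme and host are never touched.

```python
# -*- coding: utf-8 -*-
# Sketch for Python 3 (assumption): urllib2.quote is urllib.parse.quote here.
from urllib.parse import quote, urlsplit, urlunsplit

url = 'http://www.example.com/vydavatelství/'

# With the default safe='/', the colon after the scheme is escaped too.
default_quoted = quote(url)
print(default_quoted)   # http%3A//www.example.com/vydavatelstv%C3%AD/

# Adding ':' to the safe characters keeps 'http://' intact.
safe_quoted = quote(url, safe=':/')
print(safe_quoted)      # http://www.example.com/vydavatelstv%C3%AD/


def quote_path_only(raw_url):
    """Hypothetical helper: percent-encode only the path of a URL."""
    parts = urlsplit(raw_url)
    return urlunsplit(parts._replace(path=quote(parts.path)))


path_quoted = quote_path_only(url)
print(path_quoted)      # http://www.example.com/vydavatelstv%C3%AD/
```

Quoting only the path avoids having to list every delimiter as safe, but it is a design choice rather than a requirement; passing safe=':/' is enough for the URL in the question.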

- More details -

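The first problem here is that the documentation for urllib2 is fairly poor. Following the link Kamal provided, I cannot find the quote method in the documentation at all. That makes troubleshooting rather difficult.

That said, let me explain a little.

urllib2.quote appears to behave the same as urllib's quote, which is documented pretty well. Note that the four-parameter signature below is the Python 3 form, urllib.parse.quote; in Python 2, urllib.quote accepts only string and safe.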

urllib.parse.quote(string, safe='/', encoding=None, errors=None)
## string: the string you're trying to encode
## safe: string of characters that should not be quoted. Default is '/'
## encoding: encoding used for non-ASCII characters. Default is utf-8
## errors: specifies how encoding errors are handled. Default is 'strict', which raises a UnicodeEncodeError
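To make the four parameters concrete, here is a short sketch, again assuming Python 3 and urllib.parse.quote. The sample strings are illustrative only and are not taken from the question.

```python
from urllib.parse import quote

segment = 'vydavatelství'

# encoding: defaults to UTF-8, so 'í' becomes two percent-encoded bytes.
utf8_quoted = quote(segment)
print(utf8_quoted)       # vydavatelstv%C3%AD

# A different encoding yields different bytes for the same character.
latin1_quoted = quote(segment, encoding='latin-1')
print(latin1_quoted)     # vydavatelstv%ED

# safe: '/' is preserved by default; pass safe='' to escape it as well.
keep_slash = quote('a/b c')
escape_slash = quote('a/b c', safe='')
print(keep_slash)        # a/b%20c
print(escape_slash)      # a%2Fb%20c

# errors: 'strict' (the default) raises when a character cannot be encoded.
try:
    quote('č', encoding='ascii')
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True
print(strict_raised)     # True

# errors='replace' substitutes '?', which is then percent-encoded.
replaced = quote('č', encoding='ascii', errors='replace')
print(replaced)          # %3F
```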

Source

2015-06-05 19:01:25
