Why doesn't encoding always work?

I have Python code that tries to read RSS feeds written in Cyrillic letters (for example, Russian). This is the code I use:

import feedparser 
from urllib2 import Request, urlopen 

d=feedparser.parse(source_url) 

# Make a loop over the entries of the RSS feed. 
for e in d.entries: 
    # Get the title of the news. 
    title = e.title 
    title = title.replace(' ','%20') 
    title = title.encode('utf-8') 

    # Get the URL of the entry. 
    url = e.link 
    url = url.encode('utf-8') 


    # Make the request. 
    address = 'http://example.org/save_link.php?title=' + title + '&source=' + source_name + '&url=' + url 

    # Submit the link. 
    req = Request(address) 
    f = urlopen(req)

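I use encode('utf-8') because the titles are given in Cyrillic letters, and this works fine. An example of the RSS feed is here. The problem appears when I try to read a list of RSS feeds from another URL. In more detail, there is a web page that contains a list of RSS feeds (the URLs of the feeds as well as their names, given in Cyrillic letters). An example of the list is here:

<!DOCTYPE HTML PUBLIC '-//W3C//DTD HTML 4.01 Transitional//EN' 'http://www.w3.org/TR/html4/loose.dtd'>
<html>
<head>
<title></title>
<meta http-equiv='Content-Type' content='text/html;charset=utf-8'>

ua, Корреспондент, http://k.img.com.ua/rss/ua/news.xml
ua, Українська Правда, http://www.pravda.com.ua/rss/

</body>
</html>

When I try to apply encode('utf-8') to the Cyrillic names given in this file, I get a UnicodeDecodeError. Does anybody know why?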

Source

2012-07-11 Roman

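In Python 2, calling encode on a str object (a byte string) makes Python first try to decode it to unicode using the default codec, which is normally ASCII. With non-ASCII bytes such as UTF-8 encoded Cyrillic, that implicit step can only raise a UnicodeDecodeError; see http://wiki.python.org/moin/UnicodeDecodeError.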
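A minimal sketch of that implicit step, written for Python 3 as an assumption (there, `bytes` plays the role of Python 2's `str`, and `str` the role of `unicode`). Python 2 performs the ASCII decode silently; here it is spelled out so the failure is visible. `py2_style_encode` is a hypothetical helper, and the sample name is taken from the list in the question.

```python
# UTF-8 bytes of a Cyrillic name, as they would arrive from the web page.
raw_name = "Корреспондент".encode("utf-8")

def py2_style_encode(data):
    """Mimic Python 2's str.encode('utf-8'): ASCII-decode first, then encode."""
    return data.decode("ascii").encode("utf-8")

try:
    py2_style_encode(raw_name)
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print("implicit decode failed:", exc.reason)

# Pure-ASCII bytes survive the same round trip, which is why the bug
# only shows up once the text contains non-ASCII characters.
ascii_ok = py2_style_encode(b"pravda")
```

The error is a decode error even though the call was to encode, because the failing operation is the hidden decode that runs first.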

You need to decode the str object to unicode first:

name = name.decode('utf-8')

This takes the UTF-8 encoded str and gives you a unicode object.
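As a rough illustration of decoding before any further processing, the sketch below parses one line of the list from the question. It assumes Python 3 and that each line has the form `country, name, url`; `parse_feed_line` is a hypothetical helper, not part of any library.

```python
def parse_feed_line(raw_line):
    """Decode a UTF-8 line and split it into (country, name, url)."""
    text = raw_line.decode("utf-8")  # bytes -> text, done exactly once
    country, name, url = [part.strip() for part in text.split(",", 2)]
    return country, name, url

# Bytes as they would be read from the HTTP response body.
raw_line = "ua, Українська Правда, http://www.pravda.com.ua/rss/".encode("utf-8")
country, name, url = parse_feed_line(raw_line)

# The decoded name can now be encoded again without any implicit ASCII step.
encoded_name = name.encode("utf-8")
```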

It works in the code you posted because feedparser returns the feed data already decoded to unicode.
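Since some values arrive already decoded (from feedparser) and others as raw bytes (from the downloaded list), one option is to normalize everything to text at the boundary and encode only once, when building the request URL. The sketch below assumes Python 3; `to_text` and `build_save_url` are hypothetical names, the endpoint is the placeholder from the question, and the title is made-up sample data.

```python
from urllib.parse import urlencode

def to_text(value, encoding="utf-8"):
    """Return text, decoding only when the value is still raw bytes."""
    return value.decode(encoding) if isinstance(value, bytes) else value

def build_save_url(title, source_name, url):
    """Build the request URL; urlencode percent-encodes text as UTF-8."""
    query = urlencode({
        "title": to_text(title),
        "source": to_text(source_name),
        "url": to_text(url),
    })
    return "http://example.org/save_link.php?" + query

# The title is already text, the source name is still bytes: both are accepted.
address = build_save_url(
    "Новини дня",                          # illustrative title, not from the feed
    "Корреспондент".encode("utf-8"),
    "http://k.img.com.ua/rss/ua/news.xml",
)
```

Letting `urlencode` do the escaping also replaces the manual `replace(' ', '%20')` step, and it escapes characters such as `&` that would otherwise break the query string.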

Source

2012-07-11 10:03:17 ecatmur

Yes, Python 2 is fun. – 2012-07-11 10:05:50

But why does 'encode' work with the Cyrillic titles of the RSS feeds, yet fail with the Cyrillic names of the feeds given in the list of RSS feeds? – Roman 2012-07-11 10:09:22

@Roman Probably because you are not decoding the names in the list. – ecatmur 2012-07-11 10:16:52
