2014-09-06

Any ideas on how to fix this? UnicodeEncodeError: 'ascii' codec can't encode character u'\u2730' in position 1: ordinal not in range(128)

import csv 
import re 
import time 
import urllib2 
from urlparse import urljoin 
from bs4 import BeautifulSoup 

BASE_URL = 'http://omaha.craigslist.org/sys/' 
URL = 'http://omaha.craigslist.org/sya/' 
FILENAME = '/Users/mona/python/craigstvs.txt' 

opener = urllib2.build_opener() 
opener.addheaders = [('User-agent', 'Mozilla/5.0')] 
soup = BeautifulSoup(opener.open(URL)) 

with open(FILENAME, 'a') as f:
    writer = csv.writer(f, delimiter=';')
    for link in soup.find_all('a', class_=re.compile("hdrlnk")):
        timeset = time.strftime("%m-%d %H:%M")

        item_url = urljoin(BASE_URL, link['href'])
        item_soup = BeautifulSoup(opener.open(item_url))

        # do something with item_soup? or why did you need to follow this link?

        writer.writerow([timeset, link.text, item_url])

Answers

0

Speaking from experience, the csv module in Python 2 does not fully support Unicode, but you may find it useful to open the file with codecs:

import codecs 
... 
codecs.open('file.csv', 'r', 'UTF-8') 

Alternatively, you may have to write the rows yourself instead of using the csv module.
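
As a rough illustration of opening the file with an explicit encoding, here is a minimal sketch that assumes Python 3 (not the Python 2 used in the question), where the csv module accepts text directly. The function name `append_rows`, the temporary file, and the sample row are all made up for the example.

```python
import csv
import io
import os
import tempfile


def append_rows(path, rows):
    """Append rows to a semicolon-delimited file encoded as UTF-8 (Python 3)."""
    # newline="" lets the csv module control line endings itself.
    with io.open(path, "a", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter=";")
        writer.writerows(rows)


# Usage with a temporary file and a made-up row (hypothetical data).
path = os.path.join(tempfile.mkdtemp(), "example.csv")
append_rows(path, [["09-06 12:00", "\u2730 listing title", "http://example.com/item"]])

# Read the file back with the same encoding to check the round trip.
with io.open(path, encoding="utf-8") as f:
    content = f.read()
```

Because the encoding is fixed when the file is opened, no per-field `encode` call is needed in this variant.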

0

You just need to encode the text:

link.text.encode("utf-8") 
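
To see why this helps, here is a minimal sketch (the helper name `to_utf8` is hypothetical, not from the original post). Python 2 falls back on the ASCII codec when it has to turn a unicode object into a byte string, and ASCII cannot represent u'\u2730', whereas UTF-8 can.

```python
# -*- coding: utf-8 -*-
star = u"\u2730"  # the character named in the error message


def to_utf8(text):
    """Return the UTF-8 bytes for a text string (hypothetical helper)."""
    return text.encode("utf-8")


# Encoding with the ASCII codec reproduces the failure from the question.
try:
    star.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding with UTF-8 succeeds and yields a three-byte sequence.
encoded = to_utf8(star)
```

The encoded bytes decode back to the original character, so nothing is lost when the file is later read as UTF-8.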

You can also use requests instead of urllib2:

import csv
import re
import time
from urlparse import urljoin

import requests
from bs4 import BeautifulSoup

BASE_URL = 'http://omaha.craigslist.org/sys/'
URL = 'http://omaha.craigslist.org/sya/'
FILENAME = 'craigstvs.txt'
soup = BeautifulSoup(requests.get(URL).content)
with open(FILENAME, 'a') as f:
    writer = csv.writer(f, delimiter=';')
    for link in soup.find_all('a', class_=re.compile("hdrlnk")):
        timeset = time.strftime("%m-%d %H:%M")
        item_url = urljoin(BASE_URL, link['href'])
        item_soup = BeautifulSoup(requests.get(item_url).content)
        # do something with item_soup? or why did you need to follow this link?
        # encode the title to UTF-8 bytes so the csv writer does not fall back to ASCII
        writer.writerow([timeset, link.text.encode("utf-8"), item_url])