Python urlopen error with a saved web page
I have saved a web page at C:\webpage.htm. I want to load it and parse it with BeautifulSoup, but urllib won't open it.
from BeautifulSoup import BeautifulSoup
import urllib2
url="C:\webpage.htm"
page=urllib2.urlopen(url)
This raises the error:
Traceback (most recent call last):
page=urllib2.urlopen(url)
File "C:\Python27\lib\urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 400, in open
response = self._open(req, data)
File "C:\Python27\lib\urllib2.py", line 423, in _open
'unknown_open', req)
File "C:\Python27\lib\urllib2.py", line 378, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 1240, in unknown_open
raise URLError('unknown url type: %s' % type)
urllib2.URLError: <urlopen error unknown url type: c>
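How can I fix this, or is there another way to load the document into Beautiful Soup? I also tried saving it as a text file, but that raised the error: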
'str' object has no attribute 'findall'
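For what it's worth, the traceback suggests urlopen is reading the drive letter in "C:\webpage.htm" as a URL scheme, hence "unknown url type: c". Below is a minimal sketch of two ways around that, written for Python 3 with only the standard library (on Python 2.7, urllib2.urlopen should accept the same kind of file:// URL). The function names read_local_page and read_local_page_directly are hypothetical, made up for illustration.

```python
import pathlib
import urllib.request


def read_local_page(path):
    """Read a saved page through urlopen by turning the path into a file:// URL."""
    # as_uri() requires an absolute path, so resolve() it first.
    # On Windows, C:\webpage.htm becomes file:///C:/webpage.htm.
    file_url = pathlib.Path(path).resolve().as_uri()
    with urllib.request.urlopen(file_url) as response:
        return response.read()


def read_local_page_directly(path):
    """Read a saved page with plain open(); no urlopen needed for local files."""
    with open(path, "rb") as handle:
        return handle.read()
```

Either function returns the page contents as bytes, which can then be handed to the parser, for example BeautifulSoup(read_local_page(r"C:\webpage.htm")). Note the raw string (r"...") so that backslashes in the Windows path are never treated as escape sequences. For a file that is already on disk, the plain open() variant is probably the simpler choice.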
Thanks Sylvester, that works! However, I saved the page with Firefox, so only .htm worked. – user578582