2014-12-04

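Unable to extract data from a website using urllib2

I was following a tutorial on pythonforbeginners.com and ran into a piece of code that does not run correctly on my OS X machine.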

from bs4 import BeautifulSoup 
import urllib2 
url = "http://www.pythonforbeginners.com" 
content = urllib2.urlopen(url).read() 
soup = BeautifulSoup(content) 
print soup.prettify() 

This gives me the following error:

Traceback (most recent call last):
  File "/Users/dhruvmullick/CS/Python/Extracting Data/test.py", line 8, in <module>
    content = urllib2.urlopen(url).read()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

Answer

A 403 error means the server understood your request but refuses to serve it.

...a request from a client for a web page or resource to indicate that the server can be reached and understood the request, but refuses to take any further action.

Try a different domain and you will find that the same code works as expected.

As a workaround, you can add a custom user-agent to the request.
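
The workaround can be sketched roughly as follows. This is a minimal illustration, not a guaranteed fix: the helper names `build_request` and `fetch` and the user-agent string are made up for this example, and the server may still reject the request for other reasons. The import fallback is only there so the same sketch also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2 as request_lib  # Python 2, as used in the question
except ImportError:
    import urllib.request as request_lib  # Python 3 equivalent

# Hypothetical value chosen for illustration; the site may or may not accept it.
DEFAULT_USER_AGENT = "Mozilla/5.0 (compatible; example-script)"


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a Request for url that carries a User-Agent header."""
    return request_lib.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=DEFAULT_USER_AGENT):
    """Download url with the custom User-Agent and return the raw body."""
    return request_lib.urlopen(build_request(url, user_agent)).read()
```

With these helpers, the line in the question that calls urllib2.urlopen(url).read() would become content = fetch(url), and the rest of the script stays the same.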

Is there a reason why this domain blocks my connection while others do not? – 2014-12-04 13:53:04

The server may block any request that arrives without a user agent. See the link at the bottom of the answer for the steps to add one. – philshem 2014-12-04 14:16:41