I recently watched thenewboston's video on writing a web crawler in Python. For some reason, I'm getting an SSLError. I tried to fix it with line 6 of the code, but no luck. Any ideas why it throws the error? The code is transcribed verbatim from thenewboston's Python web crawler tutorial.
import requests
from bs4 import BeautifulSoup

def creepy_crawly(max_pages):
    page = 1
    #requests.get('https://www.thenewboston.com/', verify = True)
    while page <= max_pages:
        url = "https://www.thenewboston.com/trade/search.php?pages=" + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text)
        for link in soup.findAll('a', {'class': 'item-name'}):
            href = "https://www.thenewboston.com" + link.get('href')
            print(href)
        page += 1

creepy_crawly(1)
The SSL error is due to the site's web certificate. It is probably because the URL you are trying to scrape is 'https'. Try another site that is http only. – Craicerjack 2014-11-24 19:24:02
Possible duplicate of http://stackoverflow.com/q/10667960/783219 – Prusse 2014-11-24 19:46:30
Thanks Craicerjack! I tried it on a site that was just 'http' and it worked! But how would I go about running a web crawler on an 'https' domain? – Steven 2014-11-24 20:10:12
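On the follow-up question: an SSLError from requests on an https URL usually means the server's certificate chain could not be verified, often because the local CA bundle is outdated. Common fixes (not shown in the original post, so treat them as suggestions) are upgrading requests and its bundled certificates, or passing `verify=` a path to an up-to-date CA bundle. As a minimal standard-library sketch of the verification machinery that requests drives under the hood:

```python
# A minimal sketch, using only the standard library, of the HTTPS
# verification settings involved. requests' verify= parameter controls
# the same kind of behavior.
import ssl

# create_default_context() returns a context that verifies server
# certificates against the system CA store and checks hostnames.
ctx = ssl.create_default_context()

print(ctx.check_hostname)                    # hostname checking enabled
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # certificate required
```

Passing `verify=False` to `requests.get()` will silence the error by skipping verification entirely, but that defeats the point of HTTPS and should only be used for local testing.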