import requests
from bs4 import BeautifulSoup

'''
A web crawler for eBay that collects the data of every single item.
'''


def ebay_spider(max_pages):
    page = 1
    while page <= max_pages:
        # Build the URL of one search-results page
        url = 'http://www.ebay.co.uk/sch/Apple-Laptops/111422/i.html?_pgn=' \
            + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text)
        # Follow every item link found on the results page
        for link in soup.findAll('a', {'class': 'vip'}):
            href = 'http://www.ebay.co.uk' + link.get('href')
            title = link.string
            get_single_item_data(href)
        page += 1


def get_single_item_data(item_url):
    source_code = requests.get(item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text)
    # Print the title of the item page
    for item_name in soup.findAll('h1', {'id': "itemTitle"}):
        print(item_name.string)


ebay_spider(3)
The error says: http://imgur.com/403a6N8
I tried to fix it, but nothing seems to work. Any tips or answers on how to fix it?

EDIT: Sorry, everyone, for the faulty title and tag; both have been fixed.
Have you tried what it tells you? 'soup = BeautifulSoup(plain_text, "html.parser", markup_type=markup_type)'. And please post a text version of the error, not an unreadable image.
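A minimal sketch of the parser suggestion in the comment above, assuming the message in the screenshot is the bs4 warning that no parser was explicitly specified. The `SAMPLE_HTML` string and the `extract_vip_links` helper are hypothetical stand-ins so the example runs without a network request; they are not real eBay markup or part of the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded search page (not real eBay markup).
SAMPLE_HTML = """
<html><body>
  <a class="vip" href="/itm/1">MacBook Air</a>
  <a class="other" href="/help">Help</a>
  <a class="vip" href="/itm/2">MacBook Pro</a>
</body></html>
"""


def extract_vip_links(html):
    """Return (href, title) pairs for every <a class="vip"> in the markup."""
    # Passing the parser name as the second argument means bs4 does not
    # have to guess one, which is what the warning complains about.
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get("href"), a.get_text(strip=True))
            for a in soup.find_all("a", class_="vip")]


links = extract_vip_links(SAMPLE_HTML)
print(links)
```

In the question's code, the same change would apply to both `BeautifulSoup(plain_text)` calls. The quotes around `html.parser` should be typed as plain ASCII quotes.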
This has nothing to do with the 'requests' module. – DeepSpace
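@Jean-François Fabre Sorry about the photo, but you misunderstood the error. The problem is that I pasted that line into my code and got this error: SyntaxError: invalid character in identifier. For some strange reason, I can't find what is wrong with it. This is the earlier error, the one the post is about: http://pastebin.com/HNL1ENG0 – Auginis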
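One common cause of "SyntaxError: invalid character in identifier" is a non-ASCII character, such as a curly quote, copied in from a web page. Whether that is what happened here is an assumption. The sketch below lists every non-ASCII character in a line; the `find_non_ascii` helper and the `pasted` string are hypothetical, made up for illustration.

```python
import unicodedata


def find_non_ascii(line):
    """Return (column, character, Unicode name) for each non-ASCII character."""
    return [(col, ch, unicodedata.name(ch, "UNKNOWN"))
            for col, ch in enumerate(line)
            if ord(ch) > 127]


# Hypothetical pasted line with curly quotes instead of straight ones.
pasted = "soup = BeautifulSoup(plain_text, \u201chtml.parser\u201d)"

for col, ch, name in find_non_ascii(pasted):
    print(col, repr(ch), name)
```

If a check like this reports anything on the offending line, retyping those characters by hand in the editor usually clears the error.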