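Scrapy: how do I collect items that only appear after clicking a "show more items" button?

I am using Scrapy to collect products from this website: https://www.coop.nl/boodschappen/groenten-en-aardappelen. Some of the products are only displayed after pressing a button ("Toon meer producten"). I tried to have the crawler follow the button, but that did not work: the spider only collects the first 12 displayed items. How can I collect the data for the remaining products?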
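Dynamic loading usually sends an outgoing HTTP request to fetch the new content, and that request can probably be captured with Chrome (I do not know how to do that, in ...).

Here is my code: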
import re

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class Product(scrapy.Item):
    barcode = scrapy.Field()
    name = scrapy.Field()


class BarcodessSpider(CrawlSpider):
    name = "coop_barcodes"
    allowed_domains = ["www.coop.nl"]
    start_urls = [
        "https://www.coop.nl/boodschappen/groenten-en-aardappelen/roerbakgroenten/roerbakgroenten"
    ]

    # Follow links inside the "show more products" container or any element with class "btn".
    rules = (
        Rule(
            LinkExtractor(
                allow=(r"https:.*",),
                restrict_xpaths='//*[(@id = "showMoreProductsContainer")] | //*[contains(concat(" ", @class, " "), concat(" ", "btn", " "))]',
            ),
            callback="parse_item1",
            follow=True,
        ),
    )

    def parse_item1(self, response):
        # Keep links whose last path segment starts with nine or more digits
        # (taken here to be the product barcode).
        for href in response.xpath("//@href").getall():
            code = href.split("/")[-1]
            if re.match(r"\d{8}\d+", code) is not None:
                # Yield each item once instead of re-yielding a shared class-level list.
                yield Product(barcode=code)
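
If the button really does trigger a separate HTTP request, one possible approach is to reproduce that request directly instead of trying to click the button. The following is only a rough sketch under assumptions: the parameter names `page` and `pageSize` and the helpers `page_url` and `extract_barcodes` are hypothetical, and the real URL and parameters would have to be read from the browser's network tab. The batch size of 12 comes from the number of items the spider currently sees.

```python
import re
from urllib.parse import urlencode

# Assumption: each "show more" click fetches the next batch of 12 products.
PAGE_SIZE = 12


def page_url(base_url, page, page_size=PAGE_SIZE):
    """Build the URL for one batch of products.

    The query parameter names are placeholders, not the site's real API.
    """
    return f"{base_url}?{urlencode({'page': page, 'pageSize': page_size})}"


def extract_barcodes(hrefs):
    """Return trailing numeric codes (nine or more digits) from links, without duplicates."""
    seen = []
    for href in hrefs:
        code = href.rstrip("/").split("/")[-1]
        if re.fullmatch(r"\d{9,}", code) and code not in seen:
            seen.append(code)
    return seen
```

Inside a spider, the idea would be to request `page_url(base, n)` for increasing `n` and to stop once `extract_barcodes` returns an empty list for a page.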