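Crawling through every product on a retailer's website

We are trying to scrape every product in every category on the Forever 21 website. Given a product page, we know how to extract the information we need, and given a category, we can extract every product in it. However, we don't know how to enumerate the product categories themselves. Here is our code that, given a category, goes through every product: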

import requests 
from bs4 import BeautifulSoup 
import json 
#import re 

params = {"action": "getcategory", 
      "br": "f21", 
      #"category": re.compile('\S+'), 
      "category": "dress", 
      "pageno": 1, 
      "pagesize": "", 
      "sort": "", 
      "fsize": "", 
      "fcolor": "", 
      "fprice": "", 
      "fattr": ""} 

url = "http://www.forever21.com/Ajax/Ajax_Category.aspx" 
js = requests.get(url, params=params).json() 
soup = BeautifulSoup(js[u'CategoryHTML'], "html.parser") 
i = 0  # product counter
j = 0  # page counter

# Keep requesting pages until one comes back with no product links.
while len(soup.select("div.item_pic a")) != 0:
    for a in soup.select("div.item_pic a"):
        #print a["href"]
        i = i + 1

    params["pageno"] = params["pageno"] + 1
    j = j + 1
    js = requests.get(url, params=params).json()
    soup = BeautifulSoup(js[u'CategoryHTML'], "html.parser")

print(i)
print(j)

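As you can see in the commented-out lines, we tried using a regular expression for the category, but without success. i and j are just product and page counters. Any suggestions on how to modify or add to this code to get every product category?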

Answers

You can scrape the category page and get all of the subcategories from the navigation menu:

import requests 
from bs4 import BeautifulSoup 


url = "http://www.forever21.com/Product/Category.aspx?br=f21&category=app-main" 
response = requests.get(url, headers={"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36"}) 

soup = BeautifulSoup(response.content, "html.parser") 
menues = [li["class"][0] for li in soup.select("#has_sub .white nav ul > li")] 
print(menues) 

Prints:

[u'women-new-arrivals', u'want_list', u'dress', u'top_blouses', u'outerwear_coats-and-jackets', u'bottoms', u'intimates_loungewear', u'activewear', u'swimwear_all', u'acc', u'shoes', u'branded-shop-women-clothing', u'sale_women|women', u'women-new-arrivals-clothing-dresses', u'women-new-arrivals-clothing-tops', u'women-new-arrivals-clothing-outerwear', u'women-new-arrivals-clothing-bottoms', u'women-new-arrivals-clothing-intimates-loungewear', u'women-new-arrivals-clothing-swimwear', u'women-new-arrivals-clothing-activewear', u'women-new-arrivals-accessories|women-new-arrivals', u'women-new-arrivals-shoes|women-new-arrivals', u'promo-web-exclusives', u'promo-best-sellers-app', u'backinstock-women', u'promo-shop-by-outfit-women', u'occasion-shop-wedding', u'contemporary-main', u'promo-basics', u'21_items', u'promo-summer-forever', u'promo-coming-soon', u'dress_casual', u'dress_romper', u'dress_maxi', u'dress_midi', u'dress_mini', u'occasion-shop-dress', u'top_blouses-off-shoulder', u'top_blouses-lace-up', u'top_bodysuits-bustiers', u'top_graphic-tops', u'top_blouses-crop-top', u'top_t-shirts', u'sweater', u'top_blouses-sweatshirts-hoodies', u'top_blouses-shirts', u'top_plaids', u'outerwear_bomber-jackets', u'outerwear_blazers', u'outerwear_leather-suede', u'outerwear_jean-jackets', u'outerwear_lightweight', u'outerwear_utility-jackets', u'outerwear_trench-coats', u'outerwear_faux-fur', u'promo-jeans-refresh|bottoms', u'bottoms_pants', u'bottoms_skirt', u'bottoms_shorts', u'bottoms_shorts-active', u'bottoms_leggings', u'bottoms_sweatpants', u'bottom_jeans|', u'intimates_loungewear-bras', u'intimates_loungewear-panties', u'intimates_loungewear-bodysuits-slips', u'intimates_loungewear-seamless', u'intimates_loungewear-accessories', u'intimates_loungewear-sets', u'activewear_top', u'activewear_sports-bra', u'activewear_bottoms', u'activewear_accessories', u'swimwear_tops', u'swimwear_bottoms', u'swimwear_one-piece', u'swimwear_cover-ups', u'acc_features', 
u'acc_jewelry', u'acc_handbags', u'acc_glasses', u'acc_hat', u'acc_hair', u'acc_legwear', u'acc_scarf-gloves', u'acc_home-and-gift-items', u'shoes_features', u'shoes_boots', u'shoes_high-heels', u'shoes_sandalsflipflops', u'shoes_wedges', u'shoes_flats', u'shoes_oxfords-loafers', u'shoes_sneakers', u'Shoes_slippers', u'branded-shop-new-arrivals-women', u'branded-shop-women-clothing-dresses', u'branded-shop-women-clothing-tops', u'branded-shop-women-clothing-outerwear', u'branded-shop-women-clothing-bottoms', u'branded-shop-women-clothing-intimates', u'branded-shop-women-accessories|branded-shop-women-clothing', u'branded-shop-women-accessories-jewelry|', u'branded-shop-shoes-women|branded-shop-women-clothing', u'branded-shop-sale-women', u'/brandedshop/brandlist.aspx', u'promo-branded-boho-me', u'promo-branded-rare-london', u'promo-branded-selfie-leslie', u'sale-newly-added', u'sale_dresses', u'sale_tops', u'sale_outerwear', u'sale_sweaters', u'sale_bottoms', u'sale_intimates', u'sale_swimwear', u'sale_activewear', u'sale_acc', u'sale_shoes', u'the-outlet', u'sale-under-5', u'sale-under-10', u'sale-under-15'] 
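
Some entries in this list are not plain category names: a few contain a `|`, and one is a path (`/brandedshop/brandlist.aspx`). A minimal sketch for tidying the list before reusing it, assuming the part before `|` is the category name (nothing on the page confirms this) and using a hypothetical helper name, `clean_categories`:

```python
def clean_categories(raw_names):
    """Return unique category names, in first-seen order.

    Assumption: in entries such as 'sale_women|women' the part before
    '|' is the category name. Entries that look like paths are skipped.
    """
    seen = set()
    cleaned = []
    for name in raw_names:
        name = name.split("|", 1)[0].strip()
        if not name or name.startswith("/") or name in seen:
            continue
        seen.add(name)
        cleaned.append(name)
    return cleaned


sample = ["dress", "sale_women|women", "bottom_jeans|",
          "/brandedshop/brandlist.aspx", "dress"]
print(clean_categories(sample))  # ['dress', 'sale_women', 'bottom_jeans']
```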

Note the values of the br and category GET parameters: f21 is the "Women" category, and app-main is the main page for a category.
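
To tie this back to the question, the category names can be fed into the same paging loop, one category at a time. The following is only a sketch: the endpoint, parameters, and `div.item_pic a` selector are taken from the question's code, while the function names (`fetch_links`, `product_links`, `crawl`) and the `max_pages` safety cap are assumptions made for illustration.

```python
AJAX_URL = "http://www.forever21.com/Ajax/Ajax_Category.aspx"


def fetch_links(category, pageno, br="f21"):
    """Return the product hrefs on one page of a category (network call)."""
    # Imported here so the paging logic below can be exercised without them.
    import requests
    from bs4 import BeautifulSoup

    params = {"action": "getcategory", "br": br, "category": category,
              "pageno": pageno, "pagesize": "", "sort": "",
              "fsize": "", "fcolor": "", "fprice": "", "fattr": ""}
    js = requests.get(AJAX_URL, params=params).json()
    soup = BeautifulSoup(js["CategoryHTML"], "html.parser")
    return [a["href"] for a in soup.select("div.item_pic a")]


def product_links(category, fetch=fetch_links, max_pages=500):
    """Collect product hrefs for one category, page by page.

    Stops at the first page with no product links. max_pages is a
    safety cap chosen arbitrarily, not a limit imposed by the site.
    """
    links = []
    for pageno in range(1, max_pages + 1):
        page = fetch(category, pageno)
        if not page:
            break
        links.extend(page)
    return links


def crawl(categories, fetch=fetch_links):
    """Map each category name to the product links found for it."""
    return {category: product_links(category, fetch) for category in categories}
```

For example, `crawl(["dress", "shoes"])` would return a dict keyed by category name. Passing `fetch` as an argument keeps the paging logic separate from the HTTP request, so it can be checked against canned pages before hitting the site.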

Thanks for the help! To clarify, this only gets all the categories for br = f21, right? –

@TerryRossi Yes, the subcategories of the f21 category. You can also scrape the top-level categories from the main store page. – alecxe
