Creating a list with findAll in BS4

I'll preface this by saying I'm fairly new to Python. I've recently been working on a Slack bot, and here's where I am so far.

source = requests.get(url).content 
soup = BeautifulSoup(source, 'html.parser') 
price = soup.findAll("a", {"class":"pricing"})["quantity"]

This is the HTML I'm trying to scrape.

<a class="pricing" saleprice="240.00" quantity="1" added="2017-01-01"> S </a> 
<a class="pricing" saleprice="21.00" quantity="5" added="2017-03-14"> M </a> 
<a class="pricing" saleprice="139.00" quantity="19" added="2017-06-21"> L </a>

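When I use just soup.find(), I can get the first quantity value, but I need a list of all of them. I considered using a different library such as lxml instead of bs4, but had no luck. Any help is really appreciated, as I've already spent a long time on this.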

Source

2017-07-31 Helixo

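Just a note that '.findAll' really only exists for backwards compatibility; I believe it's deprecated in favor of the more Python-y naming convention. I'd recommend using '.find_all' moving forward.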

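The findAll method returns a list of bs4 Tag elements, so you can't select an attribute on the result directly. You can, however, use a simple list comprehension to pull the attribute from each item as you iterate.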

price = [a.get("quantity") for a in soup.findAll("a", {"class":"pricing"})]

Note that get is the best way to access attributes here, because it returns None (or a default value you set) if the key is not in the attrs dictionary.
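To make the difference concrete, here is a minimal sketch. It assumes the HTML from the question with one extra, made-up "XL" link that has no quantity attribute, and the "0" default is just an illustrative choice.

```python
from bs4 import BeautifulSoup

# HTML from the question, plus one hypothetical link without "quantity"
html = """
<a class="pricing" saleprice="240.00" quantity="1" added="2017-01-01"> S </a>
<a class="pricing" saleprice="21.00" quantity="5" added="2017-03-14"> M </a>
<a class="pricing" saleprice="99.00" added="2017-05-02"> XL </a>
"""
soup = BeautifulSoup(html, "html.parser")
links = soup.find_all("a", {"class": "pricing"})

# .get returns None when the attribute is missing
quantities = [a.get("quantity") for a in links]

# .get with an explicit default instead of None
quantities_with_default = [a.get("quantity", "0") for a in links]

print(quantities)               # ['1', '5', None]
print(quantities_with_default)  # ['1', '5', '0']
```

Using a["quantity"] on the third link would raise a KeyError instead, which is why get is the safer choice when some tags may lack the attribute.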

As Jon Clements pointed out, if you don't want None items in your list when some elements have no "quantity" attribute, you can filter on both 'class' and 'quantity'.

price = [a["quantity"] for a in soup.find_all("a", {"class":"pricing", "quantity":True})]
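A small runnable sketch of that filter, assuming a variant of the question's HTML where the "M" link has lost its quantity attribute (that variation is made up for illustration). Attribute values come back as strings, so the int conversion at the end is an optional extra step, not something the question asked for.

```python
from bs4 import BeautifulSoup

# Hypothetical variant of the question's HTML: the "M" link has no quantity
html = """
<a class="pricing" saleprice="240.00" quantity="1" added="2017-01-01"> S </a>
<a class="pricing" saleprice="21.00" added="2017-03-14"> M </a>
<a class="pricing" saleprice="139.00" quantity="19" added="2017-06-21"> L </a>
"""
soup = BeautifulSoup(html, "html.parser")

# "quantity": True matches only tags that actually have the attribute
with_quantity = soup.find_all("a", {"class": "pricing", "quantity": True})

# Safe to index directly, since every matched tag has "quantity"
price = [a["quantity"] for a in with_quantity]

# Optional: convert the string values to integers
as_ints = [int(q) for q in price]

print(price)    # ['1', '19']
print(as_ints)  # [1, 19]
```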

Source

2017-07-31 08:39:53

Some people may find 'soup.select('a.pricing')' more readable... You can also filter out elements that have no quantity, e.g.: '[a['quantity'] for a in soup.find_all("a", class_='pricing', quantity=True)]'

Yes, but I prefer the 'get' method, since it's simple. You're completely right about 'select' and 'find_all', though.

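If you want a default value and a consistent result length, '.get' makes sense. If you don't want empty values, filtering makes sense. It depends on the use case - I was just pointing out the latter option.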
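For anyone curious about the select variant mentioned in the comments, here is a rough sketch. The sample HTML (including the non-pricing "other" link) is made up for illustration, and the 'a.pricing[quantity]' selector assumes a bs4 version whose CSS support handles a class combined with an attribute-presence selector, which should be the case for recent releases.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two pricing links with quantity, one without,
# and one unrelated link
html = """
<a class="pricing" saleprice="240.00" quantity="1"> S </a>
<a class="pricing" saleprice="21.00"> M </a>
<a class="pricing" saleprice="139.00" quantity="19"> L </a>
<a class="other" quantity="7"> ignored </a>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS selector: every <a> with class "pricing"
all_pricing = soup.select("a.pricing")

# CSS selector: only pricing links that also carry a quantity attribute
# (assumes attribute selectors are supported by the installed bs4)
with_quantity = soup.select("a.pricing[quantity]")

# .get keeps one entry per link; the filtered selector drops the gaps
padded = [a.get("quantity") for a in all_pricing]
filtered = [a["quantity"] for a in with_quantity]

print(padded)    # ['1', None, '19']
print(filtered)  # ['1', '19']
```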
