CSV text extraction with BeautifulSoup

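I'm new to Python, and this is my first time practicing with BeautifulSoup. I haven't yet learned creative solutions to specific data-extraction problems.

The program prints fine, but I'm having trouble with the extraction to CSV. It takes the first element but leaves all the others behind. I can only guess that there may be some whitespace, a delimiter, or something else that makes the code stop extracting after the initial text?

I'm trying to get the CSV extraction to happen for every item, one per row, but I'm clearly struggling. Thanks for any help and/or advice.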

from urllib.request import urlopen 
from bs4 import BeautifulSoup 
import csv 

price_page = 'http://www.harryrosen.com/footwear/c/boots' 
page = urlopen(price_page) 
soup = BeautifulSoup(page, 'html.parser') 
product_data = soup.findAll('ul', attrs={'class': 'productInfo'}) 

for item in product_data: 

    brand_name=item.contents[1].text.strip() 
    shoe_type=item.contents[3].text.strip() 
    shoe_price = item.contents[5].text.strip() 
    print (brand_name) 
    print (shoe_type) 
    print (shoe_price) 

with open('shoeprice.csv', 'w') as shoe_prices: 
writer = csv.writer(shoe_prices) 
writer.writerow([brand_name, shoe_type, shoe_price])


2017-01-03 Splaning

Your indentation is off –

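Here is one way to approach the problem:

collect the results into a list of dictionaries with a list comprehension
write the resulting list to the CSV file via csv.DictWriter and a single .writerows() call

Implementation: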

data = [{ 
    'brand': item.li.get_text(strip=True), 
    'type': item('li')[1].get_text(strip=True), 
    'price': item.find('li', class_='price').get_text(strip=True) 
} for item in product_data] 

with open('shoeprice.csv', 'w') as f: 
    writer = csv.DictWriter(f, fieldnames=['brand', 'type', 'price']) 
    writer.writerows(data)

If you also want to write the CSV header, add a writer.writeheader() call before writer.writerows(data).
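To make the header step concrete, here is a minimal, self-contained sketch. The rows in `data` are made-up placeholders standing in for the scraped products, and the file is written to a temporary directory only so that the example can run anywhere. It also opens the file with newline='', which the csv module documentation recommends for files passed to a writer.

```python
import csv
import os
import tempfile

# Hypothetical sample rows; in the real script these come from product_data.
data = [
    {'brand': 'Brand A', 'type': 'Chelsea Boot', 'price': '$100'},
    {'brand': 'Brand B', 'type': 'Lace-Up Boot', 'price': '$250'},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'shoeprice.csv')

    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=['brand', 'type', 'price'])
        writer.writeheader()    # first line: brand,type,price
        writer.writerows(data)  # one line per dictionary

    # Read the file back to check that the header and rows round-trip.
    with open(path, newline='') as f:
        rows = list(csv.DictReader(f))

print(rows)
```

Reading the file back with csv.DictReader is just a quick way to confirm that the header line was picked up as the field names.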

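Note that you could use a regular csv.writer with a list of lists (or tuples), but in this case I like the explicitness and added readability of dictionaries.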
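For comparison, the plain csv.writer variant might look roughly like the sketch below. The tuples are invented sample values, and an in-memory io.StringIO buffer stands in for the real file so the snippet runs without touching disk.

```python
import csv
import io

# Hypothetical sample rows as tuples: (brand, type, price).
rows = [
    ('Brand A', 'Chelsea Boot', '$100'),
    ('Brand B', 'Lace-Up Boot', '$250'),
]

buffer = io.StringIO()  # stand-in for an opened CSV file
writer = csv.writer(buffer)
writer.writerow(['brand', 'type', 'price'])  # header written by hand
writer.writerows(rows)                       # one line per tuple

lines = buffer.getvalue().splitlines()
print(lines)
```

With tuples, the meaning of each column depends only on its position, which is the readability trade-off mentioned above.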

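Also note that I've improved the locators used in the loop. I don't think using the .contents list and fetching the product's children by index is a good or reliable idea.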


2017-01-03 03:41:29 alecxe

Thanks for your solution, alecxe! I'll take your advice about using .contents and learn more about formatting data. – Splaning

with open('shoeprice.csv', 'w') as shoe_prices:
    writer = csv.writer(shoe_prices)
    for item in product_data:
        brand_name = item.contents[1].text.strip()
        shoe_type = item.contents[3].text.strip()
        shoe_price = item.contents[5].text.strip()
        print(brand_name, shoe_type, shoe_price, sep='\n')

        # Write one CSV row per product, inside the loop.
        writer.writerow([brand_name, shoe_type, shoe_price])

Open the file outside the loop and write each row inside it, so you don't need to reopen the file on every iteration.
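The difference between writing after the loop and writing inside it can be seen in a small sketch. It assumes the original script wrote its row only after the loop had finished, as the posted code suggests. The `items` tuples are made-up stand-ins for the scraped values, and io.StringIO buffers replace the real file.

```python
import csv
import io

# Hypothetical stand-ins for the values scraped from each product.
items = [
    ('Brand A', 'Chelsea Boot', '$100'),
    ('Brand B', 'Lace-Up Boot', '$250'),
    ('Brand C', 'Hiking Boot', '$180'),
]

# Pattern from the question: the row is written once, after the loop.
after_loop = io.StringIO()
for brand_name, shoe_type, shoe_price in items:
    pass  # each pass only rebinds the three names
csv.writer(after_loop).writerow([brand_name, shoe_type, shoe_price])

# Pattern from this answer: the row is written on every iteration.
inside_loop = io.StringIO()
writer = csv.writer(inside_loop)
for brand_name, shoe_type, shoe_price in items:
    writer.writerow([brand_name, shoe_type, shoe_price])

rows_after = after_loop.getvalue().splitlines()
rows_inside = inside_loop.getvalue().splitlines()
print(len(rows_after), len(rows_inside))
```

In the first pattern only the values left over from the final iteration reach the file, so the output has a single row no matter how many products were found.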


2017-01-03 03:44:02

Thank you for your help, 宏杰李! This is another very useful approach. – Splaning
