2016-09-19

Scraping a website with a repeated structure: nested for loops with unequal numbers of entities

https://www.wellstar.org/locations/pages/default.aspx

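Using the website above as a framework, I would like to extract each location's name together with the title (location type) it is listed under. I would like to be able to produce output like the following: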

WellStar Hospitals

WELLSTAR ATLANTA MEDICAL CENTER

WellStar Hospitals

WELLSTAR ATLANTA MEDICAL CENTER SOUTH

...

WellStar Health Parks

ACWORTH HEALTH PARK

...

So far I have tried nested for loops:

for type in soup.find_all("h3", class_="WebFont SpotBodyGreen"):
    for name in soup.find_all("div", class_="PurpleBackgroundHeading"):
        print(type.text, name.text)

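The loop above pairs every name with every type, regardless of how they are grouped on the website. Any help, whether code or recommended resources for handling this task, would be greatly appreciated.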
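The pairing problem can be reproduced without the website. A minimal sketch, assuming two made-up lists that stand in for the two `find_all` results (the values are illustrative, not scraped): because the inner loop searches the whole document rather than the current block, every name ends up paired with every type.

```python
# Stand-ins for the two soup.find_all(...) results; values are illustrative.
types = ["WellStar Hospitals", "WellStar Health Parks"]
names = [
    "WellStar Atlanta Medical Center",
    "WellStar Cobb Hospital",
    "Acworth Health Park",
]

# Same shape as the nested loops in the question: the inner loop does not
# depend on the outer item, so the result is the full Cartesian product.
pairs = [(t, n) for t in types for n in names]

for t, n in pairs:
    print(t, n)

print(len(pairs))  # len(types) * len(names)
```

To avoid this, the inner search has to be restricted to the element that contains the current type, rather than run against the whole `soup`.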

Answer

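You need a way to group the locations by the title they appear under. To do that, handle each block separately and collect the titles and locations into a dictionary: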

from pprint import pprint 

import requests 
from bs4 import BeautifulSoup 

url = "https://www.wellstar.org/locations/pages/default.aspx" 
response = requests.get(url) 
soup = BeautifulSoup(response.content, "html.parser") 

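d = {}
# Each selected table row is treated as one block: an <h3> title plus its locations.
for row in soup.select(".WS_Content > .WS_LeftContent > table > tr"):
    title = row.h3.get_text(strip=True)

    # Search only inside this row, so each name stays with its own title.
    d[title] = [item.get_text(strip=True) for item in row.select(".PurpleBackgroundHeading a")]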

pprint(d) 

Prints (pretty-printed with pprint()):

{'WellStar Community Hospice': ['Tranquility at Cobb Hospital',
                                'Tranquility at Kennesaw Mountain'],
 'WellStar Health Parks': ['Acworth Health Park', 'East Cobb Health Park'],
 'WellStar Hospitals': ['WellStar Atlanta Medical Center',
                        'WellStar Atlanta Medical Center South',
                        'WellStar Cobb Hospital',
                        'WellStar Douglas Hospital',
                        'WellStar Kennestone Hospital',
                        'WellStar North Fulton Hospital',
                        'WellStar Paulding Hospital',
                        'WellStar Spalding Regional Hospital',
                        'WellStar Sylvan Grove Hospital',
                        'WellStar West Georgia Medical Center',
                        'WellStar Windy Hill Hospital'],
 'WellStar Urgent Care Centers': ['WellStar Urgent Care in Acworth',
                                  'WellStar Urgent Care in Kennesaw',
                                  'WellStar Urgent Care in Marietta - Delk '
                                  'Road',
                                  'WellStar Urgent Care in Marietta - East '
                                  'Cobb',
                                  'WellStar Urgent Care in Marietta - '
                                  'Kennestone',
                                  'WellStar Urgent Care in Marietta - Sandy '
                                  'Plains Road',
                                  'WellStar Urgent Care in Smyrna',
                                  'WellStar Urgent Care in Woodstock']}
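
To get the flat title/name listing asked for in the question, the dictionary can be walked key by key. A small sketch, assuming `d` has the shape printed above (only a few entries are copied here to keep it short):

```python
# A shortened copy of the dictionary printed above.
d = {
    "WellStar Health Parks": ["Acworth Health Park", "East Cobb Health Park"],
    "WellStar Community Hospice": [
        "Tranquility at Cobb Hospital",
        "Tranquility at Kennesaw Mountain",
    ],
}

# One (title, name) pair per location; each name is paired only with its own title.
pairs = [(title, name) for title, names in d.items() for name in names]

for title, name in pairs:
    print(title)
    print(name)
```

The number of pairs equals the total number of locations, not the number of titles times the number of locations.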
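Can you explain what is happening in the line `d[title] = [item.get_text(strip=True) for item in row.select(".PurpleBackgroundHeading a")]`? I suspect that is where you add the values for the title key to the dictionary? If so, how would I go about adding another value for each key? For example, how would I add each location's address to the dictionary? – Daniel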
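
For readers with the same question: the line is a list comprehension whose result is stored under the `title` key. A minimal sketch of the equivalence, using hypothetical stand-in strings instead of real tags, with `str.strip()` standing in for `get_text(strip=True)`:

```python
# Hypothetical stand-in for the text of the links found inside one row.
link_texts = ["  Acworth Health Park ", "East Cobb Health Park\n"]
title = "WellStar Health Parks"

# Comprehension form: build the whole list, then store it under the key.
d = {}
d[title] = [text.strip() for text in link_texts]

# Equivalent explicit loop: start with an empty list and append one item at a time.
d_loop = {}
d_loop[title] = []
for text in link_texts:
    d_loop[title].append(text.strip())

print(d == d_loop)
```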

@Daniel Sure. If you need further help, please post it as a separate question! Thanks. – alecxe