2017-08-08 42 views

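Scraping names and addresses into a dictionary (Python, BeautifulSoup4)

I want to scrape name and address data from this organization's member directory page: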

http://mfda.ca/members/directory-of-members/

I want the output stored in a dictionary, with each member's name (e.g. 3i Financial Investment Services Inc.) as the key and its address as the value.

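I am able to add the names to the dictionary as keys, but for some reason I cannot attach the addresses to those keys. Can anyone guide me on how to do this?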

import requests
from bs4 import BeautifulSoup

url = "http://mfda.ca/members/directory-of-members/"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html.parser")

# names: one dictionary entry per member
letters = soup.find_all("div", class_="col-sm-6 col-md-6")

lobbying = {}
for element in letters:
    lobbying[element.b.get_text()] = {}
print(lobbying)

# addresses: attempt to attach each address (this is the step that does not work)
Addr = soup.find_all("div", class_="col-sm-6 col-md-6 p-marg")
for element in Addr:
    address = element.p.get_text()
    lobbying[element.p.get_text()]["addr"] = address

Comment: The number of tags found for letters and for Addr does not match.
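The mismatch described in this comment can be checked by comparing the lengths of the two result lists. The sketch below uses a small made-up HTML fragment: the class names come from the question, but the structure and the data are assumptions, not a copy of the real page. One member has no address block, so pairing the two lists by position attaches an address to the wrong name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: class names are taken from the question,
# the structure and the data are invented for illustration.
html = """
<div class="row member-name">
  <div class="col-sm-6 col-md-6"><a href="#"><b>Alpha Inc.</b></a></div>
  <div class="col-sm-6 col-md-6 p-marg"><p>1 First Street</p></div>
</div>
<div class="row member-name">
  <div class="col-sm-6 col-md-6"><a href="#"><b>Beta Ltd.</b></a></div>
</div>
<div class="row member-name">
  <div class="col-sm-6 col-md-6"><a href="#"><b>Gamma Corp.</b></a></div>
  <div class="col-sm-6 col-md-6 p-marg"><p>3 Third Avenue</p></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A multi-class string passed to class_ is compared with the whole class
# attribute, so the first query does not also return the "p-marg" divs.
names = soup.find_all("div", class_="col-sm-6 col-md-6")
addrs = soup.find_all("div", class_="col-sm-6 col-md-6 p-marg")
print(len(names), len(addrs))

# Pairing by position misaligns silently once one address is missing:
# "Beta Ltd." receives the address that belongs to "Gamma Corp.".
paired = {n.b.get_text(): a.p.get_text() for n, a in zip(names, addrs)}
print(paired)
```

When the counts differ like this, the two lists cannot be matched by index, which is why the name and the address need to be read from the same parent element.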

Answer


I would suggest scraping the name and the address together and building the dictionary in the same pass:

lobbying = {}
# search row by row, so the name and the address come from the same parent div
rows = soup.find_all('div', {'class': 'row member-name'})

for row in rows:
    try:
        name = row.find('div', {'class': 'col-sm-6 col-md-6'})
        addr = row.find('div', {'class': 'col-sm-6 col-md-6 p-marg'})
        lobbying[name.a.b.text] = {'addr': addr.p.text}
    except AttributeError:
        # skip rows where the name or the address element is missing
        pass

print(lobbying)

Output:

{
    '3i Financial Investment Services Inc.': {
        'addr': 'Suite #221, 9040 Leslie Street\nRichmond Hill, ON L4B 3M4\nPhone: (905) 597-5000\nFax: (905) 597-8366'
    },
    'ARTECH Asset Advisory Services Inc.': {
        'addr': '209 - 3993 Henning Drive\nBurnaby, BC\xa0V5C 6P7\nPhone: (604) 434-3863\nFax: (604) 434-3873'
    },
    ...
}
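The address values above still contain newline characters and a non-breaking space (\xa0). If separate fields are wanted, the string can be split after scraping. The following is a minimal sketch and not part of the original answer: the helper name split_address and the field names lines, phone and fax are hypothetical, and it assumes every address uses the "Phone:" and "Fax:" prefixes seen in the output.

```python
def split_address(raw):
    """Split a scraped address string into street lines, phone and fax."""
    # Replace non-breaking spaces with ordinary spaces, then split on newlines
    # and drop empty pieces.
    parts = [p.strip() for p in raw.replace("\xa0", " ").split("\n") if p.strip()]
    result = {"lines": [], "phone": None, "fax": None}
    for part in parts:
        if part.startswith("Phone:"):
            result["phone"] = part[len("Phone:"):].strip()
        elif part.startswith("Fax:"):
            result["fax"] = part[len("Fax:"):].strip()
        else:
            # Anything without a known prefix is treated as an address line.
            result["lines"].append(part)
    return result


raw = ("209 - 3993 Henning Drive\nBurnaby, BC\xa0V5C 6P7\n"
       "Phone: (604) 434-3863\nFax: (604) 434-3873")
parsed = split_address(raw)
print(parsed)
```

The helper could be applied to each entry with something like lobbying[name]['addr'] = split_address(lobbying[name]['addr']), if a nested structure suits the later processing better than one long string.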