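I am learning Beautiful Soup and dictionaries in Python. I am following a short Beautiful Soup tutorial from Stanford University, found here: http://web.stanford.edu/~zlotnick/TextAsData/Web_Scraping_with_Beautiful_Soup.html which stores the values captured by Beautiful Soup in a dictionary and then accesses those values.
Since access to the website is forbidden, I stored the text provided in the tutorial in a string and then converted that string into a soup object. The printout is as follows: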
print(soup_string)
<html><body><div class="ec_statements"><div id="legalert_title"><a
href="/Legislation-and-Politics/Legislative-Alerts/Letter-to-Senators-
Urging-Them-to-Support-Cloture-and-Final-Passage-of-the-Paycheck-
Fairness-Act-S.2199">'Letter to Senators Urging Them to Support Cloture
and Final Passage of the Paycheck Fairness Act (S.2199)
</a>
</div>
<div id="legalert_date">
September 10, 2014
</div>
</div>
<div class="ec_statements">
<div id="legalert_title">
<a href="/Legislation-and-Politics/Legislative-Alerts/Letter-to-
Representatives-Urging-Them-to-Vote-on-the-Highway-Trust-Fund-Bill">
Letter to Representatives Urging Them to Vote on the Highway Trust Fund Bill
</a>
</div>
<div id="legalert_date">
July 30, 2014
</div>
</div>
<div class="ec_statements">
<div id="legalert_title">
<a href="/Legislation-and-Politics/Legislative-Alerts/Letter-to-Representatives-Urging-Them-to-Vote-No-on-the-Legislation-Providing-Supplemental-Appropriations-for-the-Fiscal-Year-Ending-Sept.-30-2014">
Letter to Representatives Urging Them to Vote No on the Legislation Providing Supplemental Appropriations for the Fiscal Year Ending Sept. 30, 2014
</a>
</div>
<div id="legalert_date">
July 30, 2014
</div>
</div>
<div class="ec_statements">
<div id="legalert_title">
<a href="/Legislation-and-Politics/Legislative-Alerts/Letter-to-Senators-Urging-Them-to-Vote-Yes-
on-the-Motion-to-Proceed-to-the-Emergency-Supplemental-Appropriations-Act-of-2014-S.2648"></a></div></div></body></html>
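At some point the tutor captures all the elements in the soup object that have the tag "div" and class_="ec_statements":
letters = soup_string.find_all("div", class_="ec_statements")
Then the tutor says: "We'll go through all of the items in our letters collection, and for each one, pull out the name and make it a key in our dict. The value will be another dict, but we haven't yet found the contents for the other items, so we'll just create an empty dict object."
The code is as follows: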
lobbying = {}
for element in letters:
    lobbying[element.a.get_text()] = {}
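However, when I print the lobbying dictionary, I find that the key for the last element, "Letter-to-Senators-Urging-Them-to-Vote-Yes-on-the-Motion-to-Proceed-to-the-Emergency-Supplemental-Appropriations-Act-of-2014-S.2648", is missing. Instead, there is an empty dictionary with no key assigned.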
for key, value in lobbying.iteritems():
    print key, value
{}
Letter to Representatives Urging Them to Vote No on the Legislation Providing Supplemental Appropriations for the Fiscal Year Ending Sept. 30, 2014
{}
Letter to Representatives Urging Them to Vote on the Highway Trust Fund Bill
{}
'Letter to Senators Urging Them to Support Cloture and Final Passage of the Paycheck Fairness Act (S.2199)
{}
How do you explain this? Your advice would be appreciated.
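The last `div` has no text, so it creates an element with an empty string as the key. You are reading that as "an empty dictionary with no key assigned". – furas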
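To illustrate the point about the empty link text, here is a minimal sketch in Python 3 (the question's own snippets use Python 2 `print`). The markup is a shortened, hypothetical stand-in for the tutorial's HTML, the `/first` and `/second` hrefs are made up, and the `html.parser` backend is an assumption. Skipping blank titles is only one possible way to handle the case, not something the tutorial prescribes.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical stand-in for the tutorial markup: one letter with a
# title and one whose <a> tag is empty, like the last entry in the question.
html = """
<div class="ec_statements"><div id="legalert_title">
<a href="/first">Letter to Representatives</a></div></div>
<div class="ec_statements"><div id="legalert_title">
<a href="/second"></a></div></div>
"""

soup = BeautifulSoup(html, "html.parser")
letters = soup.find_all("div", class_="ec_statements")

# What the original loop would use as keys: the empty <a> yields "".
raw_keys = [element.a.get_text() for element in letters]

lobbying = {}
for element in letters:
    # strip=True trims surrounding whitespace from the extracted text
    title = element.a.get_text(strip=True)
    if not title:
        # Empty link text: skip it instead of storing "" as a key
        continue
    lobbying[title] = {}

print(raw_keys)
print(lobbying)
```

With this filter the dictionary only contains letters that actually have a title. Whether to skip such entries or fall back to something else (for example the `href`) depends on what you need.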
BTW: at least use `print ">", key, "<"` and you will see that your key is an empty string, or that it contains only spaces, tabs and newlines. – furas
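A variation on the same debugging idea, sketched in Python 3: printing `repr(key)` makes an empty or whitespace-only key visible without adding markers by hand. The dictionary below is hypothetical sample data, not output from the tutorial.

```python
# Hypothetical keys standing in for what get_text() might return
lobbying = {
    "Letter to Representatives": {},
    "": {},
    " \t\n": {},
}

for key, value in lobbying.items():
    # repr() shows the quotes and escape sequences, so blank keys are visible
    print(repr(key), value)

# Collect keys that are empty or consist only of whitespace
blank_keys = [key for key in lobbying if not key.strip()]
print(blank_keys)
```

Any key that shows up in `blank_keys` comes from an element with no usable text.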