Error extracting a value with Python BeautifulSoup web scraping

I want to extract some information from a web page with Python, using Beautiful Soup. This is the relevant part of the page:

<div class="result-value" data-reactid=".0.0.3.0.0.3.$0.1.1"> 
<span data-reactid=".0.0.3.0.0.3.$0.1.1.0">1.1</span> 
<span class="result-value-unit" data-reactid=".0.0.3.0.0.3.$0.1.1.1">MB</span> 
</div>

I want to get the value 1.1 from it.

This is the part of the code I am using:

try:
    # select() returns a list of matching tags (an empty list if nothing matches)
    Area = soup.select(".result-value span")
    print(Area)
except Exception as e:
    converted_date = "Error was {0}".format(e)
    print(converted_date)

The result I am getting is:

[]

What is wrong?
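As a quick check of the selector itself, the same CSS selector can be run against only the snippet posted above, parsed from a string so that no download is involved. This is a minimal sketch that assumes bs4 is installed and uses Python's built-in "html.parser" rather than an external parser; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# The HTML fragment from the question, parsed from a string instead of a live page
snippet = '''
<div class="result-value" data-reactid=".0.0.3.0.0.3.$0.1.1">
<span data-reactid=".0.0.3.0.0.3.$0.1.1.0">1.1</span>
<span class="result-value-unit" data-reactid=".0.0.3.0.0.3.$0.1.1.1">MB</span>
</div>
'''

soup = BeautifulSoup(snippet, "html.parser")

# ".result-value span" matches both <span> children: the number and the unit
spans = soup.select(".result-value span")
print([s.get_text() for s in spans])  # ['1.1', 'MB']

# select_one() returns only the first match, which holds the number
value = soup.select_one(".result-value span").get_text()
print(value)  # 1.1
```

If this finds the value but the same selector returns [] on the real page, the selector is not the problem: the HTML that was actually parsed does not contain the element.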


2016-11-27 info

I'm new to Stack Overflow. Sorry if I am not following the standards; I am reading up on the standard procedures for this platform. I hope I am not bothering anyone. – info

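If 'soup.select' does not find anything matching what you specified, it simply returns an empty list '[]'. So the 'try ... except' will probably not catch any error in this case. – mikeqfu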
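To illustrate the comment above: an empty result from select() is not an exception, so an except block never runs. One way to make the failure visible is to check the result and raise explicitly. This is a sketch assuming bs4 is installed; first_text is a hypothetical helper, not part of bs4.

```python
from bs4 import BeautifulSoup

# A page that does NOT contain the .result-value element
soup = BeautifulSoup("<div class='other'>nothing here</div>", "html.parser")

matches = soup.select(".result-value span")
print(matches)  # []; no exception is raised, so an except block never runs

def first_text(soup, selector):
    """Return the text of the first match, raising if nothing matched (hypothetical helper)."""
    found = soup.select(selector)
    if not found:
        # Turn the silent empty result into an explicit error
        raise LookupError("no element matches {!r}".format(selector))
    return found[0].get_text()

try:
    print(first_text(soup, ".result-value span"))
except LookupError as e:
    print("Error was {0}".format(e))
```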

Do you have any idea why it is not capturing the value? I am following the bs4 documentation. – info

Assuming you know the data-reactid value, you can get the right element like this:

soup.findAll("span", {"data-reactid": ".0.0.3.0.0.3.$0.1.1.0"})
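Applied to the snippet from the question, the attribute filter returns exactly one span. The sketch below (assuming bs4 is installed) uses find_all, of which findAll is the older alias, and shows an equivalent CSS attribute selector; the attribute value has to match exactly, including the "$".

```python
from bs4 import BeautifulSoup

snippet = '''
<div class="result-value" data-reactid=".0.0.3.0.0.3.$0.1.1">
<span data-reactid=".0.0.3.0.0.3.$0.1.1.0">1.1</span>
<span class="result-value-unit" data-reactid=".0.0.3.0.0.3.$0.1.1.1">MB</span>
</div>
'''
soup = BeautifulSoup(snippet, "html.parser")

# find_all() with an attrs dict matches the exact attribute value
spans = soup.find_all("span", {"data-reactid": ".0.0.3.0.0.3.$0.1.1.0"})
print(len(spans), spans[0].get_text())  # 1 1.1

# The same filter written as a CSS attribute selector; the value is quoted
css_spans = soup.select('span[data-reactid=".0.0.3.0.0.3.$0.1.1.0"]')
print(len(css_spans), css_spans[0].get_text())  # 1 1.1
```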


2016-11-27 19:28:35

The output is still []. – info

Yes, I know the data-reactid value. – info

Can you check whether you have actually loaded the page source? print(soup.prettify()) –
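Printing the whole page with soup.prettify() can produce a lot of output; a narrower version of the same check is to search the raw downloaded text for the class name. If the string is absent, the downloaded HTML does not contain the element and no selector will find it. This is a sketch assuming bs4 is installed; contains_marker and the two sample pages are hypothetical stand-ins for a real download.

```python
from bs4 import BeautifulSoup

def contains_marker(html, marker="result-value"):
    """Report whether the raw HTML text contains a given string (hypothetical helper)."""
    return marker in html

# Stand-ins for two downloaded pages: one with the element, one without
page_with = '<div class="result-value"><span>1.1</span></div>'
page_without = '<div id="root"></div>'

results = {}
for name, html in (("with", page_with), ("without", page_without)):
    soup = BeautifulSoup(html, "html.parser")
    # (is the class name in the raw text?, how many spans does the selector find?)
    results[name] = (contains_marker(html), len(soup.select(".result-value span")))
    print(name, results[name])
```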

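Also, if soup.find('span', {'data-reactid': '.0.0.3.0.0.3.$0.1.1.0'}).text works, the code will not return any error message. The result you are getting at least shows that your try...except... block is working. I suspect the problem is your htmlfile, which is probably bytes rather than str. I suggest modifying your code as follows: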

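from urllib.request import urlopen
from bs4 import BeautifulSoup

# url is the address of the page you are scraping
htmlfile = urlopen(url).read().decode('utf-8')  # if errors occur here, try: htmlfile = urlopen(url).read().decode('utf-8', errors='ignore')
soup = BeautifulSoup(htmlfile, 'lxml')  # 'lxml' requires the lxml package; 'html.parser' is built in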

Then continue with the rest of your code.
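Putting the answer's steps together, a minimal sketch under these assumptions: bs4 is installed, the downloaded bytes are passed in directly (standing in for urlopen(url).read(), so nothing is fetched here), the built-in "html.parser" is used instead of lxml, and extract_value is a hypothetical helper name. Unlike the answer, it always decodes with errors="ignore". Because find() returns None when nothing matches, the sketch checks for None before reading .text instead of letting an AttributeError reach an except block.

```python
from bs4 import BeautifulSoup

def extract_value(raw_bytes, encoding="utf-8"):
    """Decode downloaded bytes, parse them, and return the span's text (hypothetical helper).

    raw_bytes stands in for urlopen(url).read().
    """
    # Decode to str as in the answer; errors="ignore" drops undecodable bytes
    html = raw_bytes.decode(encoding, errors="ignore")
    soup = BeautifulSoup(html, "html.parser")  # "lxml" also works if lxml is installed
    span = soup.find("span", {"data-reactid": ".0.0.3.0.0.3.$0.1.1.0"})
    if span is None:
        # find() returns None when nothing matches; .text on None would raise AttributeError
        return None
    return span.text

sample = b'<div class="result-value"><span data-reactid=".0.0.3.0.0.3.$0.1.1.0">1.1</span></div>'
print(extract_value(sample))          # 1.1
print(extract_value(b"<div></div>"))  # None
```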


2016-11-28 13:32:19 mikeqfu
