BeautifulSoup finds nothing between the tags

I am new to writing web crawlers. I want to use the search engine at http://www.creditchina.gov.cn/search_all#keyword=&searchtype=0&templateId=&creditType=&areas=&objectType=2&page=1 to check whether my input is valid.

For example, 912101127157655762 is a valid input and 912101127157655760 is invalid.

After inspecting the site's source in the browser's developer tools, I found that if the input is an invalid number, the tag is:

whereas if the input is valid, the tag will be:

So I want to decide whether the input is valid by checking whether the 'ul class="credit-info-results public-results-left item-template"' tag has any content. Here is the crawler I wrote:

import urllib.request  # urllib.request must be imported explicitly
from bs4 import BeautifulSoup
# The URL is split across two string literals only to keep the line short.
url = ('http://www.creditchina.gov.cn/search_all#keyword=912101127157655762&searchtype=0'
       '&templateId=&creditType=&areas=&objectType=2&page=1')
req = urllib.request.Request(url)
data = urllib.request.urlopen(req)
bs = data.read().decode('utf-8')
soup = BeautifulSoup(bs, 'lxml')
check = soup.find_all("ul", {"class": "credit-info-results public-results-left item-template"})
if check == []:
    pass  # TODO
if check != []:
    pass  # TODO

However, the value of check is always []. I cannot understand why there is nothing between the tags. I hope someone can help me solve this.

Source

2017-10-10 Sara_Hsu

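What you get back is not HTML but a JS object. That is why BeautifulSoup cannot parse it.

You can use a substring search to check whether the response contains the content you are looking for.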

import urllib.request  # urllib.request must be imported explicitly
from bs4 import BeautifulSoup
url = ('http://www.creditchina.gov.cn/search_all#keyword=912101127157655762&searchtype=0'
       '&templateId=&creditType=&areas=&objectType=2&page=1')
req = urllib.request.Request(url)
data = urllib.request.urlopen(req)
bs = data.read().decode('utf-8')

# str.find returns -1 (not 0) when the substring is missing.
ul_pos = bs.find('credit-info-results public-results-left item-template')
if ul_pos != -1:
    # Back up to the '<' that opens the tag so the slice starts with a complete <ul ...>.
    tag_start = bs.rfind('<', 0, ul_pos)
    bs = bs[max(tag_start, 0):]

soup = BeautifulSoup(bs, 'lxml')
check = soup.find_all("ul", {"class": "credit-info-results public-results-left item-template"})
if check == []:
    pass  # TODO
if check != []:
    pass  # TODO
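
A side note on the substring check: str.find returns -1 when the substring is absent and 0 when it sits at the very start of the text, so the result has to be compared against -1 rather than 0. Below is a minimal sketch of that behaviour. The helper name marker_position and the sample strings are made up for illustration and are not taken from the real response.

```python
MARKER = 'credit-info-results public-results-left item-template'

def marker_position(text, marker=MARKER):
    """Return the index of marker in text, or None when it is absent."""
    pos = text.find(marker)  # -1 means "not found"; 0 is a valid position
    return None if pos == -1 else pos

# Hypothetical sample inputs, not the site's actual output.
at_start = MARKER + '">'
absent = '<div>no results list here</div>'

print(marker_position(at_start))  # 0: found, even though the index is falsy
print(marker_position(absent))    # None: not found
```

Returning None for the missing case keeps a hit at position 0 from being mistaken for a miss.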

Source

2017-10-10 13:02:52

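How can I tell whether I got HTML rather than a JS object? Also, I checked that bs.find('credit-info-results public-results-left item-template') returns 39202 for both 912101127155762 and 912101127157655760, even though the two inputs should produce different output. This is confusing...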

I have updated my answer, please give it a try. Unfortunately, the site has blocked my requests, so I cannot test it myself.

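I found that what I was getting is only a template of the page. I need to look at the Network tab rather than the Elements tab in the developer tools. In any case, thank you for your time and patience!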
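
The identical position 39202 for both inputs is consistent with how URLs work: everything after `#` is a fragment, which the client keeps to itself and does not send in the HTTP request. Here the keyword lives in the fragment, so the server receives the same request either way. A minimal sketch using the standard library's urllib.parse.urldefrag follows; the helper name request_url is made up for illustration.

```python
from urllib.parse import urldefrag

BASE = 'http://www.creditchina.gov.cn/search_all'
FRAGMENT = '#keyword={}&searchtype=0&templateId=&creditType=&areas=&objectType=2&page=1'

def request_url(url):
    """Return the URL without its fragment, i.e. the part sent to the server."""
    return urldefrag(url).url

valid = BASE + FRAGMENT.format('912101127157655762')
invalid = BASE + FRAGMENT.format('912101127157655760')

print(request_url(valid))
print(request_url(valid) == request_url(invalid))
```

Both URLs reduce to the same address, so the response is the same page template. The keyword is only read by the page's JavaScript, which then makes a separate request for the results; that request is the one to look for in the Network tab.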
