Scraping a table with Beautiful Soup

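I am trying to scrape the price table (Buy Yes, Price, and Contracts Offered) from this site: https://www.predictit.org/Contract/7069/Will-the-Senate-pass-the-Better-Care-Reconciliation-Act-by-July-31#prices.

Here is my (obviously very preliminary) code, currently structured just to find the table: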

from bs4 import BeautifulSoup
import requests

url = "https://www.predictit.org/Contract/7069/Will-the-Senate-pass-the-Better-Care-Reconciliation-Act-by-July-31#prices"

# Download the page HTML (the "#prices" fragment is not sent to the server).
ret = requests.get(url).text

soup = BeautifulSoup(ret, "lxml")

# find() returns None when nothing matches; it does not raise AttributeError.
table = soup.find('table')
if table is None:
    print('No tables found, exiting')
else:
    print(table)

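The code finds and parses a table; however, it is the wrong one (it is the data table from a different tab: https://www.predictit.org/Contract/7069/Will-the-Senate-pass-the-Better-Care-Reconciliation-Act-by-July-31#data).

How can I fix this so that the code identifies the correct table?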

Source

2017-07-18 libertyspursuit

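Which table do you want? Your best option is to use `soup.find_all('table')` and then iterate over the returned list. While iterating, look for a specific element that only the table you want contains. – TerryA

@TerryA I ran that code and it did not identify the desired table, only the table on the first tab. – libertyspursuit

Which table from the first link do you want? – TerryA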
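
TerryA's suggestion can be sketched roughly as follows. This is only an illustration: the markup in SAMPLE_HTML, the "Buy Yes" header used as the distinguishing element, and the helper name find_table_with_header are all assumptions, not taken from the real page. It also only helps when the wanted table is actually present in the HTML that requests downloads.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a page that contains two tables.
SAMPLE_HTML = """
<table id="data"><tr><th>Date</th><th>Volume</th></tr>
<tr><td>2017-07-17</td><td>1200</td></tr></table>
<table id="prices"><tr><th>Buy Yes</th><th>Price</th></tr>
<tr><td>150</td><td>12</td></tr></table>
"""

def find_table_with_header(soup, header_text):
    """Return the first <table> that has a <th> with this exact text, else None."""
    for table in soup.find_all("table"):
        headers = [th.get_text(strip=True) for th in table.find_all("th")]
        if header_text in headers:
            return table
    return None

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
prices = find_table_with_header(soup, "Buy Yes")
print(prices["id"] if prices is not None else "No matching table")
```

Matching on a header cell is just one possible choice; an id, a class, or a caption would work the same way if the real table has one.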

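As @downshift mentioned in a comment, the table is generated by JavaScript using an XHR request.
So you can either use Selenium or request the site's API directly.

Using the second option: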

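import requests
from bs4 import BeautifulSoup

# Endpoint the page appears to request via XHR to load the price table.
url = "https://www.predictit.org/PrivateData/GetPriceListAjax?contractId=7069"
ret = requests.get(url).text
soup = BeautifulSoup(ret, "lxml")
table = soup.find('table')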
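
Once `table` has been found, its rows can be pulled out as plain lists. The sketch below is a minimal example under assumptions: SAMPLE_FRAGMENT is made-up markup (the real response's columns and values may differ), and table_to_rows is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; the real response's columns and values may differ.
SAMPLE_FRAGMENT = """
<table>
  <tr><th>Price</th><th>Contracts Offered</th></tr>
  <tr><td>0.12</td><td>150</td></tr>
  <tr><td>0.13</td><td>75</td></tr>
</table>
"""

def table_to_rows(table):
    """Return each <tr> as a list of stripped cell texts (header and data cells)."""
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:  # skip rows that contain no cells
            rows.append(cells)
    return rows

soup = BeautifulSoup(SAMPLE_FRAGMENT, "html.parser")
table = soup.find("table")
rows = table_to_rows(table) if table is not None else []
for row in rows:
    print(row)
```

The None check matters because `soup.find('table')` returns None when the response contains no table, for example if the endpoint changes or the request is rejected.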

Source

2017-07-18 23:07:39

Thanks for your help! – libertyspursuit
