Getting column headers from multiple HTML 'tbody' elements

I need to get the column headers from the second tbody at this URL:

http://bepi.mpob.gov.my/index.php/statistics/price/daily.html

Specifically, I want to get "September", "October", and so on.

I get the following error:

runfile('C:/Python27/Lib/site-packages/xy/workspace/webscrape/mpob1.py', wdir='C:/Python27/Lib/site-packages/xy/workspace/webscrape')
Traceback (most recent call last):
  File "<ipython-input-8-ab4005f51fa3>", line 1, in <module>
    runfile('C:/Python27/Lib/site-packages/xy/workspace/webscrape/mpob1.py', wdir='C:/Python27/Lib/site-packages/xy/workspace/webscrape')
  File "C:\Python27\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 685, in runfile
    execfile(filename, namespace)
  File "C:\Python27\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 71, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)
  File "C:/Python27/Lib/site-packages/xy/workspace/webscrape/mpob1.py", line 26, in <module>
    soup.findAll('tbody', limit=2)[1].findAll('tr').findAll('th')]
IndexError: list index out of range

Could someone help me out here? I'd be forever grateful!

I've posted my code below:

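import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "http://bepi.mpob.gov.my/index.php/statistics/price/daily.html"

r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')

# The IndexError is raised here: findAll('tbody', limit=2) returned fewer
# than two elements, so indexing with [1] fails.
# Note: findAll('tr') returns a list-like ResultSet, which has no findAll()
# method, so the chained .findAll('th') would fail even with two tbody elements.
column_headers = [th.getText() for th in
                  soup.findAll('tbody', limit=2)[1].findAll('tr').findAll('th')]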


2016-09-25 Rian Ashwin
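The IndexError means the parsed page contained fewer than two `tbody` elements. Separately, `findAll('tr')` returns a list of rows, so `th` cells have to be collected row by row. The minimal sketch below guards the index and loops over the rows. It runs against a made-up HTML snippet: `SAMPLE_HTML` and `second_tbody_headers` are hypothetical names, and the snippet is not the real MPOB markup. It uses the built-in `html.parser` so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two <tbody> sections; NOT the real MPOB page.
SAMPLE_HTML = """
<table>
  <tbody><tr><th>Ignored</th></tr></tbody>
  <tbody>
    <tr><th>Tarikh</th><th>September</th><th>October</th></tr>
    <tr><td>1</td><td>2</td><td>3</td></tr>
  </tbody>
</table>
"""


def second_tbody_headers(html):
    """Return the text of every <th> in the second <tbody>, or [] if absent."""
    soup = BeautifulSoup(html, "html.parser")
    tbodies = soup.find_all("tbody", limit=2)
    if len(tbodies) < 2:
        # The situation that raised IndexError in the question.
        return []
    # find_all("tr") gives a list of rows, so search each row for <th>
    # (tbodies[1].find_all("th") would also work).
    return [th.get_text(strip=True)
            for tr in tbodies[1].find_all("tr")
            for th in tr.find_all("th")]


print(second_tbody_headers(SAMPLE_HTML))
```

On the live URL this would most likely return `[]`, because the table only appears after the form is submitted.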

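Do you only need the contents of the month select element, or do you actually need to click "View Price" and parse the "Summary of MPOB Daily FFB Reference Price by Region" table? Thanks – alecxe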

I need to click 'View Price'. The table I need to parse is 'Peninsular Malaysia: Summary of Local Prices of RBD P. Oil, RBD P. Olein & RBD P. Stearin'. –

When you click the "View Price" button, a POST request is sent to the http://bepi.mpob.gov.my/admin2/price_local_daily_view3.php endpoint. Simulate that POST request and parse the resulting HTML:

import requests
from bs4 import BeautifulSoup


with requests.Session() as session:
    # Load the form page first, reusing the same session for the POST.
    session.get("http://bepi.mpob.gov.my/index.php/statistics/price/daily.html")

    # Send the form fields the "View Price" button submits
    # (tahun = year, bulan = month).
    response = session.post("http://bepi.mpob.gov.my/admin2/price_local_daily_view3.php", data={
        "tahun": "2016",
        "bulan": "9",
        "Submit2222": "View Price"
    })
    soup = BeautifulSoup(response.content, 'lxml')

    # The month headers are in the third <tr> of the table with id="hor-zebra".
    table = soup.find("table", id="hor-zebra")
    headers = [td.get_text() for td in table.find_all("tr")[2].find_all("td")]
    print(headers)

This prints the table's headers:

[u'Tarikh', u'September', u'October', u'November', u'December', u'September', u'October', u'November', u'December', u'September', u'October', u'November', u'December']


2016-09-25 12:59:36 alecxe
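To see what the "View Price" POST actually carries, it can help to URL-encode the same form fields the answer sends. The minimal sketch below does that with the standard library only. `view_price_form` is a hypothetical helper. The field names come from the answer; `tahun` and `bulan` are Malay for "year" and "month". Whether the endpoint accepts any year/month combination is an assumption.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7, as used in the question


def view_price_form(year, month):
    """Hypothetical helper: form fields for the 'View Price' POST.

    Assumes the endpoint accepts the same three fields for any year/month.
    """
    return {"tahun": str(year), "bulan": str(month), "Submit2222": "View Price"}


form = view_price_form(2016, 9)
# URL-encoded request body for this form (key order as inserted on Python 3.7+).
print(urlencode(form))
```

Passing `view_price_form(2016, 10)` as `data=` to `session.post(...)` in the answer's code would, assuming the site accepts it, request October instead of September.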

This is perfect! Thanks! –
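The printed header list repeats the four month names three times, presumably once per product. Used directly as pandas column names (the question imports pandas), it would give duplicate columns. The minimal sketch below prefixes each block with a group label. `label_columns` is a hypothetical helper. The group order (RBD P. Oil, RBD P. Olein, RBD P. Stearin) is an assumption taken from the table title and should be checked against the rendered table.

```python
def label_columns(headers, groups):
    """Hypothetical helper: prefix each block of repeated headers with a label.

    Assumes headers[0] is the date column ('Tarikh') and the remaining
    headers split evenly into len(groups) consecutive blocks.
    """
    first, rest = headers[0], headers[1:]
    if not groups or len(rest) % len(groups) != 0:
        raise ValueError("headers do not split evenly into the given groups")
    size = len(rest) // len(groups)
    labelled = [first]
    for i, group in enumerate(groups):
        for name in rest[i * size:(i + 1) * size]:
            labelled.append("{} {}".format(group, name))
    return labelled


headers = [u'Tarikh', u'September', u'October', u'November', u'December',
           u'September', u'October', u'November', u'December',
           u'September', u'October', u'November', u'December']
# Assumed group order, taken from the table title; verify against the page.
products = ["RBD P. Oil", "RBD P. Olein", "RBD P. Stearin"]
print(label_columns(headers, products))
```

Once the data rows are parsed, the returned list could be passed as `columns=` to `pandas.DataFrame`.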
