2016-06-08 51 views

Using findAll to retrieve a date from a tag that has no class and no id

I have the following HTML:

<div class="media-body"><i class="" style="text-shadow:1px 1px 0px #dcdcdc;">29 May 2016 </i><a href="http://www.sharesansar.com/events/opening-day-of-auction-of-tinau-development-bank-limited-21903-32-units-ordinary-unclaimed-right-share/"><h4 class="media-heading">Opening Day of auction of Tinau Development Bank Limited 21,903.32 units ordinary unclaimed right share.</h4></a><p>Mini Bid Amt: Rs 100 Mini Application: 100 units or multiply by 10 Opening Date: 16th Jestha, 2073 Closing Date: 30th Jestha, 2073 Bid Opening Date: 31st Jestha, 2073 Time: 3:15 PM Contact: Siddhartha Capital Limited, Anamnagar, Kathmandu, 4257767, 4257768</p></div> 

I have been trying to retrieve the date, 29 May 2016, with the code below, but it does not work.

import requests
from bs4 import BeautifulSoup
from urllib.request import urlopen
from urllib.error import HTTPError

def events_log(max_pages):
    page = 1
    while page <= max_pages:
        url = 'http://www.sharesansar.com/events/2016/06/page/' + str(page) + '/'
        try:
            html = urlopen(url)
        except HTTPError as e:
            print(e)
        else:
            if html is None:
                print("URL is not found")
            else:
                soup = BeautifulSoup(html.read(), 'lxml')
                for name in soup.findAll('i', {'class': ''}):
                    print(name.get_text())

events_log(1)

I am a complete noob and have been trying to solve this since yesterday.

Answer


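Remember to increment your page counter; without that, the while loop never ends. With one small modification to your code, and with no error checking (that part is up to you), it works fine: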

import requests
from bs4 import BeautifulSoup

def events_log(max_pages):
    page = 1
    while page <= max_pages:
        url = 'http://www.sharesansar.com/events/2016/06/page/' + str(page) + '/'

        # No error checking: a failed request either raises or returns an error page.
        res = requests.get(url)

        soup = BeautifulSoup(res.text, 'lxml')
        for name in soup.findAll('i', {'class': ''}):
            print(name.get_text())
        page += 1  # advance the counter so the loop terminates

events_log(1)

Output:

30 Jun 2016 
30 Jun 2016 
29 Jun 2016 
29 Jun 2016 
28 Jun 2016 
28 Jun 2016 
26 Jun 2016 
24 Jun 2016 
24 Jun 2016 
22 Jun 2016 
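
As a possible alternative to filtering on the empty class attribute, the date can also be located through its parent div, which does have a class. The sketch below is only an illustration: it runs offline against a shortened copy of the HTML from the question (SAMPLE_HTML and extract_dates are names made up for this example, and the link and paragraph text are abbreviated), and it uses the built-in 'html.parser' so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Shortened copy of the snippet from the question (hypothetical sample data).
SAMPLE_HTML = (
    '<div class="media-body">'
    '<i class="" style="text-shadow:1px 1px 0px #dcdcdc;">29 May 2016 </i>'
    '<a href="http://www.sharesansar.com/events/">'
    '<h4 class="media-heading">Opening Day of auction</h4></a>'
    '<p>Mini Bid Amt: Rs 100</p>'
    '</div>'
)

def extract_dates(html):
    """Return the text of the first <i> tag inside each div.media-body."""
    soup = BeautifulSoup(html, 'html.parser')
    dates = []
    for block in soup.find_all('div', class_='media-body'):
        tag = block.find('i')
        if tag is not None:  # skip blocks that carry no date
            dates.append(tag.get_text(strip=True))
    return dates

print(extract_dates(SAMPLE_HTML))
```

Anchoring on the div class ties the match to the event block itself, so unrelated i tags elsewhere on the page are less likely to be picked up. Whether the live page still uses this markup is not something this sketch can confirm.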

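Thank you. I have been editing my code and trying things since yesterday. In the end, by the time I decided to ask here, the code I posted in the question was actually correct and ran fine; I just had not checked the output. I do not even know what I was doing wrong before. When I checked just now and it produced output, I wanted to cry. Any suggestions on how to do the error checking? – mad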
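
On the error-checking question, one possible approach (a minimal sketch, not the answerer's method) is to wrap the download in a small helper that catches the urllib exceptions already imported in the question and returns None on failure. The names fetch_page and fetch_pages, the opener parameter, and the 10-second timeout are assumptions introduced for this illustration; opener exists only so the helper can be exercised without network access.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

def fetch_page(url, opener=urlopen, timeout=10):
    """Return the page body as bytes, or None if the request failed."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.read()
    except HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        # HTTPError is a subclass of URLError, so it must be caught first.
        print(f"HTTP error {err.code} for {url}")
    except URLError as err:
        # The server could not be reached at all (DNS failure, refused connection).
        print(f"Could not reach {url}: {err.reason}")
    return None

def fetch_pages(base_url, max_pages, opener=urlopen):
    """Fetch pages 1..max_pages and return the bodies that downloaded successfully."""
    bodies = []
    # A for loop over range() advances the page number automatically,
    # so there is no counter to forget.
    for page in range(1, max_pages + 1):
        body = fetch_page(f"{base_url}{page}/", opener=opener)
        if body is not None:
            bodies.append(body)
    return bodies
```

With this in place, something like fetch_pages('http://www.sharesansar.com/events/2016/06/page/', 1) would return a list of page bodies that can be passed to BeautifulSoup, and a failed page is reported and skipped instead of stopping the whole run.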