Can't display the text between span tags

This is the code I have so far: http://pastebin.com/CdUiXpdf

import requests
from bs4 import BeautifulSoup


def web_crawler(max_pages):
    page = 1
    while page <= max_pages:
        url = "https://www.kupindo.com/Knjige/artikli/1_strana_" + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        print("PAGE: " + str(page))
        for link in soup.find_all("a", class_="item_link"):
            href = link.get("href")
            # title = link.string
            print(href)
            # print(title)
            extended_crawler(href)
        page += 1


def extended_crawler(item_url):
    source_code = requests.get(item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    for view_counter in soup.find_all("span", id="BrojPregleda"):
        print("View Count: ", view_counter.text)


web_crawler(1)

The output is, for example:

PAGE: 1 
https://www.kupindo.com/showcontent/2143/Beletristika/37875219_VUK-DRASKOVIC-Izabrana-dela-1-7-Srpska-rec 
View Count:

So the view count is empty: even though the extended_crawler function looks for the span with the id BrojPregleda, nothing is displayed.

Source

2017-02-25 dovla

@Arman What do you mean by code in PDF format? The pastebin link just happens to end in "pdf"; it is plain text. – dovla

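That is because the span with the id BrojPregleda is populated by an Ajax call, so it is empty in the HTML that requests receives. Either use Selenium to get the value, or follow these steps:

1) Get the product ID from the URL

2) POST to http://www.kupindo.com/inc/ajx/Predmet/ajxGetBrojPregleda.php with form data containing the key IDPredmet, whose value is the ID from step 1

3) Read the view count from the response

Example: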

import requests


def extended_crawler(item_url):
    # The item ID sits between the last "/" and the last "_" of the item URL.
    item_id = item_url[item_url.rfind('/') + 1:item_url.rfind('_')]
    # The span is filled in by an Ajax call, so query that endpoint directly.
    view_count = requests.post(
        'http://www.kupindo.com/inc/ajx/Predmet/ajxGetBrojPregleda.php',
        data={'IDPredmet': item_id})
    print(view_count.text)
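
The slicing in the example relies on the item URL containing a single underscore after the last slash; a title with an underscore of its own (say, a last segment of "123_a_b") would yield "123_a" instead of "123". As a minimal sketch, assuming the item ID is always the run of digits at the start of the last path segment, the ID could be pulled out with a regular expression instead. The helper name extract_item_id is hypothetical and not part of the original answer.

```python
import re
from urllib.parse import urlparse


def extract_item_id(item_url):
    """Return the leading digits of the last URL path segment, or None."""
    # Last path segment, e.g. "37875219_VUK-DRASKOVIC-Izabrana-dela-1-7-Srpska-rec"
    last_segment = urlparse(item_url).path.rstrip("/").rsplit("/", 1)[-1]
    # Assumption: the item ID is the run of digits before the first underscore.
    match = re.match(r"(\d+)_", last_segment)
    return match.group(1) if match else None


if __name__ == "__main__":
    url = ("https://www.kupindo.com/showcontent/2143/Beletristika/"
           "37875219_VUK-DRASKOVIC-Izabrana-dela-1-7-Srpska-rec")
    print(extract_item_id(url))  # prints 37875219
```

The returned string can then be passed as the IDPredmet value in the POST request. Returning None for URLs that do not match lets the caller skip links that are not item pages.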

Source

2017-02-25 21:22:45 Zroq

This works, thank you very much. I would never have thought of that. – dovla
