Cannot display the content between span tags

This is my code so far: http://pastebin.com/CdUiXpdf
import requests
from bs4 import BeautifulSoup


def web_crawler(max_pages):
    page = 1
    while page <= max_pages:
        url = "https://www.kupindo.com/Knjige/artikli/1_strana_" + str(page)
        source_code = requests.get(url)
        plain_text = source_code.text
        soup = BeautifulSoup(plain_text, "html.parser")
        print("PAGE: " + str(page))
        for link in soup.find_all("a", class_="item_link"):
            href = link.get("href")
            # title = link.string
            print(href)
            # print(title)
            extended_crawler(href)
        page += 1


def extended_crawler(item_url):
    source_code = requests.get(item_url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    for view_counter in soup.find_all("span", id="BrojPregleda"):
        print("View Count: ", view_counter.text)


web_crawler(1)
The output is, for example:
PAGE: 1
https://www.kupindo.com/showcontent/2143/Beletristika/37875219_VUK-DRASKOVIC-Izabrana-dela-1-7-Srpska-rec
View Count:
So the view count is empty: even though the extended_crawler function looks for the span with the id BrojPregleda, nothing is displayed for it.
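To narrow this down, here is a minimal diagnostic sketch that tells apart "the span is not in the HTML at all" from "the span is there but has no text". The names `SpanTextFinder` and `inspect_span` are hypothetical helpers, not part of my crawler, and the sketch uses the standard library's `html.parser` instead of BeautifulSoup only so that it runs without extra installs.

```python
from html.parser import HTMLParser


class SpanTextFinder(HTMLParser):
    """Collects the text inside the first <span> that has a given id."""

    def __init__(self, span_id):
        super().__init__()
        self.span_id = span_id
        self.found = False  # True once the target span has been seen
        self.depth = 0      # span nesting depth while inside the target
        self.chunks = []    # text pieces found inside the target span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth > 0:
            # A span nested inside the target span.
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.span_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def inspect_span(html, span_id="BrojPregleda"):
    """Return 'missing', 'empty', or 'filled' for the span with span_id."""
    finder = SpanTextFinder(span_id)
    finder.feed(html)
    finder.close()
    if not finder.found:
        return "missing"
    return "filled" if "".join(finder.chunks).strip() else "empty"
```

Calling `inspect_span(requests.get(item_url).text)` on an item page would show which case applies. A result of `"empty"` would mean the span is present in the HTML that the server returns but its value is not, which might point to the number being filled in after the page loads; that is only a guess about how the site works.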
@Arman What do you mean by code in PDF format? The pastebin link just happens to end in "pdf"; it is plain text. – dovla