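Python Selenium PhantomJS endless scroll only works for the first page

I want to read some news articles using Python and PhantomJS. The site I'm using has endless scrolling: when you scroll to the bottom, it dynamically loads the next article. The example URL I'm using is in the code below.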
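A page that loads more content on scroll usually fires when the visible window reaches the end of the document, that is, when scrollTop plus the viewport height reaches scrollHeight. The transcript below already shows this: with the window set to 550 px high, the furthest the page can scroll is scrollHeight - 550, which is why `scrollTo(0, 18255)` lands at 17705. The check below is a minimal sketch, assuming the viewport height equals the 550 px passed to `set_window_size` (it could be read with `driver.execute_script('return window.innerHeight')`). The names `at_bottom` and `threshold` are hypothetical, and the real site may use a different trigger.

```python
def at_bottom(scroll_top: int, viewport_height: int, scroll_height: int,
              threshold: int = 0) -> bool:
    """Return True when the visible window reaches the end of the document.

    threshold is a hypothetical slack in px; some pages start loading
    a little before the very bottom.
    """
    return scroll_top + viewport_height >= scroll_height - threshold


# Values taken from the REPL transcript in the question (window height 550).
print(at_bottom(17705, 550, 18255))  # True: the first scroll hit the bottom
print(at_bottom(28500, 550, 29050))  # True: so did the second one
print(at_bottom(0, 550, 18255))      # False: freshly loaded page
```

Both `scrollTo` calls in the transcript therefore did reach the bottom. This suggests the second scroll worked but did not trigger another load, rather than that the scrolling itself failed.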
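With the code below I managed to get it to load one more article, but only one... Can anyone help me make it work endlessly? Or give me any hints on what is wrong or what could be improved? Thanks!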
from selenium import webdriver
from bs4 import BeautifulSoup
from time import sleep
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
# Pretend to be Chrome
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53 "
"(KHTML, like Gecko) Chrome/15.0.87"
)
driver = webdriver.PhantomJS(desired_capabilities=dcap)
driver.set_window_size(1120, 550)
## GET
driver.get("https://www.bloomberg.com/news/features/2017-06-08/no-one-has-ever-made-a-corruption-machine-like-this-one")
# print current scrollTop
driver.execute_script('return document.body.scrollTop')
# out: 0
# print current scrollHeight
driver.execute_script('return document.body.scrollHeight')
# out: 18255
# scroll to bottom
driver.execute_script("window.scrollTo(0, 18255)")
# print current scrollTop
driver.execute_script('return document.body.scrollTop')
# out: 17705
# print current scrollHeight
driver.execute_script('return document.body.scrollHeight')
# out: 29050
# It works! Great!
# Scroll to bottom again
driver.execute_script("window.scrollTo(0, 29050)")
# print current scrollTop
driver.execute_script('return document.body.scrollTop')
# out: 28500
# print current scrollHeight
driver.execute_script('return document.body.scrollHeight')
# out: 29050
# Still the same: no matter what I try, it won't load any more...
# Following tolmachofof's suggestion below, I tried scrolling very slowly; still no luck. :<
top = driver.execute_script('return document.body.scrollTop')
height = driver.execute_script('return document.body.scrollHeight')
for i in range(top, height, 100):
    driver.execute_script("window.scrollTo(0," + str(i) + ")")
    print(driver.execute_script('return document.body.scrollTop'))
    sleep(0.2)
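The manual steps above (scroll, read `scrollHeight`, scroll again) can be wrapped in a loop that scrolls to the bottom, waits, and stops once the height has stopped growing for a few rounds, instead of hardcoding 18255 and 29050. This is a minimal sketch, not a fix for the site itself: if the page genuinely stops loading after one article, the loop will simply report one load. The browser calls are passed in as callables so the logic can run without PhantomJS; `scroll_until_stable`, `pause`, `patience`, and `FakePage` are hypothetical names.

```python
import time
from typing import Callable


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll_to_bottom: Callable[[], None],
    pause: float = 2.0,
    max_rounds: int = 50,
    patience: int = 3,
) -> int:
    """Scroll to the bottom repeatedly until the page height stops growing.

    patience: consecutive rounds without growth to tolerate before giving up.
    Returns how many times the height increased (roughly, batches loaded).
    """
    last_height = get_height()
    loads = 0
    unchanged = 0
    for _ in range(max_rounds):
        scroll_to_bottom()
        time.sleep(pause)  # give the page time to fetch and render new content
        new_height = get_height()
        if new_height > last_height:
            loads += 1
            unchanged = 0
            last_height = new_height
        else:
            unchanged += 1
            if unchanged >= patience:
                break
    return loads


# With a real driver, the callables could be:
#   lambda: driver.execute_script("return document.body.scrollHeight")
#   lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

class FakePage:
    """Stand-in for a browser: grows by 1000 px on each of the first `batches` scrolls."""

    def __init__(self, batches: int):
        self.height = 1000
        self.remaining = batches

    def scroll_to_bottom(self) -> None:
        if self.remaining > 0:
            self.height += 1000
            self.remaining -= 1


page = FakePage(batches=4)
print(scroll_until_stable(lambda: page.height, page.scroll_to_bottom, pause=0))  # 4
```

Using `document.body.scrollHeight` inside the `scrollTo` script means the target is always the current bottom, and `patience` avoids quitting on a single slow network response.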
Please read my question: I can make it scroll, but I don't know why it only works on the first page... – Student222
You may be scrolling too fast. I once had the same problem; slowing down the scrolling is what let me paginate endlessly. – tolmachofof
I tried scrolling slowly. It still doesn't work... – Student222