2017-10-12 1029 views

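I'm trying web scraping for the first time, using BeautifulSoup and Python. The page I'm trying to scrape is: http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172. Why do I get the field itself rather than the field's value?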

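# Imports assumed from the aliases used below (not shown in the original post)
from urllib.request import urlopen as request
from bs4 import BeautifulSoup as soup

# Download the raw HTML exactly as the server sends it (no JavaScript is run)
client = request('http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172')
page_html = client.read()
client.close()
page_soup = soup(page_html, 'html.parser')

identification = page_soup.find('div', {'data-bind':'text: name'})
print(identification.text)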

When I do this, I just get an empty string. If I print the identification variable itself, I get:

<div class="col-xs-7" data-bind="text: name"></div> 

This is the line of HTML whose value I am trying to get. On the page itself, the tag contains the value A LEBLANC.


This is an Ajax-driven site; all of the data is loaded by JavaScript.
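To illustrate the comment above, here is a minimal sketch of why the lookup succeeds but `.text` is empty. The first string is the tag exactly as printed in the question; the second is a hypothetical version of the same tag after the browser's JavaScript has filled it in (an assumption for illustration, not captured from the site). The helper `read_name` is also just an illustrative name.

```python
from bs4 import BeautifulSoup

# What the server sends: the div is an empty placeholder.
static_html = '<div class="col-xs-7" data-bind="text: name"></div>'

# Hypothetical markup after JavaScript has populated the binding.
rendered_html = '<div class="col-xs-7" data-bind="text: name">A LEBLANC</div>'

def read_name(html):
    """Return the text of the div whose data-bind is 'text: name'."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("div", {"data-bind": "text: name"}).text

static_name = read_name(static_html)      # empty string: nothing to extract yet
rendered_name = read_name(rendered_html)  # the value only exists after rendering

print(repr(static_name))
print(repr(rendered_name))
```

So the BeautifulSoup call itself is fine; the HTML fetched without a browser simply does not contain the value yet.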

Answers


You can try this code:

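from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
# driver.get() returns None, so there is nothing useful to assign
driver.get('http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172')

# find_element_by_xpath was removed in Selenium 4; use find_element(By.XPATH, ...)
element = driver.find_element(By.XPATH, '//*[@id="identificationCollapse"]/div/div/div/div[1]/div[1]/div[2]')
print(element.text)
driver.quit()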

Output:

A LEBLANC 

Here is how you find it :) https://pasteboard.co/GOCOeBP.png


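There are several ways to achieve the same goal. I used a CSS selector in my script, which is easy to understand and unlikely to break unless the site's HTML structure changes significantly. Try this: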

from selenium import webdriver 
from bs4 import BeautifulSoup 

driver = webdriver.Chrome() 
driver.get('http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172') 
soup = BeautifulSoup(driver.page_source,"lxml") 
driver.quit() 
item_name = soup.select("[data-bind$='name']")[0].text 
print(item_name) 

Result:

A LEBLANC 
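A short sketch of what the `$=` in that selector does: it matches any element whose `data-bind` attribute ends with `name`. The HTML fragment below is hypothetical (only the first `data-bind` value appears in the question; the other two divs and their contents are made up) and is meant to show that a suffix match can pick up more than one element, which is why `[0]` is needed above.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; only "text: name" is taken from the question.
html = """
<div data-bind="text: name">A LEBLANC</div>
<div data-bind="text: flag">(some flag)</div>
<div data-bind="text: ownername">(some owner)</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Suffix match: every data-bind value that ends with "name".
suffix_matches = [div.text for div in soup.select("[data-bind$='name']")]

# Exact match: only the element bound to exactly "text: name".
exact_match = soup.select_one("[data-bind='text: name']").text

print(suffix_matches)
print(exact_match)
```

If the real page has several bindings ending in `name`, the exact-match form may be the safer choice; whether it does is not something the thread confirms.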

By the way, the approach you started with will also work:

from selenium import webdriver 
from bs4 import BeautifulSoup 

driver = webdriver.Chrome() 
driver.get('http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172') 
soup = BeautifulSoup(driver.page_source,"lxml") 
driver.quit() 
item_name = soup.find('div', {'data-bind':'text: name'}).text 
print(item_name)