2016-11-10

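Scraping JavaScript text from a web page with Python

I am currently trying to extract the longitude and latitude of various restaurants from the TripAdvisor website. I am looking through the HTML of this restaurant in Hong Kong.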

Restaurant I am attempting to scrape from

In the HTML, I found this:

HTML Code with the Latitude and Longitude

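I want to scrape the latitude and longitude from here, but I can't seem to get them out when I try to print them. Below is my code; any suggestions would be helpful.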

# Import libraries
import requests
from bs4 import BeautifulSoup
import csv

# Loop to move through the listing pages. Entries are offset in increments of 30 per page,
# so raise the stop value of range() when you want more than the first 30.
for i in range(0, 1, 30):
    # The URL format offsets the restaurants in increments of 30 after the "oa"
    url1 = 'https://www.tripadvisor.com/Restaurants-g294217-oa' + str(i) + '-Hong_Kong.html#EATERY_LIST_CONTENTS'
    r1 = requests.get(url1)
    data1 = r1.text
    soup1 = BeautifulSoup(data1, "html.parser")
    # Restaurant links carry the CSS class "property_title"
    for link in soup1.find_all('a', class_='property_title'):
        restaurant_url = 'https://www.tripadvisor.com/Restaurant_Review-g294217-' + link.get('href')
        # print(restaurant_url)
        r2 = requests.get(restaurant_url)
        data2 = r2.text
        soup2 = BeautifulSoup(data2, "html.parser")
        # Look at every JavaScript <script> tag and print the ones that mention 'lat'
        for script in soup2.find_all('script', {'type': 'text/javascript'}):
            if script.string and 'lat' in script.string:
                print(script.string)
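
Once a script's text has been printed, the coordinates still have to be pulled out of it. The sketch below shows one possible way to do that with a regular expression. The screenshot of the HTML is not reproduced here, so the key names `lat` and `lng`, the helper `extract_coordinates`, and the sample script text are all assumptions made for illustration and may not match what TripAdvisor actually serves.

```python
import re


def extract_coordinates(script_text, lat_key="lat", lng_key="lng"):
    """Return (lat, lng) as floats, or None if either key is missing.

    The key names are assumptions; adjust them to whatever the page uses.
    """
    def find_number(key):
        # Matches forms such as  lat: 22.28  /  "lat":"22.28"  /  lat = 22.28
        pattern = (r'\b' + re.escape(key) +
                   r'\b["\']?\s*[:=]\s*["\']?(-?\d+(?:\.\d+)?)')
        match = re.search(pattern, script_text)
        return float(match.group(1)) if match else None

    lat = find_number(lat_key)
    lng = find_number(lng_key)
    if lat is None or lng is None:
        return None
    return lat, lng


# Hypothetical inline script text, not copied from the real page
sample_script = 'var mapData = {lat: 22.2819, lng: 114.1556, zoom: 15};'
print(extract_coordinates(sample_script))
```

In the loop above, `extract_coordinates(script.string)` could replace the `print` call, skipping any script for which it returns `None`.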

Answers

To scrape a JavaScript-powered page, you need to use Selenium.
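
A minimal sketch of what that could look like, assuming Selenium is installed along with Firefox and a matching driver. The helper names `fetch_rendered_html` and `scripts_mentioning` are hypothetical and chosen only for this example. The first loads a page in a real browser and returns the HTML after its scripts have run; the second uses only the standard library to keep the `<script>` bodies that mention a keyword such as `lat`.

```python
from html.parser import HTMLParser


def fetch_rendered_html(url):
    """Load `url` in a browser and return the HTML after scripts have run."""
    # Imported here so the parsing helper below works without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()  # assumption: Firefox and its driver are available
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


def scripts_mentioning(html, keyword):
    """Return the bodies of all <script> elements that contain `keyword`."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    return [text for text in collector.scripts if keyword in text]
```

Calling `scripts_mentioning(fetch_rendered_html(restaurant_url), 'lat')` would then return the candidate scripts for a given restaurant page. Whether the coordinates appear only after rendering, or are already present in the HTML that `requests` returns, is something to check against the actual page.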

Selenium requires Python 3.4 or later, correct? – dtrinh

No, it is available in Python 2.7. – amirouche