2017-07-26 88 views
1

How do I write a Selenium loop in Python?

I want to scrape data from many different websites that contain JavaScript code (which is why I'm using Selenium to get the information). Everything works great, but when I try to load the next URL, I get a very long error message:

> Traceback (most recent call last): 
    File "C:/Python27/air17.py", line 46, in <module> 
    scrape(urls) 
    File "C:/Python27/air17.py", line 28, in scrape 
    browser.get(url) 
    File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 268, in get 
    self.execute(Command.GET, {'url': url}) 
    File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 254, in execute 
    response = self.command_executor.execute(driver_command, params) 
    File "C:\Python27\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 464, in execute 
    return self._request(command_info[0], url, body=data) 
    File "C:\Python27\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 487, in _request 
    self._conn.request(method, parsed_url.path, body, headers) 
    File "C:\Python27\lib\httplib.py", line 1042, in request 
    self._send_request(method, url, body, headers) 
    File "C:\Python27\lib\httplib.py", line 1082, in _send_request 
    self.endheaders(body) 
    File "C:\Python27\lib\httplib.py", line 1038, in endheaders 
    self._send_output(message_body) 
    File "C:\Python27\lib\httplib.py", line 882, in _send_output 
    self.send(msg) 
    File "C:\Python27\lib\httplib.py", line 844, in send 
    self.connect() 
    File "C:\Python27\lib\httplib.py", line 821, in connect 
    self.timeout, self.source_address) 
    File "C:\Python27\lib\socket.py", line 575, in create_connection 
    raise err 
error: [Errno 10061] 

The data from the first website ends up in the CSV file fine, but when the code tries to open the next website it freezes, and I get this error message. What am I doing wrong?

from bs4 import BeautifulSoup 
from selenium import webdriver 
import time 
import urllib2 
import unicodecsv as csv 
import os 
import sys 
import io 
import datetime 
import pandas as pd 
import MySQLdb 
import re 
import contextlib 
import selenium.webdriver.support.ui as ui 

filename=r'output.csv' 

resultcsv=open(filename,"wb") 
output=csv.writer(resultcsv, delimiter=';',quotechar = '"', quoting=csv.QUOTE_NONNUMERIC, encoding='latin-1') 
output.writerow(['TIME','FLIGHT','FROM','AIRLANE','AIRCRAFT','IHAVETODELETETHIS','STATUS']) 


def scrape(urls): 
    browser = webdriver.Firefox() 
    for url in urls: 
        browser.get(url) 
        html = browser.page_source 
        soup = BeautifulSoup(html, "html.parser") 
        table = soup.find('table', {"class": "table table-condensed table-hover data-table m-n-t-15"}) 
        datatable = [] 
        for record in table.find_all('tr', class_="hidden-xs hidden-sm ng-scope"): 
            temp_data = [] 
            for data in record.find_all("td"): 
                temp_data.append(data.text.encode('latin-1')) 
            datatable.append(temp_data) 

        output.writerows(datatable) 

        resultcsv.close() 
        time.sleep(10) 
        browser.quit() 

urls = ["https://www.flightradar24.com/data/airports/bud/arrivals", "https://www.flightradar24.com/data/airports/fco/arrivals"] 
scrape(urls) 
+1

These two need to be outside the loop (one indentation level less): resultcsv.close() and browser.quit() – CrazyElf

+0

That's the solution! Thank you, it's working now! :) – tardos93

Answers

4

I'm not sure having browser.quit() at the end of each loop iteration is a good idea. According to the Selenium doc:

quit()

Quits the driver and closes every associated window.

I think a browser.close() (as documented here) would be sufficient inside the loop. Keep browser.quit() outside the loop.
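To make the fix concrete, here is a minimal sketch of the corrected control flow: the per-URL work stays inside the loop, while the one-time teardown (closing the CSV file, quitting the driver) moves after it. The fetch_html parameter is a hypothetical stand-in for the real browser.get / browser.page_source calls, used here so the structure can be shown without launching a browser; the commented-out lines mark where the real Selenium calls would go.

```python
def scrape(urls, fetch_html):
    """Corrected structure: open resources once, loop over URLs, clean up once."""
    results = []                      # stands in for the CSV writer
    # browser = webdriver.Firefox()   # real code: start the driver ONCE, before the loop
    for url in urls:
        html = fetch_html(url)        # real code: browser.get(url); html = browser.page_source
        results.append((url, len(html)))
        # browser.close()             # optional: closes only the current window
    # browser.quit()                  # quit the driver ONCE, after all URLs are done
    # resultcsv.close()               # close the output file ONCE, after the loop
    return results

# Usage with a fake page store in place of live sites:
pages = {"a": "<html>1</html>", "b": "<html>22</html>"}
print(scrape(["a", "b"], pages.__getitem__))
```

With quit() inside the loop, the driver process is killed after the first URL, so the second browser.get() has nothing to connect to, which is exactly the [Errno 10061] (connection refused) in the traceback.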

+0

I don't think browser.close() is even needed inside the loop – CrazyElf

+2

Indeed, quit() was killing the webdriver –

+1

@CrazyElf Closing the current page is cleaner; it frees memory. –