2016-04-03

Populating a MySQL table with scraped data

I'm using Python 3, MySQL, Sequel Pro and BeautifulSoup.

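In short, I want to create an SQL table and then insert the data I've scraped into it.

I used this answer as a template for the SQL part, "Beautiful soup webscrape into mysql", but it doesn't work.

The error thrown: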

line 86 finally:SyntaxError: invalid syntax 

When I comment out that last finally: (just to see whether the rest of the code works), I get:

InternalError: (1054, "Unknown column 'address' in 'field list'") 

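Another error I commonly get is:

ProgrammingError: (1146, "Table 'simple_scrape.simple3' doesn't exist"), although I can't remember exactly which change I made that produced this last error.

Finally: I started learning to program less than four weeks ago (not just Python, but programming in general), so if you're wondering why I've done something silly or inefficient, it's almost certainly because that was the first way I got it to work! Please help!

Code: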

from selenium import webdriver

#Guess BER Number
for i in range(108053983,108053985):
    try:
        # ber_try = 100000000
        ber_try =+i

        #Open page & insert BER Number
        browser = webdriver.Firefox()
        type(browser)
        browser.get('https://ndber.seai.ie/pass/ber/search.aspx')
        ber_send = browser.find_element_by_id('ctl00_DefaultContent_BERSearch_dfSearch_txtBERNumber')
        ber_send.send_keys(ber_try)

        #click search
        form = browser.find_element_by_id('ctl00_DefaultContent_BERSearch_dfSearch_Bottomsearch')
        form.click()

        #click intermediate page
        form = browser.find_element_by_id('ctl00_DefaultContent_BERSearch_gridRatings_gridview_ctl02_ViewDetails')
        form.click()

        #scrape the page
        import bs4

        soup = bs4.BeautifulSoup(browser.page_source)

        # First Section
        ber_dec = soup.find('fieldset', {'id':'ctl00_DefaultContent_BERSearch_fsBER'})

        address = ber_dec.find('div', {'id':'ctl00_DefaultContent_BERSearch_dfBER_div_PublishingAddress'})
        address = (address.get_text(', ').strip())
        print(address)

        date_issue = ber_dec.find('span', {'id':'ctl00_DefaultContent_BERSearch_dfBER_container_DateOfIssue'})
        date_issue = date_issue.get_text().strip()
        print(date_issue)

    except:
        print('Invalid BER Number:', ber_try)
        browser.quit()

    #connecting to mysql
    finally:
        import pymysql.cursors
        from pymysql import connect, err, sys, cursors

        #Making the connection
        connection = pymysql.connect(host = '127.0.0.1',
                                     port = 3306,
                                     user = 'root',
                                     passwd = 'root11',
                                     db = 'simple_scrape',
                                     cursorclass=pymysql.cursors.DictCursor);

        with connection.cursor() as cursor:
            sql= """CREATE TABLE `simple3`(
            (
            `ID` INT AUTO_INCREMENT NOT NULL,
            `address` VARCHAR(200) NOT NULL,
            `date_issue` VARCHAR(200) NOT NULL,

            PRIMARY KEY (`ID`)
            )Engine = MyISAM)"""

            sql = "INSERT INTO `simple3` (`address`, `date_issue`) VALUES (%s, %s)"
            cursor.execute(sql, (address, date_issue))
        connection.commit()
    finally:
        connection.close()

    browser.quit()
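A note on the SyntaxError reported at `finally:`: a Python `try` statement accepts any number of `except` clauses but only one `finally` clause, so the second `finally:` in the code above cannot be parsed. The following is only a minimal sketch of the single-`finally` shape; `scrape_one`, `fetch` and `cleanup` are hypothetical names used for illustration and are not part of the original script.

```python
def scrape_one(ber_number, fetch, cleanup):
    """Run fetch(ber_number) and always run cleanup() afterwards.

    fetch and cleanup are hypothetical callables standing in for the
    scraping steps and for browser.quit() / connection.close().
    """
    try:
        return fetch(ber_number)
    except Exception as exc:
        # Catching Exception (rather than a bare except) keeps
        # KeyboardInterrupt and SystemExit from being swallowed.
        print('Invalid BER Number:', ber_number, exc)
        return None
    finally:
        # The one and only finally clause: runs on success and on failure.
        cleanup()
```

All cleanup steps that must always run would go into that one `finally` block, or into nested `try` statements if they need separate handling.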

Answers


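Problem: the table is never actually created. The CREATE TABLE statement is assigned to sql but never executed.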

    sql = """CREATE TABLE simple3 (
    ID INT AUTO_INCREMENT NOT NULL,
    address VARCHAR(200) NOT NULL,
    date_issue VARCHAR(200) NOT NULL,
    PRIMARY KEY (ID)
    ) ENGINE = MyISAM"""
    # Added this line since your table was not being created.
    cursor.execute(sql)

    sql = "INSERT INTO simple3 (address, date_issue) VALUES (%s, %s)"
    cursor.execute(sql, (address, date_issue))
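To show the create-then-insert order end to end, here is a hedged sketch rather than a definitive implementation. It assumes PyMySQL is installed and reuses the connection settings from the question. `save_row` and `main` are hypothetical helper names, the sample address and date are made-up values, and `IF NOT EXISTS` is an addition (not in the original answer) so that the statement can run on every loop iteration without failing the second time.

```python
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS simple3 (
    ID INT AUTO_INCREMENT NOT NULL,
    address VARCHAR(200) NOT NULL,
    date_issue VARCHAR(200) NOT NULL,
    PRIMARY KEY (ID)
) ENGINE = MyISAM
"""

INSERT_SQL = "INSERT INTO simple3 (address, date_issue) VALUES (%s, %s)"


def save_row(connection, address, date_issue):
    """Create the table if it is missing, then insert one scraped row."""
    with connection.cursor() as cursor:
        cursor.execute(CREATE_SQL)                         # actually run the DDL
        cursor.execute(INSERT_SQL, (address, date_issue))  # parameterised insert
    connection.commit()


def main():
    # Imported here so the definitions above load even without the driver.
    import pymysql

    connection = pymysql.connect(
        host='127.0.0.1',
        port=3306,
        user='root',
        password='root11',
        database='simple_scrape',
    )
    try:
        # Made-up sample values, for illustration only.
        save_row(connection, '1 Example Street, Dublin', '01-01-2016')
    finally:
        connection.close()
```

Calling `main()` requires a running MySQL server with the `simple_scrape` database. If a `simple3` table with different columns already exists, `IF NOT EXISTS` leaves it untouched and the insert still fails with error 1054, so that old table would have to be dropped or altered first.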
+0

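Thanks very much for getting back to me, but when I do that (I copied and pasted to make sure I didn't miss anything) I get the following error: 'line 74 sql = "CREATE TABLE 'simple3' ( ^ SyntaxError: EOL while scanning string literal'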

+1

Remove the backticks (see the edited version). MySQL doesn't require backticks unless you use spaces in table or column names (which is not recommended).

+2

If you split a string across multiple lines, you need to use triple quotes (i.e. """ ... """). – ChrisP
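The "EOL while scanning string literal" error comes from starting a string with a single `"` and then breaking the line before it is closed. As a small sketch of the two usual ways to write a long SQL string (the SELECT statement here is a made-up example, not from the original code):

```python
# Triple quotes: the string may span several lines, newlines included.
select_sql = """
SELECT address, date_issue
FROM simple3
WHERE ID = %s
"""

# Alternative: adjacent string literals inside parentheses are joined
# into a single one-line string at compile time.
insert_sql = (
    "INSERT INTO simple3 (address, date_issue) "
    "VALUES (%s, %s)"
)

print(select_sql)
print(insert_sql)
```

MySQL ignores the extra newlines and indentation inside the statement, so either form can be passed to `cursor.execute`.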