2017-07-27

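Web Scraper: requests.get is redirected to a different webpage

I want to write a program that pulls property IDs from an Excel spreadsheet, uses those IDs to navigate to a webpage and "web scrape" the related property values, and then imports those values back into the same spreadsheet. I apologize in advance: I am a very novice Python (or any language, tbh) coder. Here is the code so far: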

import xlrd
from lxml import html
import requests

class Estimate:
    def importo(self):
        # access excel spreadsheet (raw string so the backslashes in the Windows path stay literal)
        file_location = r"S:\Powerdel\Transmission Engineering\Miscellaneous\Estimates\Auto_Estimator\Estimate_Output.xls"
        workbook = xlrd.open_workbook(file_location)
        sheet = workbook.sheet_by_index(0)

        # number of rows in the sheet; row 0 is skipped below
        n = sheet.nrows

        # initialize lists, one slot per data row
        ids = [0] * (n - 1)
        width = [0] * (n - 1)
        cost = [0] * (n - 1)
        size = [0] * (n - 1)

        # import values from spreadsheet: column 3 = property ID, column 1 = width
        for row in range(n - 1):
            ids[row] = sheet.cell_value(row + 1, 3)
            width[row] = sheet.cell_value(row + 1, 1)

        # grab cost from webpage (only the first ID for now)
        #for row in range(n - 1):
        name = "http://propaccess.traviscad.org/clientdb/Property.aspx?prop_id={0}".format(ids[0])
        page = requests.get(name)
        tree = html.fromstring(page.text)
        cost[0] = tree.xpath('//div[@id="landDetails"]/table/tbody/tr[2]/td[5]/text()')

        print(ids[0])
        print(width[4])
        print(n)
        print(cost[0])
        print(name)
        print(tree.text_content().encode('utf-8'))

Estimate().importo()

And the output:

337776 
492.0 
63 
[] 
http://propaccess.traviscad.org/clientdb/Property.aspx?prop_id=337776 
Travis Property Search 
    body { text-align: center; padding: 150px; } 
    h1 { font-size: 50px; } 
    body { font: 20px Helvetica, sans-serif; color: #333; } 
    #article { display: block; text-align: left; width: 650px; margin: 0 auto; } 
    a { color: #dc8100; text-decoration: none; } 
    a:hover { color: #333; text-decoration: none; } 


Please try again 

    Sorry for the inconvenience but your session has either timed out or the server is busy handling other requests. You may visit us on the the following website for information, otherwise please retry your search again shortly:Travis Central Appraisal District Website 
    Click here to reload the property search to try again 
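The code above only looks up the first ID, while the commented-out `#for row in range (n-1):` line suggests the goal is to repeat the lookup for every row. A minimal sketch of that loop follows. Two assumptions are worth flagging: xlrd returns numeric cells as floats, so an ID stored as a number would arrive as `337776.0` and end up in the URL that way unless it is normalized; and `fetch_cost` is a hypothetical callable standing in for the request-plus-XPath step, passed in so the loop can be tried without touching the network.

```python
def normalize_id(value):
    """Turn a cell value such as 337776.0 or " 337776 " into the string "337776"."""
    # xlrd returns numeric cells as floats; text cells may carry stray whitespace.
    return str(int(float(str(value).strip())))


def collect_costs(raw_ids, fetch_cost):
    """Return one scraped cost per ID, in spreadsheet row order.

    fetch_cost is a hypothetical callable: it receives a normalized ID string
    and returns whatever the scraper extracted for that property.
    """
    costs = []
    for raw_id in raw_ids:
        prop_id = normalize_id(raw_id)
        costs.append(fetch_cost(prop_id))
    return costs


if __name__ == "__main__":
    # Stand-in for the real scraper so the loop can be run offline.
    fake_fetch = lambda prop_id: "cost for " + prop_id
    print(collect_costs([337776.0, "337777"], fake_fetch))
    # prints ['cost for 337776', 'cost for 337777']
```

Keeping the fetch step as a parameter means the same loop works unchanged once the request logic is fixed.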

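My problem (for now) is that requests.get is being redirected away from the intended webpage. Interestingly, if I follow the link my program prints after it runs, I get redirected to the same apology page. Buuut, if I first navigate to the intended webpage through the menu items on the traviscad.org website and then follow my printed link, boom, the correct website.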
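The behaviour described here (the direct link fails, but the same link works after browsing through the site's menus) is consistent with the site tying results to a server-side session identified by a cookie: the browser picks up that cookie while you navigate, whereas a bare `requests.get` starts with none. That diagnosis is an assumption, not something confirmed here. If it holds, a `requests.Session` keeps cookies between requests, so one sketch is to visit an entry page first and then request the property pages through the same session. `ENTRY_URL` is a guess at a suitable entry page and may need to be replaced by whichever search page the site actually expects.

```python
import requests

# Assumption: loading this entry page makes the server issue a session cookie.
ENTRY_URL = "http://propaccess.traviscad.org/clientdb/"
PROPERTY_URL = "http://propaccess.traviscad.org/clientdb/Property.aspx?prop_id={0}"


def property_url(prop_id):
    """Build the property page URL for one property ID."""
    return PROPERTY_URL.format(prop_id)


def fetch_property_pages(prop_ids, timeout=30):
    """Fetch several property pages with one Session so cookies carry over."""
    pages = {}
    with requests.Session() as session:
        # Prime the session: cookies set by this response are stored in
        # session.cookies and sent automatically with later requests.
        session.get(ENTRY_URL, timeout=timeout)
        for prop_id in prop_ids:
            response = session.get(property_url(prop_id), timeout=timeout)
            response.raise_for_status()
            pages[prop_id] = response.text
    return pages
```

Usage would be along the lines of `pages = fetch_property_pages(["337776"])`, after which each `pages[prop_id]` can go through `html.fromstring(...)` and the existing XPath. Reusing one session for the whole batch also lets requests reuse the underlying connection instead of opening a new one per ID.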

Like I said, I'm brand new, so I don't know why I'm being redirected or how to prevent it. If you have any suggestions, please let me know!
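Whatever the root cause turns out to be, it helps if the scraper notices when it has been handed the apology page instead of silently recording `[]`. A minimal sketch, assuming the error page keeps the wording shown in the output above: requests records any HTTP redirects it followed in `response.history` (with the final address in `response.url`), and the page text can be checked for the "session has either timed out" sentence. The helper names are hypothetical.

```python
# Phrases copied from the error page shown in the output above (assumed stable).
ERROR_MARKERS = (
    "your session has either timed out",
    "the server is busy handling other requests",
)


def is_session_error_page(html_text):
    """Return True if the HTML looks like the 'Please try again' apology page."""
    text = html_text.lower()
    return any(marker in text for marker in ERROR_MARKERS)


def describe_response(response):
    """Summarize whether a requests.Response was redirected or is the error page.

    Uses only standard requests.Response attributes: history, url and text.
    """
    return {
        "redirected": bool(response.history),  # non-empty if redirects were followed
        "final_url": response.url,
        "error_page": is_session_error_page(response.text),
    }
```

With this in place, the loop can log or retry an ID (for example with a fresh session) instead of writing an empty cost into the spreadsheet.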



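When I click the desired-webpage link, I get the same result as the bogus redirect. I think the problem may be related to accessing the page directly rather than through the tabs.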

Answers