2014-12-02

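Python .aspx search form results issue

I'm new to Python. I'm trying to build a bot that runs a search on a website that uses an .aspx search form, submitting the form and saving the results to a file.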

Here is my script:

import urllib.parse
import urllib.request

from bs4 import BeautifulSoup


headers = { 
'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', 
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17', 
'Content-Type': 'application/x-www-form-urlencoded', 
'Accept-Encoding': 'gzip,deflate,sdch', 
'Accept-Language': 'en-US,en;q=0.8', 
'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3' 
} 

class MyOpener(urllib.request.FancyURLopener):
    version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'

myopener = MyOpener() 

url = 'http://legistar.council.nyc.gov/Legislation.aspx' 
# first HTTP request without form data 
f = myopener.open(url) 
soup = BeautifulSoup(f, 'html.parser')  # name the parser explicitly to avoid the bs4 warning

lastfocus = soup.select("#__LASTFOCUS")[0]['value'] 
eventtarget = soup.select("#__EVENTTARGET")[0]['value'] 
eventargument = soup.select("#__EVENTARGUMENT")[0]['value'] 
viewstate = soup.select("#__VIEWSTATE")[0]['value'] 

formFields = (
    (r'__LASTFOCUS', lastfocus), 
    (r'__EVENTTARGET', eventtarget), 
    (r'__EVENTARGUMENT', eventargument), 
    (r'__VIEWSTATE', viewstate), 
    (r'ctl00_RadScriptManager1_TSM', ''), 
    (r'ctl00_tabTop_ClientState', ''), 
    (r'ctl00_ContentPlaceHolder1_menuMain_ClientState', ''), 
    (r'ctl00_ContentPlaceHolder1_gridMain_ClientState', ''), 
    # Check boxes
    (r'ctl00$ContentPlaceHolder1$chkID', 'on'),  # file number
    (r'ctl00$ContentPlaceHolder1$chkText', 'on'),  # legislative text
    (r'ctl00$ContentPlaceHolder1$chkAttachments', 'on'),  # attachments
    # etc. (not all listed)
    (r'ctl00$ContentPlaceHolder1$txtSearch', 'york'),  # search text
    (r'ctl00$ContentPlaceHolder1$lstYears', '2014'),  # years to include
    (r'ctl00$ContentPlaceHolder1$lstTypeBasic', 'All Types'),  # types to include
    (r'ctl00$ContentPlaceHolder1$btnSearch', 'Search Legislation')  # search button itself
) 

encodedFields = urllib.parse.urlencode(formFields) 
# second HTTP request with form data 
f = myopener.open(url, encodedFields) 

# Ideally we would use BeautifulSoup once again to extract the
# results (instead of writing out the whole HTML file).
# Besides, since the results are split across multiple pages,
# more HTTP requests would be needed.
try:
    # The with block closes the file even if writing fails.
    with open('tmp.html', 'wb') as fout:
        fout.writelines(f.readlines())
except OSError:
    print('Could not open output file\n')

It runs without any errors. But when I open the tmp.html file, I don't see the results that the actual website shows.

This is what the file contains instead:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"> 
<head><title> 
    Error 
</title></head> 
<body> 
<form name="form1" method="post" action="Error.aspx" id="form1"> 
<div> 
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE"  value="ND1u0lOZH65sNTWWoa6wLYsEtU6yeI938ytDgbd2dC167Gk8a/1RonXoednpTu74caJ8DocoE4ewDkNe6u02VlFhiTlr5MevcRRE7CVvClRleCWGYiPME3cqJWvjA8uv" /> 
</div> 

<div> 

    <input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="AB827D4F" /> 
</div> 
    <div> 
     <h2> 
     Server Error</h2> 
     <h4> 
      The server encountered a temporary error and could not complete your  request.</h4> 
     <h4> 
      Please <a href="Default.aspx">try again</a> in 30 seconds.</h4> 
    </div> 
    </form> 
</body> 
</html> 
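
Browsers submit form fields by their `name` attribute, and ASP.NET WebForms pages commonly carry hidden fields beyond the four the script reads (the error page above, for example, has `__VIEWSTATEGENERATOR`). One thing worth trying is to post back every hidden input found in the first response instead of listing them by hand. The following is a minimal sketch using only the standard library; `HiddenFieldParser` and `collect_hidden_fields` are hypothetical names, the sample HTML is made up, and whether this is enough to avoid the server error on this site has not been confirmed.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            # A missing value attribute is posted as an empty string.
            self.fields[attr["name"]] = attr.get("value") or ""


def collect_hidden_fields(html):
    """Hypothetical helper: return a dict of the hidden form fields in html."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


# Made-up sample standing in for the first response from the server.
sample = (
    '<form method="post">'
    '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc" />'
    '<input type="hidden" name="__VIEWSTATEGENERATOR" value="AB827D4F" />'
    '<input type="text" name="ctl00$ContentPlaceHolder1$txtSearch" />'
    '</form>'
)

fields = collect_hidden_fields(sample)
# Add the visible search fields on top of the hidden ones.
fields["ctl00$ContentPlaceHolder1$txtSearch"] = "york"
body = urlencode(fields).encode("ascii")  # bytes, ready to send as POST data
```

With the real page, the decoded HTML of the first request would take the place of `sample`, and `body` would be passed as the POST data of the second request.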

How can I make the script return the results I'm looking for?

Any kind of help is much appreciated.

Answers


This code works perfectly.

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("http://legistar.council.nyc.gov/Legislation.aspx")
# Alternatively, link directly to the form:
# driver.get("https://www.icsi.in/student/Members/MemberSearch.aspx?SkinSrc=%5BG%5DSkins/IcsiTheme/IcsiIn-Bare&ContainerSrc=%5BG%5DContainers/IcsiTheme/NoContainer")

# Locate the elements.
# find_element(By.ID, ...) is used because the older find_element_by_id
# helper is no longer available in recent Selenium 4 releases.
first = driver.find_element(By.ID, "ctl00_ContentPlaceHolder1_txtSearch")
search = driver.find_element(By.ID, "ctl00_ContentPlaceHolder1_btnSearch")

# Input the data and click submit. 
first.send_keys("York") 
search.click() 
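
The snippet above submits the search but does not save anything, while the question asks for the results in a file. Below is a minimal sketch of one way to do that, assuming the results are rendered as a plain HTML table (not verified for this site). `TableRowParser` and `rows_to_csv` are hypothetical names, and the sample rows are invented for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>.

    Nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def rows_to_csv(html):
    """Hypothetical helper: return the table rows found in html as CSV text."""
    parser = TableRowParser()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Invented sample standing in for the results page.
sample = (
    "<table><tr><th>File #</th><th>Title</th></tr>"
    "<tr><td>Int 0001-2014</td><td>A  sample\n title</td></tr></table>"
)
csv_text = rows_to_csv(sample)
```

With Selenium, the page HTML is available as `driver.page_source` once the results have loaded, so `rows_to_csv(driver.page_source)` could be written to a file with an ordinary `open(..., 'w')` call.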

Good, I didn't know you were here and involved with Selenium. Thanks for sharing. – alecxe 2015-01-06 13:05:04