2010-09-07 221 views
22

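I am trying to make a POST request to retrieve information about a book. Here is the code that makes the HTTP POST request; it returns HTTP status 302, Moved: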

import httplib, urllib 
params = urllib.urlencode({ 
    'isbn' : '9780131185838', 
    'catalogId' : '10001', 
    'schoolStoreId' : '15828', 
    'search' : 'Search' 
    }) 
headers = {"Content-type": "application/x-www-form-urlencoded", 
      "Accept": "text/plain"} 
conn = httplib.HTTPConnection("bkstr.com:80") 
conn.request("POST", "/webapp/wcs/stores/servlet/BuybackSearch", 
      params, headers) 
response = conn.getresponse() 
print response.status, response.reason 
data = response.read() 
conn.close() 

When I try it from a browser, starting from this page: http://www.bkstr.com/webapp/wcs/stores/servlet/BuybackMaterialsView?langId=-1&catalogId=10001&storeId=10051&schoolStoreId=15828, it works. What am I missing in my code?

EDIT: This is what I get when I also print response.msg:

302 Moved
Date: Tue, 07 Sep 2010 16:54:29 GMT 
Vary: Host,Accept-Encoding,User-Agent 
Location: http://www.bkstr.com/webapp/wcs/stores/servlet/BuybackSearch 
X-UA-Compatible: IE=EmulateIE7 
Content-Length: 0 
Content-Type: text/plain; charset=utf-8 

It seems the Location header points to the same URL I was trying to access in the first place?

EDIT2:

I tried using urllib2, as suggested here. Here is the code:

import urllib, urllib2 

url = 'http://www.bkstr.com/webapp/wcs/stores/servlet/BuybackSearch' 
values = {'isbn' : '9780131185838', 
      'catalogId' : '10001', 
      'schoolStoreId' : '15828', 
      'search' : 'Search' } 


data = urllib.urlencode(values) 
req = urllib2.Request(url, data) 
response = urllib2.urlopen(req) 
print response.geturl() 
print response.info() 
the_page = response.read() 
print the_page 

Here is the output:

http://www.bkstr.com/webapp/wcs/stores/servlet/BuybackSearch 
Date: Tue, 07 Sep 2010 16:58:35 GMT 
Pragma: No-cache 
Cache-Control: no-cache 
Expires: Thu, 01 Jan 1970 00:00:00 GMT 
Set-Cookie: JSESSIONID=0001REjqgX2axkzlR6SvIJlgJkt:1311s25dm; Path=/ 
Vary: Accept-Encoding,User-Agent 
X-UA-Compatible: IE=EmulateIE7 
Content-Length: 0 
Connection: close 
Content-Type: text/html; charset=utf-8 
Content-Language: en-US 
Set-Cookie: TSde3575=225ec58bcb0fdddfad7332c2816f1f152224db2f71e1b0474c866f3b; Path=/ 

The 302 response also tells you where the resource was moved to. Find that URL and use it. – adamk 2010-09-07 14:39:41
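As a minimal sketch of what "find that URL" can look like, the snippet below resolves the Location header of a redirect against the URL that was requested. It is written for Python 3 (an assumption; the thread itself uses Python 2), and the helper names redirect_target and host_changed are hypothetical. Fed with the values from the question, it also shows that the two URLs are not quite identical: the first snippet connected to bkstr.com, while Location names www.bkstr.com. That is an observation about the posted headers, not a confirmed diagnosis.

```python
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def redirect_target(request_url, status, headers):
    """Return the absolute URL a redirect response points to, or None."""
    if status not in REDIRECT_CODES:
        return None
    location = headers.get("Location")
    if location is None:
        return None
    # Location may be relative, so resolve it against the requested URL.
    return urljoin(request_url, location)


def host_changed(request_url, target_url):
    """True when the redirect sends the client to a different host name."""
    return urlsplit(request_url).hostname != urlsplit(target_url).hostname


# Values copied from the question: the httplib snippet used "bkstr.com:80".
request_url = "http://bkstr.com:80/webapp/wcs/stores/servlet/BuybackSearch"
headers = {
    "Location": "http://www.bkstr.com/webapp/wcs/stores/servlet/BuybackSearch"
}

target = redirect_target(request_url, 302, headers)
print(target)
print(host_changed(request_url, target))
```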

Answers

26

Their server seems to want you to acquire the proper cookie first. This works:

import urllib, urllib2, cookielib 

cookie_jar = cookielib.CookieJar() 
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar)) 
urllib2.install_opener(opener) 

# acquire cookie 
url_1 = 'http://www.bkstr.com/webapp/wcs/stores/servlet/BuybackMaterialsView?langId=-1&catalogId=10001&storeId=10051&schoolStoreId=15828' 
req = urllib2.Request(url_1) 
rsp = urllib2.urlopen(req) 

# do POST 
url_2 = 'http://www.bkstr.com/webapp/wcs/stores/servlet/BuybackSearch' 
values = dict(isbn='9780131185838', schoolStoreId='15828', catalogId='10001') 
data = urllib.urlencode(values) 
req = urllib2.Request(url_2, data) 
rsp = urllib2.urlopen(req) 
content = rsp.read() 

# print result 
import re 
pat = re.compile('Title:.*') 
print pat.search(content).group() 

# OUTPUT: Title:&nbsp;&nbsp;Statics & Strength of Materials for Arch (w/CD)<br /> 
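For readers on Python 3, where urllib2 and cookielib no longer exist under those names, the same two-step flow might be sketched as follows. This is a port under stated assumptions, not the answerer's code: the function names make_cookie_opener, build_search_request and search are hypothetical, the response is assumed to be UTF-8 (as the headers in the question suggest), and the site may no longer behave as it did in 2010.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# URLs taken from the thread.
FORM_URL = ("http://www.bkstr.com/webapp/wcs/stores/servlet/BuybackMaterialsView"
            "?langId=-1&catalogId=10001&storeId=10051&schoolStoreId=15828")
SEARCH_URL = "http://www.bkstr.com/webapp/wcs/stores/servlet/BuybackSearch"


def make_cookie_opener():
    """Build an opener that stores cookies and sends them back on later requests."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_search_request(isbn):
    """Build the POST request; passing data makes urllib use POST."""
    values = {"isbn": isbn, "schoolStoreId": "15828", "catalogId": "10001"}
    data = urllib.parse.urlencode(values).encode("ascii")
    return urllib.request.Request(SEARCH_URL, data=data)


def search(isbn):
    """Fetch the form page to pick up cookies, then POST the search."""
    opener, _jar = make_cookie_opener()
    opener.open(FORM_URL).close()  # step 1: acquire the session cookie
    with opener.open(build_search_request(isbn)) as rsp:  # step 2: POST
        return rsp.read().decode("utf-8", errors="replace")
```

Calling search("9780131185838") would perform the two network requests; nothing is sent until then. Returning the opener instead of installing it globally keeps the cookie jar local to each search.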

It really works! Thank you very much! – infrared 2010-09-07 21:39:16


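@infrared: Glad to help. I should probably add that one way to solve this kind of problem is to run an HTTP proxy that shows you a trace of the requests and responses. Then make the request with the browser and with your code, and compare the two traces. Usually you are looking for differences in cookies or headers. Sometimes it takes some trial and error. I like to use Fiddler, but any such tool will do. – ars 2010-09-08 07:40:19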
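Once both traces are captured, the comparison itself can be sketched in a few lines. The helper diff_headers is hypothetical, and the header values below are made up for illustration rather than taken from a real capture; header names are compared case-insensitively because HTTP treats them that way.

```python
def diff_headers(browser_headers, script_headers):
    """Compare two request-header mappings, ignoring case in header names."""
    a = {name.lower(): value for name, value in browser_headers.items()}
    b = {name.lower(): value for name, value in script_headers.items()}
    return {
        # Sent by the browser but not by the script.
        "missing": sorted(set(a) - set(b)),
        # Sent by the script but not by the browser.
        "extra": sorted(set(b) - set(a)),
        # Sent by both, with different values.
        "changed": sorted(k for k in set(a) & set(b) if a[k] != b[k]),
    }


# Illustrative values only, not a real trace.
browser = {
    "User-Agent": "ExampleBrowser/1.0",
    "Cookie": "JSESSIONID=abc",
    "Accept": "text/html",
}
script = {"user-agent": "Python-urllib/3.x", "Accept": "text/html"}

result = diff_headers(browser, script)
print(result)
```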

1
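  1. Maybe that is what the browser gets too, and you just have to follow the 302 redirect.

  2. If all else fails, you can monitor the conversation between Firefox and the web server using FireBug, tcpdump, or wireshark, and see which HTTP headers differ. It may be just the User-Agent: header.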
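If the User-Agent header does turn out to be the difference, it can be set on the request explicitly. The sketch below uses Python 3's urllib.request (an assumption; the thread uses Python 2), the helper name request_with_user_agent is hypothetical, and the User-Agent string is a placeholder rather than a value known to satisfy this server.

```python
import urllib.request


def request_with_user_agent(url, user_agent):
    """Build a request that overrides urllib's default User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = request_with_user_agent(
    "http://www.bkstr.com/webapp/wcs/stores/servlet/BuybackSearch",
    "Mozilla/5.0 (illustrative value)",
)
# urllib stores header names capitalized, so the key reads "User-agent".
print(req.header_items())
```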