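I need to scrape a website with Scrapy while sending a cookie, but the request fails with the error "HTTP status code is not handled or not allowed".
Code: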
import scrapy
from scrapy import Request


class XueqiuSpider(scrapy.Spider):
    name = "xueqiu"
    start_urls = ["https://xueqiu.com/stock/f10/finmainindex.json?symbol=SZ000001&page=1&size=1"]
    delimiter = ','
    quotechar = '"'
    headers = ["symbol", "date", "open", "high", "low", "close", "volume"]

    def start_requests(self):
        # Send the cookie explicitly with every start URL.
        for url in self.start_urls:
            print(url)
            yield Request(url, cookies={'aliyungf_tc': 'AQAAANiAQ3xQ/QAAZ0J2fRFnxcJufEzG'}, callback=self.parse_item)

    def parse_item(self, response):
        print(response)
Error output:
********Current UserAgent:Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11************
2017-03-02 18:56:02 [scrapy.downloadermiddlewares.cookies] DEBUG: Sending cookies to: <GET https://xueqiu.com/stock/f10/finmainindex.json?symbol=SZ000001&page=1&size=1>
Cookie: aliyungf_tc=AQAAANiAQ3xQ/QAAZ0J2fRFnxcJufEzG; aliyungf_tc=AQAAAM/c+1g5vAMAZ0J2fbusPyBy7jb1
2017-03-02 18:56:02 [scrapy.core.engine] DEBUG: Crawled (400) <GET https://xueqiu.com/stock/f10/finmainindex.json?symbol=SZ000001&page=1&size=1> (referer: None)
2017-03-02 18:56:02 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <400 https://xueqiu.com/stock/f10/finmainindex.json?symbol=SZ000001&page=1&size=1>: HTTP status code is not handled or not allowed
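Where are you trying to scrape the data from? I know it's the xueqiu.com website, but which part? Can you post the actual link? I'm 100% sure you are missing some extra request headers, and that's why you are getting the 400 error. – Umair
Here is the link: https://xueqiu.com/stock/f10/finmainindex.json?symbol=SZ000001&page=1&size=1 – Flasking
No, that's not the URL I was asking about... :( OK, anyway, where did you get that URL from? Is it an AJAX URL, or something else? – Umair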