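Weird behavior of curl and the Python requests library
I tried to delete this question, but on second thought I will keep it. It is a live demonstration that, as a developer, I should pay more attention to details.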
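I want to fetch some data from a website. The requested URL looks at the content type the request asks for and responds accordingly.
So this is the curl command I used: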
curl --header "Accept: application/json, text/javascript, */*; q=0.01\r\nX-Requested-With: XMLHttpRequest\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36\r\n" http://www.tpex.org.tw/web/stock/margin_trading/margin_balance/margin_bal_result.php\?l\=en-us\&d\=2016/11/15\&_\=1479700586981 -v
* About to connect() to www.tpex.org.tw port 80 (#0)
* Trying 210.63.162.130... connected
> GET /web/stock/margin_trading/margin_balance/margin_bal_result.php?l=en-us&d=2016/11/15&_=1479700586981 HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: www.tpex.org.tw
> Accept: application/json, text/javascript, */*; q=0.01\r\nX-Requested-With: XMLHttpRequest\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36\r\nAccept-Encoding: gzip,deflate,sdch\r\n
>
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Date: Mon, 21 Nov 2016 07:35:56 GMT
< Server: Apache
< Content-Type: text/html; charset=utf-8
< X-Cache: MISS from localhost
< X-Cache-Lookup: MISS from localhost:3128
< Via: 1.0 localhost (squid/3.1.19)
< Connection: close
<
{"reportDate":"2016\/11\/15","iTotalRecords":610,"aaData":[["006201","YA HORNG ELECTRONIC CO.","6","0","0","0","6","0","0.09","6,361","0","0","0","0","0","0","0.0","6,361","0",""],...}
The response is truncated here, but it is basically JSON.
Here is my Python code, which I don't think is much different. But the response is HTML ...
import datetime

import requests

g_tpex_headers = {
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'User-Agent': (
        'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
        ' (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120'
        ' Chrome/37.0.2062.120 Safari/537.36'
    ),
    'X-Requested-With': 'XMLHttpRequest',
}
data_link = (
    'http://www.tpex.org.tw/web/stock/margin_trading/margin_balance/'
    'margin_bal.php?l=en-us&d={}&_=1479700586981'
)
target_dt = datetime.date(2016, 11, 18)  # the date that appears in the log below
data = []
with requests.Session() as session:
    # Replace the session's default headers with the browser-like ones.
    session.headers = g_tpex_headers
    res = session.get(
        data_link.format(target_dt.strftime('%Y/%m/%d'))
    )
    print(res.content[:400])
Log:
send: 'GET /web/stock/margin_trading/margin_balance/margin_bal.php?l=en-us&d=2016/11/18&_=1479700586981 HTTP/1.1\r\nHost: www.tpex.org.tw\r\nX-Requested-With: XMLHttpRequest\r\nAccept-Encoding: gzip,deflate,sdch\r\nAccept: application/json, text/javascript, */*; q=0.01\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36\r\n\r\n'
And the response:
<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=Edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title> HOME > Mainboard > Margin Trading > Margin Balance</title>
<link rel="icon" type="image/ico" href="/web/images/favicon.ic
I can't see much of a difference. So why doesn't Python requests get the JSON response?
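One way to check whether the two requests really are the same is to compare the request lines rather than the headers. Below is a minimal sketch that uses only the two URLs already shown above, taken from the curl command and from the Python log. `script_name` is a hypothetical helper written for this illustration, not part of requests or curl.

```python
from urllib.parse import urlsplit

# URLs copied from the curl command and from the Python log above.
curl_url = (
    'http://www.tpex.org.tw/web/stock/margin_trading/margin_balance/'
    'margin_bal_result.php?l=en-us&d=2016/11/15&_=1479700586981'
)
python_url = (
    'http://www.tpex.org.tw/web/stock/margin_trading/margin_balance/'
    'margin_bal.php?l=en-us&d=2016/11/18&_=1479700586981'
)


def script_name(url):
    """Return the last path segment of a URL (hypothetical helper)."""
    return urlsplit(url).path.rsplit('/', 1)[-1]


curl_script = script_name(curl_url)
python_script = script_name(python_url)

# Print both names and whether they match.
print(curl_script, python_script, curl_script == python_script)
```

The two script names differ (`margin_bal_result.php` for curl, `margin_bal.php` for Python). That would be consistent with curl hitting an endpoint that returns JSON while the Python code requests the HTML page.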
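Thanks for your help. I was so careless that I can hardly believe it.
I have one question about curl: the verbose output has a line showing a curl User-Agent. Isn't that line sent to the server?
@JunchaoGu Updated the answer regarding cURL usage – niemmi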
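On the curl point raised in the comments: the verbose output above shows the whole `--header` value going out as a single `Accept` line, with the `\r\n` sequences still present as literal characters and curl's own `User-Agent` kept. The sketch below illustrates this, assuming the shell passed the backslashes through unchanged. The shortened User-Agent value and the variable names are made up for the example.

```python
# The --header argument as curl received it. Raw strings keep each \r\n as
# four literal characters (backslash, r, backslash, n), not CR LF.
header_arg = (
    r'Accept: application/json, text/javascript, */*; q=0.01\r\n'
    r'X-Requested-With: XMLHttpRequest\r\n'
    r'User-Agent: Mozilla/5.0\r\n'  # shortened for illustration
)

# No real line break is present, so the server sees one long header line.
has_real_crlf = '\r\n' in header_arg

# Splitting on the literal sequence recovers the headers that were intended.
intended = dict(
    part.split(': ', 1) for part in header_arg.split(r'\r\n') if part
)

print(has_real_crlf)
print(sorted(intended))
```

With curl, each header would normally get its own `-H`/`--header` option, so that it is sent as a separate line and the custom User-Agent replaces the default one.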