2017-06-14 96 views
-1

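Python requests: binary content

I am trying to fetch JSON from a Google Trends URL, but I cannot parse it as JSON because the content comes back as bytes (b''). How can I get this result as JSON?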

My simple code:

import requests 
r = requests.get('https://trends.google.ru/trends/api/stories/latest?hl=ru&tz=-180&cat=all&fi=15&fs=15&geo=RU&ri=300&rs=15&sort=0') 
print(r.content) 

r.content starts with:

b')]}\'\n{"featuredStoryIds":[],"trendingStoryIds":["RU_lnk_iJ8H1AAwAACP-M_ru","RU_lnk_7H7L0wAwAAAnHM_ru","RU_lnk_Q-IB1AAwAABChM_ru","RU_lnk_EErj0wAwAADzKM_ru","RU_lnk_VY2s0wAwAAD57M_ru","RU_lnk_sdUP1AAwAAC-sM_ru","RU_lnk_ILv60wAwAADa2M_ru","RU_lnk_O6j70wAwAADAyM_ru","RU_lnk_fVQS1AAwAABvMM_ru","RU_lnk_TJ8D1AAwAABP-M_ru","RU_lnk_I97F0wAwAADmvM_ru","RU_lnk_tCrq0wAwAABeSM_ru","RU_lnk_W8EA1AAwAABbpM_ru","RU_lnk_IYX90wAwAADc5M_ru","RU_lnk_bz4M1AAwAABjWM_ru","RU_lnk_EJ-...

Decoding this with the r.json() method fails:

simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1 (char 0) 
+0

'r.content' is indeed raw binary data. Have you looked at the ['response.json()' method](http://docs.python-requests.org/en/master/user/quickstart/#json-response-content)? What happens when you call it? –

+0

Yes: simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1 (char 0) –

Answers

2

You are contacting a Google service, and Google prepends some extra data to the JSON to prevent JSON hijacking:

>>> import requests 
>>> r = requests.get('https://trends.google.ru/trends/api/stories/latest?hl=ru&tz=-180&cat=all&fi=15&fs=15&geo=RU&ri=300&rs=15&sort=0') 
>>> r.content[:10] 
b')]}\'\n{"fea' 

Note the )]}' and the newline at the start.

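You need to remove this extra data first and then decode manually. There are no other newlines in the payload, so we can simply split on newlines and take the last line: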

import json 

json_body = r.text.splitlines()[-1] 
json_data = json.loads(json_body) 

I used Response.text here to get decoded string data (the server sets the correct encoding in the Content-Type header).

This gives you a decoded dictionary:

>>> json_body = r.text.splitlines()[-1] 
>>> json_data = json.loads(json_body) 
>>> type(json_data) 
<class 'dict'> 
>>> sorted(json_data) 
['date', 'featuredStoryIds', 'hideAllImages', 'storySummaries', 'trendingStoryIds'] 
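
As a minimal sketch of the same idea, the prefix can also be removed explicitly instead of relying on the payload containing no other newlines. The helper name strip_xssi_prefix and the sample string are hypothetical, chosen for illustration; only the )]}' prefix itself comes from the response shown above.

```python
import json

# The guard prefix observed at the start of the response above.
XSSI_PREFIX = ")]}'"


def strip_xssi_prefix(text):
    """Return text without a leading )]}' guard, if one is present."""
    if text.startswith(XSSI_PREFIX):
        text = text[len(XSSI_PREFIX):]
    # Drop the newline (and any other whitespace) that follows the prefix.
    return text.lstrip()


# Hypothetical sample shaped like the start of the real response.
sample = ')]}\'\n{"featuredStoryIds": [], "trendingStoryIds": ["a", "b"]}'
data = json.loads(strip_xssi_prefix(sample))
print(sorted(data))
```

With a live response this would be called as json.loads(strip_xssi_prefix(r.text)). Text without the prefix passes through unchanged, so the helper should also be safe if the service ever stops adding the guard.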
+0

TypeError: the JSON object must be str, not 'bytes' –

+0

@KonstantinRusanov: Ah, an older Python 3 version; 3.6 accepts bytes. Will update. –

+0

Perfect! Thank you very much –
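
The TypeError reported in the comments above arises because json.loads only accepts bytes input from Python 3.6 onward. A minimal sketch for older Python 3 versions is to decode the bytes before parsing. The helper name load_json_bytes, the UTF-8 default, and the sample payload are assumptions for illustration; with requests, r.text already performs this decoding using the encoding from the response headers.

```python
import json


def load_json_bytes(raw, encoding="utf-8"):
    """Decode raw bytes to str before parsing, for Python < 3.6."""
    if isinstance(raw, bytes):
        raw = raw.decode(encoding)
    return json.loads(raw)


# Hypothetical sample: the last line of a guarded response, as bytes.
raw_line = b')]}\'\n{"date": "sample"}'.splitlines()[-1]
print(load_json_bytes(raw_line))
```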

-1

Maybe try this, it might help:

import requests 
r = requests.get('https://trends.google.ru/trends/api/stories/latest?hl=ru&tz=-180&cat=all&fi=15&fs=15&geo=RU&ri=300&rs=15&sort=0') 
# Only inspects the HTTP status code; it does not parse the JSON body.
page = r.status_code 
print(page)