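I'm trying to pull license information for pip packages from PyPI and load it into a pandas DataFrame. I've done an earlier example that loaded a list comprehension into pandas, but I can't figure this one out: getting the data into pandas.
Here is what I have written so far.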
from requests import get
import pandas as pd
import pip

url = 'https://pypi.python.org/pypi'
# packages_list = ['numpy','twisted']
installed_packages = pip.get_installed_distributions()
installed_packages_list = sorted(["%s==%s" % (i.key, i.version)
                                  for i in installed_packages])
packages = []
licenses = []
summarys = []
for index, package in enumerate(installed_packages_list):
    package = package.split("==")[0]
    full_url = url+'/'+ package +'/json'
    #print 'url is ' + full_url
    page = get(url+'/'+package+'/json').json()
    #print 'Package: ' + package + ', license is:' + page['info']['license'] + '. ' + page['info']['summary']
    packages.append(package)
    licenses.append(page['info']['license'])
    summarys.append(page['info']['summary'])
print packages
pd_packages = pd.DataFrame(
    {
        "packages":[packages],
        "licenses":[licenses],
        "summarys":[summarys]
    })
print pd_packages
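What is the problem here? –
It displays something like licenses 0 [MIT, , MPL-2.0, LGPL, UNKNOWN, BSD-like, BSD, ... packages \ 0 [beautifulsoup4, bs4, certifi, chardet, get, i ... summarys 0 [Screen-scraping library, Dummy package ... – vkk07
I want to get this data into a tabular form and dump it to CSV using pandas – vkk07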
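
The single-row output described above looks like the result of wrapping each list in another list (`[packages]`), which gives pandas one cell holding the whole list. A minimal sketch of one way to get one row per package and a CSV, assuming the three lists have equal length. The helper name `build_package_frame` is hypothetical, the summaries marked as placeholders stand in for whatever PyPI returns, and the code is written for Python 3:

```python
import io

import pandas as pd


def build_package_frame(packages, licenses, summarys):
    """Return one row per package from three parallel lists (hypothetical helper)."""
    # Pass each list directly as a column. Wrapping it again, as in
    # {"packages": [packages]}, creates a single row whose cell is the whole list.
    return pd.DataFrame(
        {"packages": packages, "licenses": licenses, "summarys": summarys},
        columns=["packages", "licenses", "summarys"],
    )


# Illustrative values only: the names and licenses echo the output quoted in the
# comment above, and the placeholder summaries are not real PyPI data.
packages = ["beautifulsoup4", "bs4", "certifi"]
licenses = ["MIT", "", "MPL-2.0"]
summarys = ["Screen-scraping library", "placeholder summary", "placeholder summary"]

pd_packages = build_package_frame(packages, licenses, summarys)

# Written to an in-memory buffer so the sketch has no side effects;
# pass a path such as "packages.csv" instead to write a file.
buffer = io.StringIO()
pd_packages.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

With the lists filled by the loop in the question, the same call should produce a table with three columns, and `to_csv(..., index=False)` leaves out the row index so the file contains only the package data.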