Dict of lists to a pandas DataFrame

I have a dictionary of lists, like this:

{291840: ['http://www.rollanet.org', 'http://www.rollanet.org'], 
    291841: ['http://www.superpages.com', 'http://www.superpages.com'],
    291848: ['http://www.drscore.com/App/ScoreDr', 'http://www.drscore.com'],...etc }

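I want to convert it into a two-column DataFrame: one column for subj_id and the other for the corresponding list. Each row would be one dictionary key, with its value (the list) in the second column. I tried from_dict with orient set to 'index'. According to the documentation: "orient: if the keys should be rows, pass 'index'."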

names = ['subj_id', 'URLs'] 

dfDict = pd.DataFrame(columns = names) 
dfDict.from_dict(listDict, orient = 'index')

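Instead, I get a DataFrame in which each element of the list becomes its own column. I only want two columns: one for subj_id and one for the list of URLs associated with that subj_id.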

Source

2016-11-16 pproctor

I think you need:

listDict = {291840: ['http://www.rollanet.org', 'http://www.rollanet.org'], 
    291841: ['http://www.superpages.com', 'http://www.superpages.com'], 
    291848: ['http://www.drscore.com/App/ScoreDr', 'http://www.drscore.com']} 

names = ['subj_id', 'URLs'] 

df = pd.DataFrame(listDict).stack().reset_index(drop=True, level=0).reset_index() 
df.columns = names 
print (df) 
   subj_id                                URLs
0   291840             http://www.rollanet.org
1   291841           http://www.superpages.com
2   291848  http://www.drscore.com/App/ScoreDr
3   291840             http://www.rollanet.org
4   291841           http://www.superpages.com
5   291848              http://www.drscore.com

Old answer:

df = pd.DataFrame.from_dict(listDict, orient='index').stack().reset_index(drop=True, level=1)

If you need lists in the URLs column, use list comprehensions:

df = pd.DataFrame({'subj_id': pd.Series([k for k,v in listDict.items()]), 
        'URLs': pd.Series([v for k,v in listDict.items()])}, columns = names) 
print (df) 
   subj_id                                               URLs
0   291840  [http://www.rollanet.org, http://www.rollanet....
1   291841  [http://www.superpages.com, http://www.superpa...
2   291848  [http://www.drscore.com/App/ScoreDr, http://ww...
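A shorter way to get the same two-column shape, sketched here as an alternative rather than as part of the original answer, is to hand the dictionary's (key, list) pairs straight to the DataFrame constructor. The sample data below is made up for illustration, and its lists deliberately differ in length:

```python
import pandas as pd

# Hypothetical sample data; the lists deliberately differ in length.
listDict = {
    291840: ['http://www.rollanet.org'],
    291841: ['http://www.superpages.com', 'http://www.superpages.com'],
    291848: ['http://www.drscore.com/App/ScoreDr', 'http://www.drscore.com'],
}

names = ['subj_id', 'URLs']

# Each (key, list) pair becomes one row; the whole list is stored in one cell.
df = pd.DataFrame(list(listDict.items()), columns=names)
print(df)
```

Because every row is a pair, the length of each list does not matter here.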

Source

2016-11-16 07:21:37 jezrael

In my case, the URL lists are of different sizes, so with your code I got the error "ValueError: arrays must all be same length" – pproctor

jezrael's previous (deleted) answer works for different sizes: pd.DataFrame.from_dict(listDict, orient='index').stack().reset_index(drop=True, level=1) – Skirrebattie

@Skirrebattie - Thanks, I added the old answer – jezrael
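To illustrate the point about different sizes, here is a minimal sketch of the from_dict approach on made-up data with unequal lists. The short placeholder strings are hypothetical. The explicit dropna() is an addition of this sketch, in case a given pandas version keeps the padding values after stack():

```python
import pandas as pd

# Hypothetical data: lists of unequal length.
listDict = {291840: ['a', 'b', 'c'], 291841: ['d']}

# orient='index' makes each key a row; shorter lists are padded with missing values.
wide = pd.DataFrame.from_dict(listDict, orient='index')

# stack() moves the columns into the index; dropna() discards any padding that
# survives; dropping index level 1 leaves only the dictionary key as the index.
long = wide.stack().dropna().reset_index(level=1, drop=True)
print(long)
```

This gives one URL per row, so it avoids the ValueError, but it does not keep each list together in a single cell.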

Since I was too late to give jezrael's answer, here is a fun way:

# Parentheses replace the backslash continuation, which breaks if followed by a space.
(pd.concat([pd.Series(v, [k] * len(v)) for k, v in listDict.items()])
    .rename_axis('subj_id').reset_index(name='urls'))
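On pandas 0.25 or later, DataFrame.explode should give the same one-URL-per-row result. This is a sketch on hypothetical data, not part of the original answer:

```python
import pandas as pd

# Hypothetical data: lists of unequal length.
listDict = {
    291840: ['http://www.rollanet.org'],
    291841: ['http://www.superpages.com', 'http://www.superpages.com'],
}

# Start from one row per key, with the whole list in a single cell.
df = pd.DataFrame(list(listDict.items()), columns=['subj_id', 'urls'])

# explode() emits one row per list element and repeats subj_id alongside it.
long_df = df.explode('urls').reset_index(drop=True)
print(long_df)
```

Starting from the list-per-cell frame also means both shapes are available: df keeps each list together, long_df flattens it.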

Source

2016-11-16 07:46:01 piRSquared

This is a potential way to do it, but for my purposes I want to keep each list of URLs together with its respective subj_id – pproctor
