import pandas as pd
import numpy as np
import random

labels = ["c1", "c2", "c3"]
c1 = ["one", "one", "one", "two", "two", "three", "three", "three", "three"]
c2 = [random.random() for _ in range(len(c1))]
c3 = ["alpha", "beta", "gamma", "alpha", "gamma", "alpha", "beta", "gamma", "zeta"]

# Note: np.array() on mixed str/float lists coerces every value to a string,
# so the c2 column ends up holding strings rather than floats.
DF = pd.DataFrame(np.array([c1, c2, c3])).T
DF.columns = labels
The DataFrame looks like this:
      c1               c2     c3
0    one   0.440958516531  alpha
1    one   0.476439953723   beta
2    one   0.254235673552  gamma
3    two   0.882724336464  alpha
4    two    0.79817899139  gamma
5  three   0.677464637887  alpha
6  three   0.292927670096   beta
7  three  0.0971956881825  gamma
8  three   0.993934915508   zeta
The only way I can think of to build the dictionary is:
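D_greek_value = {}
for greek in set(DF["c3"]):
    D_c1_c2 = {}
    # Scan every row again for each distinct c3 value
    for i in range(DF.shape[0]):
        row = DF.iloc[i, :]
        # Access by column label; integer keys on a labelled row are deprecated
        if row["c3"] == greek:
            D_c1_c2[row["c1"]] = row["c2"]
    D_greek_value[greek] = D_c1_c2
D_greek_value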
The resulting dictionary looks like this:
{'alpha': {'one': '0.67919712421',
'three': '0.67171020684',
'two': '0.571150669821'},
'beta': {'one': '0.895090207979', 'three': '0.489490074662'},
'gamma': {'one': '0.964777504708',
'three': '0.134397632659',
'two': '0.10302290374'},
'zeta': {'three': '0.0204226923557'}}
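I don't want to depend on c1 coming in blocks ("one" being together every time). I'm doing this on a CSV that is a few hundred MB, and I feel like I'm going about it the wrong way. If you have any ideas, please help!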
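One way to avoid the row-by-row scan is to let pandas split the frame by c3 once and build each inner dict from the grouped columns. This is a minimal sketch, not a benchmarked solution: the fixed c2 values and the name nested are made up for illustration, and the frame is built from a dict so that c2 stays numeric instead of being coerced to strings.

```python
import pandas as pd

# Illustrative data: same c1/c3 layout as the question, with fixed c2 values
# (assumed here so the result is reproducible).
df = pd.DataFrame({
    "c1": ["one", "one", "one", "two", "two", "three", "three", "three", "three"],
    "c2": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    "c3": ["alpha", "beta", "gamma", "alpha", "gamma", "alpha", "beta", "gamma", "zeta"],
})

# groupby splits the rows by c3 once; each group then becomes one inner dict
# mapping c1 -> c2. If a (c3, c1) pair repeats, the later row wins, which
# matches the behaviour of the nested loop in the question.
nested = {
    greek: dict(zip(group["c1"], group["c2"]))
    for greek, group in df.groupby("c3")
}

print(nested)
```

Because rows are assigned to groups by their c3 value rather than by position, this does not rely on equal c1 values sitting in adjacent rows.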
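Very nice. I wonder whether this is faster than what I posted. I would expect 'groupby' to be very fast, but the lambda might slow it down. I'm too lazy to time it, though. –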
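@StevenRumbalski: Me too. :-) I tried to see whether I could get the same result using vectorized operations, but I bounced off; someone else may come up with something cleverer. Still, I think you've put your finger on the big problem (too many iterations), and everything beyond that is minor. – DSM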
@DSM I know how to use lambda functions for sorting, but what exactly is happening between ".apply" and ".to_dict()"? –
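To make that last question concrete, here is a hedged step-by-step sketch of what a groupby / .apply / .to_dict() chain does. The exact lambda being discussed is not shown in this thread, so the one below (dict(zip(...))) is an assumption, as are the sample values and the variable names grouped, per_group and result.

```python
import pandas as pd

# Illustrative data with fixed values (assumed, not from the original post).
df = pd.DataFrame({
    "c1": ["one", "one", "two", "three", "three"],
    "c2": [0.1, 0.2, 0.4, 0.6, 0.7],
    "c3": ["alpha", "beta", "alpha", "alpha", "beta"],
})

# Step 1: groupby("c3") partitions the rows by their c3 value.
# Selecting ["c1", "c2"] limits what each group hands to the function.
grouped = df.groupby("c3")[["c1", "c2"]]

# Step 2: .apply calls the lambda once per group. Each call receives that
# group's sub-DataFrame and returns a plain dict mapping c1 -> c2.
# The per-group results are collected into a Series indexed by c3.
per_group = grouped.apply(lambda g: dict(zip(g["c1"], g["c2"])))

# Step 3: Series.to_dict() turns that Series into an ordinary dict whose
# keys are the c3 values and whose values are the inner dicts.
result = per_group.to_dict()

print(result)
```

So the lambda runs once per distinct c3 value rather than once per row, and .to_dict() only converts the outer container.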