2017-07-02 83 views
-1

Splitting a pandas column of tuples

I have a dictionary of the form:

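import pandas as pd

# Keys must be strings; bare A, B, C would raise NameError.
data = {'A': [(1, 2), (3, 4), (5, 6), (7, 8), (8, 9)],
        'B': [(3, 4), (4, 5), (5, 6), (6, 7)],
        'C': [(10, 11), (12, 13)]}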

from which I create a DataFrame:

# dict.iteritems() is Python 2 only; items() works in Python 3.
df = pd.DataFrame({k: pd.Series(v) for k, v in data.items()})

which gives:

A      B      C
(1,2)  (3,4)  (10,11)
(3,4)  (4,5)  (12,13)
(5,6)  (5,6)  NaN
(7,8)  (6,7)  NaN
(8,9)  NaN    NaN

Is there a way to go from the DataFrame above to the one below?

  A        B        C
one two  one two  one two
  1   2    3   4   10  11
  3   4    4   5   12  13
  5   6    5   6  NaN NaN
  7   8    6   7  NaN NaN
  8   9  NaN NaN  NaN NaN
+1

Possible duplicate of [Splitting a list of tuples in a pandas dataframe column](https://stackoverflow.com/questions/31069018/splitting-a-list-of-tuples-in-a-pandas-dataframe-column) – Wen

+1

@idjaw You're right, my question wasn't written very well. I hope my edit explains it better. – user3191569

+0

@Wen The question you mention splits the tuples into two completely separate columns; in my case, I want to use a MultiIndex. – user3191569

Answers

1

You can use a list comprehension that passes each column to the DataFrame constructor (converting it with values + tolist), followed by concat:

cols = ['A','B','C'] 
L = [pd.DataFrame(df[x].values.tolist(), columns=['one','two']) for x in cols] 
df = pd.concat(L, axis=1, keys=cols) 
print (df) 

     A       B       C
   one two one two one two
0    1   2   3   4   5   6
1    7   8   9  10  11  12
2   13  14  15  16  17  18
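
For reference, here is a self-contained version of the snippet above. The input frame is an assumption: the answer does not show it, so it is reconstructed from the printed output (three rows, every cell a 2-tuple, no missing values). The name `parts` is illustrative.

```python
import pandas as pd

# Assumed input, reconstructed from the printed output:
# three equal-length columns of 2-tuples, no NaN.
df = pd.DataFrame({
    'A': [(1, 2), (7, 8), (13, 14)],
    'B': [(3, 4), (9, 10), (15, 16)],
    'C': [(5, 6), (11, 12), (17, 18)],
})

cols = ['A', 'B', 'C']
# Expand each column of tuples into its own two-column frame.
parts = [pd.DataFrame(df[x].values.tolist(), columns=['one', 'two']) for x in cols]
# Place the frames side by side; keys= becomes the top column level.
out = pd.concat(parts, axis=1, keys=cols)
print(out)
```

If the columns are padded with NaN, as in the question, the NaN entries are not tuples and this expansion is likely to fail; building the frames from the original dictionary avoids that.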

Edit:

A similar solution with a dict comprehension. The integer values are converted to floats, because the type of NaN is float too.

data = {'A':[(1,2),(3,4),(5,6),(7,8),(8,9)], 
     'B':[(3,4),(4,5),(5,6),(6,7)], 
     'C':[(10,11),(12,13)]} 

cols = ['A','B','C'] 
d = {k: pd.DataFrame(v, columns=['one','two']) for k,v in data.items()} 
df = pd.concat(d, axis=1) 
print (df) 
    A        B         C
  one two  one  two   one   two
0   1   2  3.0  4.0  10.0  11.0
1   3   4  4.0  5.0  12.0  13.0
2   5   6  5.0  6.0   NaN   NaN
3   7   8  6.0  7.0   NaN   NaN
4   8   9  NaN  NaN   NaN   NaN
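
The dtype change can be seen in isolation. A small sketch (the names `short` and `padded` are illustrative): padding an integer Series with a missing row forces float64, because NaN is a float and a plain int64 column cannot hold it.

```python
import pandas as pd

short = pd.Series([3, 4, 5, 6])     # integer dtype, four rows
padded = short.reindex(range(5))    # the fifth row has no value, so it becomes NaN

print(short.dtype)
print(padded.dtype)                 # float64, since NaN is a float
```

The same thing happens in concat: column A has all five rows and stays integer, while the shorter B and C are padded and become float.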

Edit:

To multiply by a single column, you can use slicers:

s = df[('A', 'one')] 
print (s) 
0 1 
1 3 
2 5 
3 7 
4 8 
Name: (A, one), dtype: int64 

df.loc(axis=1)[:, 'one'] = df.loc(axis=1)[:, 'one'].mul(s, axis=0) 
print (df) 
      A        B          C
    one two   one  two   one   two
0   1.0   2   3.0  4.0  10.0  11.0
1   9.0   4  12.0  5.0  36.0  13.0
2  25.0   6  25.0  6.0   NaN   NaN
3  49.0   8  42.0  7.0   NaN   NaN
4  64.0   9   NaN  NaN   NaN   NaN

Another solution:

idx = pd.IndexSlice 
df.loc[:, idx[:, 'one']] = df.loc[:, idx[:, 'one']].mul(s, axis=0) 
print (df) 
      A        B          C
    one two   one  two   one   two
0   1.0   2   3.0  4.0  10.0  11.0
1   9.0   4  12.0  5.0  36.0  13.0
2  25.0   6  25.0  6.0   NaN   NaN
3  49.0   8  42.0  7.0   NaN   NaN
4  64.0   9   NaN  NaN   NaN   NaN
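
If the original frame should stay untouched, a non-mutating variant is possible. This is a sketch and not part of the original answer: `DataFrame.xs` pulls out every `one` column as a plain frame, which is then multiplied row-wise by `s`. The names `ones` and `scaled` are illustrative.

```python
import pandas as pd

data = {'A': [(1, 2), (3, 4), (5, 6), (7, 8), (8, 9)],
        'B': [(3, 4), (4, 5), (5, 6), (6, 7)],
        'C': [(10, 11), (12, 13)]}

# Same MultiIndex frame as in the dict-comprehension solution.
df = pd.concat({k: pd.DataFrame(v, columns=['one', 'two'])
                for k, v in data.items()}, axis=1)

s = df[('A', 'one')]

# Cross-section on the second column level: one column per top-level key.
ones = df.xs('one', axis=1, level=1)

# Multiply each 'one' column by s, aligned on the row index.
scaled = ones.mul(s, axis=0)
print(scaled)
```

Here `scaled` has the single-level columns A, B, C and `df` itself is unchanged; the slicer assignments above remain the way to write the result back in place.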
+0

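Thank you very much. I was wondering whether there is a way to access specific columns, i.e. the 'one' columns of the DataFrame, and broadcast a calculation over all of them, e.g. multiply every 'one' column by the values in the first column. – user3191569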

+0

Give me some time. – jezrael

+0

Do you mean multiplying by 'df[('A','one')]'? – jezrael