2017-03-01 89 views
2

Pandas: split a DataFrame column and get the headers

I have a pandas DataFrame with a column "A":

import pandas as pd

dfc = pd.DataFrame({"A": ['AB=0.246154;ABP=39.3908;AC=3', 'AB=0.3;ABP=9.95901;AC=2;AF=0.333333', 'AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86', 'AB=0.461538;ABP=3.51141;AC=2']})

I want to split column "A" of the DataFrame and get a new DataFrame like this:

                                     A        AB      ABP  AC        AF    AN    AO
0         AB=0.246154;ABP=39.3908;AC=3  0.246154  39.3908   3      None  None  None
1  AB=0.3;ABP=9.95901;AC=2;AF=0.333333       0.3  9.95901   2  0.333333  None  None
2      AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86         0        0   6         1     6    86
3         AB=0.461538;ABP=3.51141;AC=2  0.461538  3.51141   2      None  None  None

I tried splitting the DataFrame column with:

dfc.A.str.split(';', expand = True) 

but that gives a new DataFrame like this:

             0            1     2            3     4      5
0  AB=0.246154  ABP=39.3908  AC=3         None  None   None
1       AB=0.3  ABP=9.95901  AC=2  AF=0.333333  None   None
2         AB=0        ABP=0  AC=6         AF=1  AN=6  AO=86
3  AB=0.461538  ABP=3.51141  AC=2         None  None   None

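How can I use the text before "=" as the column headers and add this new DataFrame to the original one? Is there a Pythonic way to do both in one line?

Thanks

Answers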

2

Use extractall:

e = dfc.A.str.extractall('([^;]+)=([^;]+)') 
pd.Series(e.values[:, 1], [e.index.get_level_values(0), e.values[:, 0]]).unstack() 

         AB      ABP  AC        AF    AN    AO
0  0.246154  39.3908   3      None  None  None
1       0.3  9.95901   2  0.333333  None  None
2         0        0   6         1     6    86
3  0.461538  3.51141   2      None  None  None
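
For anyone puzzled by the one-liner above, here is a rough sketch of the intermediate steps. It assumes the `dfc` from the question; the names `keys`, `values`, and `wide` are only illustrative and not part of the original answer. Depending on the pandas version, the missing cells may print as NaN rather than None.

```python
import pandas as pd

dfc = pd.DataFrame({"A": [
    "AB=0.246154;ABP=39.3908;AC=3",
    "AB=0.3;ABP=9.95901;AC=2;AF=0.333333",
    "AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86",
    "AB=0.461538;ABP=3.51141;AC=2",
]})

# One row per key=value match. Index level 0 is the original row label,
# level 1 ("match") numbers the matches within that row.
e = dfc["A"].str.extractall(r"([^;]+)=([^;]+)")

keys = e[0].to_numpy()    # text before "="
values = e[1].to_numpy()  # text after "="

# Index the values by (original row, key), then pivot the keys into columns.
wide = pd.Series(values, index=[e.index.get_level_values(0), keys]).unstack()
print(wide)
```

The frame `e` is in "long" form (one pair per row), and `unstack()` is what turns the keys into headers.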
0

This should work:

import pandas as pd

d = {"A": ['AB=0.246154;ABP=39.3908;AC=3', 'AB=0.3;ABP=9.95901;AC=2;AF=0.333333', 'AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86', 'AB=0.461538;ABP=3.51141;AC=2']}
# Split each string into "key=value" cells, then each cell into a (key, value) pair.
rows = [s.split(";") for s in d["A"]]
data = [dict(cell.split('=') for cell in row) for row in rows]

# A list of dicts becomes a DataFrame whose columns are the union of the keys.
df = pd.DataFrame(data)
print(df)

# Alternative: the same parsing applied directly to the DataFrame column.
dfc = pd.DataFrame(d)

f = lambda s: dict(cell.split('=') for cell in s.split(';'))
df = pd.DataFrame(dfc.A.apply(f).tolist())
print(df)

Output:

         AB      ABP  AC        AF   AN   AO
0  0.246154  39.3908   3       NaN  NaN  NaN
1       0.3  9.95901   2  0.333333  NaN  NaN
2         0        0   6         1    6   86
3  0.461538  3.51141   2       NaN  NaN  NaN
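
One thing worth knowing: every value produced this way is still a string. If numbers are needed, a possible follow-up (not part of the answer above, just a sketch) is to run `pd.to_numeric` over each column. The names `rows` and `numeric` are made up for this example, and `split("=", 1)` is a small precaution in case a value ever contains "=".

```python
import pandas as pd

rows = [
    "AB=0.246154;ABP=39.3908;AC=3",
    "AB=0.3;ABP=9.95901;AC=2;AF=0.333333",
    "AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86",
    "AB=0.461538;ABP=3.51141;AC=2",
]

# Parse each string into a dict of key -> value (both still strings).
data = [dict(cell.split("=", 1) for cell in row.split(";")) for row in rows]
df = pd.DataFrame(data)

# Convert every column from strings to numbers; missing cells stay missing.
numeric = df.apply(pd.to_numeric)
print(numeric.dtypes)
```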
4

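Try the following: build a Series from a dict for each element of column A after splitting the string appropriately. The index/keys become the headers of the result. If needed, use pd.concat to join the original column A with the new DataFrame: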

dfc.A.apply(lambda x: pd.Series(dict(s.split("=") for s in x.split(";")))) 

#         AB      ABP  AC        AF   AN   AO
#0  0.246154  39.3908   3       NaN  NaN  NaN
#1       0.3  9.95901   2  0.333333  NaN  NaN
#2         0        0   6         1    6   86
#3  0.461538  3.51141   2       NaN  NaN  NaN
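
To also keep the original column, as the question asks, the parsed frame can be attached with `pd.concat` along the columns. A minimal sketch, assuming the `dfc` from the question; `parsed` and `result` are illustrative names, and `split("=", 1)` is a small precaution I added in case a value ever contains "=".

```python
import pandas as pd

dfc = pd.DataFrame({"A": [
    "AB=0.246154;ABP=39.3908;AC=3",
    "AB=0.3;ABP=9.95901;AC=2;AF=0.333333",
    "AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86",
    "AB=0.461538;ABP=3.51141;AC=2",
]})

# Each string becomes a Series whose index holds the keys; apply() then
# aligns those Series into one DataFrame with the keys as column headers.
parsed = dfc["A"].apply(
    lambda x: pd.Series(dict(s.split("=", 1) for s in x.split(";")))
)

# axis=1 places the new columns next to the original column "A".
result = pd.concat([dfc, parsed], axis=1)
print(result)
```

Since both frames share the same index, `dfc.join(parsed)` should give the same result.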
0
def spliter(data): 
    pairs = [x.split("=") for x in data.split(";")] 
    return pd.Series({key: val for key, val in pairs}) 


dfc.A.apply(spliter) 


         AB      ABP  AC        AF   AN   AO
0  0.246154  39.3908   3       NaN  NaN  NaN
1       0.3  9.95901   2  0.333333  NaN  NaN
2         0        0   6         1    6   86
3  0.461538  3.51141   2       NaN  NaN  NaN