2017-01-18

Could someone please tell me how to loop over the multiple values stored in a single DataFrame column?

Example:

col1 col2 
High street qwe.723,qwe.2,qwe.17,qwe.1000,qwe.23 
Must street qwe.34,qwe.17,qwe.1000,qwe.23 

I would like the following output:

High street 
qwe.723 
High street 
qwe.2 
High street 
qwe.17 
High street 
qwe.1000 
High street 
qwe.23 

Must street 
qwe.34 
Must street 
qwe.17 
Must street 
qwe.1000 
Must street 
qwe.23 

What I tried:

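with open('file.txt', 'r') as lines: 
    for line in lines: 
        fields = line.strip().split('\t')   # col1 and col2 are tab-separated 
        vals = fields[1].split(',')         # individual values inside col2 
        for val in vals: 
            print(fields[0])                # street name on its own line 
            print(val)                      # then one value on its own line 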
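A self-contained variant of the plain-Python attempt, reading from an in-memory sample instead of `file.txt` so it can be run directly. It assumes the file is tab-separated and starts with a `col1<TAB>col2` header line (both assumptions, based on the example above); `interleave_rows` and `SAMPLE` are hypothetical names used only for illustration.

```python
import io

# Hypothetical sample mirroring the question's file; assumes tab-separated
# columns and a "col1<TAB>col2" header on the first line.
SAMPLE = (
    "col1\tcol2\n"
    "High street\tqwe.723,qwe.2,qwe.17,qwe.1000,qwe.23\n"
    "Must street\tqwe.34,qwe.17,qwe.1000,qwe.23\n"
)


def interleave_rows(lines):
    """Yield the street name and each comma-separated value on alternate lines.

    `interleave_rows` is a hypothetical helper, not part of any library.
    """
    rows = iter(lines)
    next(rows, None)  # skip the assumed "col1<TAB>col2" header line
    for line in rows:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        street, values = line.split('\t', 1)
        for val in values.split(','):
            yield street
            yield val


if __name__ == '__main__':
    for out_line in interleave_rows(io.StringIO(SAMPLE)):
        print(out_line)
```

With a real file, pass the open file object instead: `with open('file.txt') as fh: for out_line in interleave_rows(fh): ...`.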

Answers


Try this:

In [136]: df 
Out[136]: 
      col1         col2 
0 High street qwe.723,qwe.2,qwe.17,qwe.1000,qwe.23 
1 Must street   qwe.34,qwe.17,qwe.1000,qwe.23 

In [137]: df.set_index('col1').col2.str.split(',', expand=True).stack().reset_index(level=1, drop=1).to_frame('col2').reset_index().stack() 
    ...: 
Out[137]: 
0 col1 High street 
    col2  qwe.723 
1 col1 High street 
    col2   qwe.2 
2 col1 High street 
    col2   qwe.17 
3 col1 High street 
    col2  qwe.1000 
4 col1 High street 
    col2   qwe.23 
5 col1 Must street 
    col2   qwe.34 
6 col1 Must street 
    col2   qwe.17 
7 col1 Must street 
    col2  qwe.1000 
8 col1 Must street 
    col2   qwe.23 
dtype: object 

I'm sure there must be a better way to do this...
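One shorter route, assuming pandas 0.25 or newer (where `DataFrame.explode` was added), is to split `col2` into lists, give each list element its own row, and then flatten the two columns row by row. This is a swapped-in technique, not the answer's original `stack`-based chain; the data below just rebuilds the question's example.

```python
import pandas as pd

# The question's example data, rebuilt in memory (assumes no spaces
# around the commas in col2).
df = pd.DataFrame({
    'col1': ['High street', 'Must street'],
    'col2': ['qwe.723,qwe.2,qwe.17,qwe.1000,qwe.23',
             'qwe.34,qwe.17,qwe.1000,qwe.23'],
})

# Split each string into a list, then give every list element its own row
# (col1 is repeated automatically for each new row).
long_df = df.assign(col2=df['col2'].str.split(',')).explode('col2')

# Read the two columns row by row: street, value, street, value, ...
interleaved = long_df[['col1', 'col2']].to_numpy().ravel().tolist()
print(interleaved)
```

`long_df` on its own is often the more useful result, since it keeps one `(col1, col2)` pair per row for further processing.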


Another one-liner:

(df.set_index('col1') 
    .col2.str.split(',', expand=True) 
    .stack() 
    .reset_index(level=-1, drop=True) 
    # header=False: recent pandas versions write a header row for a Series by default 
    .to_csv('output.txt', sep='\n', header=False)) 
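If you would rather not route plain text through the CSV writer with a newline as the column separator, a minimal sketch that writes the alternating lines explicitly is below. It assumes pandas 0.25+ for `explode`; `write_interleaved` is a hypothetical helper name, and an in-memory buffer stands in for `output.txt`.

```python
import io

import pandas as pd

df = pd.DataFrame({
    'col1': ['High street', 'Must street'],
    'col2': ['qwe.723,qwe.2,qwe.17,qwe.1000,qwe.23',
             'qwe.34,qwe.17,qwe.1000,qwe.23'],
})


def write_interleaved(frame, fh):
    """Write col1 and each split col2 value on alternating lines to `fh`.

    `write_interleaved` is hypothetical; `fh` is any text-mode file object.
    """
    long_df = frame.assign(col2=frame['col2'].str.split(',')).explode('col2')
    for street, value in zip(long_df['col1'], long_df['col2']):
        fh.write(f'{street}\n{value}\n')


buf = io.StringIO()
write_interleaved(df, buf)
print(buf.getvalue(), end='')
```

To write the real file: `with open('output.txt', 'w') as fh: write_interleaved(df, fh)`.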

Because I've been having fun playing around with cytoolz and numpy.
Super fast!

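import cytoolz 
import numpy as np 

# Split every col2 string on commas; the result is an object array of lists 
c2 = np.char.split(df.col2.values.astype('str'), ',') 
# Repeat each col1 entry once per value in the matching col2 list 
col1 = df.col1.values.repeat([len(c) for c in c2.tolist()]) 
# Flatten the lists into one sequence of values 
col2 = list(cytoolz.concat(c2)) 
# Stack as two rows, then read column by column ('F' order) to interleave 
np.stack([col1, col2]).ravel('F') 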

array(['High street', 'qwe.723', 'High street', 'qwe.2', 'High street', 
     'qwe.17', 'High street', 'qwe.1000', 'High street', 'qwe.23', 
     'Must street', 'qwe.34', 'Must street', 'qwe.17', 'Must street', 
     'qwe.1000', 'Must street', 'qwe.23'], dtype=object) 
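The same NumPy idea also works without the third-party `cytoolz` dependency: `itertools.chain.from_iterable` from the standard library flattens the lists just as well. This is a swap of the flattening step only; whether it is as fast as `cytoolz.concat` on large inputs has not been measured here.

```python
from itertools import chain

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['High street', 'Must street'],
    'col2': ['qwe.723,qwe.2,qwe.17,qwe.1000,qwe.23',
             'qwe.34,qwe.17,qwe.1000,qwe.23'],
})

# np.char.split returns an object array whose elements are Python lists.
parts = np.char.split(df['col2'].to_numpy().astype(str), ',')
counts = [len(p) for p in parts]

# Repeat each street once per value, and flatten the value lists.
col1 = df['col1'].to_numpy().repeat(counts)
col2 = np.array(list(chain.from_iterable(parts)), dtype=object)

# Two rows stacked, read column by column ('F' order): street, value, ...
result = np.stack([col1, col2]).ravel('F')
print(result)
```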

Timing test

[image: timing results]
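To reproduce a comparison like this yourself, a minimal `timeit` harness is sketched below. The data size (the two example rows repeated 10,000 times) is an assumption, since the original benchmark's setup is not known, and the two functions are hypothetical wrappers around the `explode` and NumPy approaches; no timing numbers are claimed here.

```python
import timeit
from itertools import chain

import numpy as np
import pandas as pd

base = pd.DataFrame({
    'col1': ['High street', 'Must street'],
    'col2': ['qwe.723,qwe.2,qwe.17,qwe.1000,qwe.23',
             'qwe.34,qwe.17,qwe.1000,qwe.23'],
})
# Assumed benchmark size: the example rows repeated 10,000 times.
big = pd.concat([base] * 10_000, ignore_index=True)


def with_explode(frame):
    """Hypothetical wrapper: split + explode, then flatten row by row."""
    long_df = frame.assign(col2=frame['col2'].str.split(',')).explode('col2')
    return long_df[['col1', 'col2']].to_numpy().ravel()


def with_numpy(frame):
    """Hypothetical wrapper: np.char.split + repeat + column-order ravel."""
    parts = np.char.split(frame['col2'].to_numpy().astype(str), ',')
    col1 = frame['col1'].to_numpy().repeat([len(p) for p in parts])
    col2 = np.array(list(chain.from_iterable(parts)), dtype=object)
    return np.stack([col1, col2]).ravel('F')


# Sanity check: both approaches must produce the same sequence.
assert with_explode(big).tolist() == with_numpy(big).tolist()

for fn in (with_explode, with_numpy):
    seconds = timeit.timeit(lambda: fn(big), number=5)
    print(f'{fn.__name__}: {seconds / 5:.4f} s per run')
```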


Thanks so much for the beautiful solutions, everyone! Problem solved. – user27976