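pd.melt can coalesce multiple columns into a single value column (plus a variable column). You can use it once to coalesce the num1 and num2 columns, and a second time to coalesce the phone1 and phone2 columns: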
import pandas as pd
df = pd.DataFrame({'phone1':[4567890876, 4567890876, 9178889999, 3237800876],
'phone2':[4567890876, 4567890876, 9178889999, 2139990000],
'num1':[1,2,3,3],
'num2':[5,2,3,1]})
melted = pd.melt(df, id_vars=['phone1', 'phone2'], var_name='numvar', value_name='num')
melted = pd.melt(melted, id_vars=['numvar', 'num'], value_name='phone')
melted = melted[['num', 'phone']]
melted = melted.drop_duplicates()
print(melted)
yields
num phone
0 1 4567890876
1 2 4567890876
2 3 9178889999
3 3 3237800876
4 5 4567890876
7 1 3237800876
11 3 2139990000
15 1 2139990000
Explanation: Use id_vars to prevent the phone1 and phone2 columns from being melted. Melting the num1 and num2 columns gives:
In [166]: melted = pd.melt(df, id_vars=['phone1', 'phone2'], var_name='numvar', value_name='num'); melted
Out[166]:
phone1 phone2 numvar num
0 4567890876 4567890876 num1 1
1 4567890876 4567890876 num1 2
2 9178889999 9178889999 num1 3
3 3237800876 2139990000 num1 3
4 4567890876 4567890876 num2 5
5 4567890876 4567890876 num2 2
6 9178889999 9178889999 num2 3
7 3237800876 2139990000 num2 1
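To see what id_vars protects against, here is a minimal sketch comparing a melt with and without it, using the same df as above. The names with_ids and without_ids are only illustrative and are not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({'phone1': [4567890876, 4567890876, 9178889999, 3237800876],
                   'phone2': [4567890876, 4567890876, 9178889999, 2139990000],
                   'num1': [1, 2, 3, 3],
                   'num2': [5, 2, 3, 1]})

# With id_vars, phone1/phone2 stay as columns; only num1/num2 are melted.
with_ids = pd.melt(df, id_vars=['phone1', 'phone2'],
                   var_name='numvar', value_name='num')

# Without id_vars, every column is melted into variable/value pairs.
without_ids = pd.melt(df)

print(with_ids.shape)             # (8, 4)
print(without_ids.shape)          # (16, 2)
print(list(without_ids.columns))  # ['variable', 'value']
```

Without id_vars the phone numbers end up mixed into the same value column as num1 and num2, so the pairing between a num and its phone is lost.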
Then apply pd.melt again to coalesce the phone1 and phone2 columns into one:
In [168]: pd.melt(melted, id_vars=['numvar', 'num'], value_name='phone')
Out[168]:
numvar num variable phone
0 num1 1 phone1 4567890876
1 num1 2 phone1 4567890876
2 num1 3 phone1 9178889999
3 num1 3 phone1 3237800876
4 num2 5 phone1 4567890876
5 num2 2 phone1 4567890876
6 num2 3 phone1 9178889999
7 num2 1 phone1 3237800876
8 num1 1 phone2 4567890876
9 num1 2 phone2 4567890876
10 num1 3 phone2 9178889999
11 num1 3 phone2 2139990000
12 num2 5 phone2 4567890876
13 num2 2 phone2 4567890876
14 num2 3 phone2 9178889999
15 num2 1 phone2 2139990000
Drop the numvar and variable columns, then drop the duplicate rows, and you get the desired result (though in a different order).
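If the row order or the leftover index labels matter, the whole recipe can be wrapped up and sorted at the end. This is only a sketch: coalesce_pairs is a hypothetical helper name, and sorting by num and then phone is an assumption about the order you want.

```python
import pandas as pd

def coalesce_pairs(df):
    """Hypothetical helper: unique (num, phone) pairs from num1/num2 and phone1/phone2."""
    # First melt: stack num1/num2 into a single 'num' column.
    melted = pd.melt(df, id_vars=['phone1', 'phone2'],
                     var_name='numvar', value_name='num')
    # Second melt: stack phone1/phone2 into a single 'phone' column.
    melted = pd.melt(melted, id_vars=['numvar', 'num'], value_name='phone')
    # Select the two columns first; for this data no rows are duplicated
    # while numvar and variable are still present.
    result = melted[['num', 'phone']].drop_duplicates()
    # Assumed ordering: sort by num, then phone, and renumber the index.
    return result.sort_values(['num', 'phone']).reset_index(drop=True)

df = pd.DataFrame({'phone1': [4567890876, 4567890876, 9178889999, 3237800876],
                   'phone2': [4567890876, 4567890876, 9178889999, 2139990000],
                   'num1': [1, 2, 3, 3],
                   'num2': [5, 2, 3, 1]})

result = coalesce_pairs(df)
print(result)
```

The result has the same eight pairs as the output above, just in sorted order with a 0 to 7 index.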
Why do '2139990000' and '3237800876' appear twice in the resulting DF? – MaxU