我想你可以对齐数据都DataFrames第一重列:
df0 = pd.DataFrame({'ObjectType':[1,2,3],
'B':[4,5,6],
'C':[7,8,9]})
#print (df0)
df1 = pd.DataFrame({'ObjectClass':[1,2,3],
'B':[4,5,6],
'C':[7,8,9]})
#print (df1)
inputs= ["ObjectType","ObjectClass"]
output= "MyObjectClasses"
#dict comprehension
d = {x:output for x in inputs}
print (d)
{'ObjectType': 'MyObjectClasses', 'ObjectClass': 'MyObjectClasses'}
df0 = df0.rename(columns=d)
df1 = df1.rename(columns=d)
df_total = pd.concat([df0, df1], ignore_index=True)
print (df_total)
B C MyObjectClasses
0 4 7 1
1 5 8 2
2 6 9 3
3 4 7 1
4 5 8 2
5 6 9 3
编辑:
更多simplier是update
(工作inplace
):
df = pd.concat([df0, df1])
df['ObjectType'].update(df['ObjectClass'])
print (df)
B C ObjectClass ObjectType
0 4 7 NaN 1.0
1 5 8 NaN 2.0
2 6 9 NaN 3.0
0 4 7 1.0 1.0
1 5 8 2.0 2.0
2 6 9 3.0 3.0
或者fillna
,但随后需要滴原列列:
df = pd.concat([df0, df1])
df["ObjectType"] = df['ObjectType'].fillna(df['ObjectClass'])
df = df.drop('ObjectClass', axis=1)
print (df)
B C ObjectType
0 4 7 1.0
1 5 8 2.0
2 6 9 3.0
0 4 7 1.0
1 5 8 2.0
2 6 9 3.0
df = pd.concat([df0, df1])
df["MyObjectClasses"] = df['ObjectType'].fillna(df['ObjectClass'])
df = df.drop(['ObjectType','ObjectClass'], axis=1)
print (df)
B C MyObjectClasses
0 4 7 1.0
1 5 8 2.0
2 6 9 3.0
0 4 7 1.0
1 5 8 2.0
2 6 9 3.0
EDIT1:
时序:
df0 = pd.DataFrame({'ObjectType':[1,2,3],
'B':[4,5,6],
'C':[7,8,9]})
#print (df0)
df1 = pd.DataFrame({'ObjectClass':[1,2,3],
'B':[4,5,6],
'C':[7,8,9]})
#print (df1)
df0 = pd.concat([df0]*1000).reset_index(drop=True)
df1 = pd.concat([df1]*1000).reset_index(drop=True)
inputs= ["ObjectType","ObjectClass"]
output= "MyObjectClasses"
#dict comprehension
d = {x:output for x in inputs}
In [241]: %timeit df_total = pd.concat([df0.rename(columns=d), df1.rename(columns=d)], ignore_index=True)
1000 loops, best of 3: 821 µs per loop
In [240]: %%timeit
...: df = pd.concat([df0, df1])
...: df['ObjectType'].update(df['ObjectClass'])
...: df = df.drop(['ObjectType','ObjectClass'], axis=1)
...:
100 loops, best of 3: 2.18 ms per loop
In [242]: %%timeit
...: df = pd.concat([df0, df1])
...: df['MyObjectClasses'] = df['ObjectType'].combine_first(df['ObjectClass'])
...: df = df.drop(['ObjectType','ObjectClass'], axis=1)
...:
100 loops, best of 3: 2.21 ms per loop
In [243]: %%timeit
...: df = pd.concat([df0, df1])
...: df['MyObjectClasses'] = df['ObjectType'].fillna(df['ObjectClass'])
...: df = df.drop(['ObjectType','ObjectClass'], axis=1)
...:
100 loops, best of 3: 2.28 ms per loop
是这可能工作合并由南的分成一个两列。但是,我有很多列,我只想重命名两列。当数据帧有两列时,您的解决方案是否可用? –
我认为这是通用的解决方案 - 在两个数据框中只需要相同的列名称。 – jezrael
Thanx为您提供帮助,但我选择了“combine_first”的答案,因为它更简单。 :) –