2017-10-17

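Merge returns NaN except for the first row

Hello, I want to merge two data frames that I load from Excel. I converted the columns that should be merged on to str. Surprisingly, the code merges the first row but then returns NaN values. The code I use is: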

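import pandas as pd

# sheet_name replaces the deprecated sheetname keyword;
# plain str replaces the np.str alias, which newer NumPy versions no longer provide
ListA = pd.read_excel(inpath, sheet_name="Tabelle2")
ListA["Stücklistenkomponente"] = ListA["Material"].astype(str)
ListB = pd.read_excel(inpath, sheet_name="Tabelle1")
ListB["Stücklistenkomponente"] = ListB["Material"].astype(str)
print(ListA.dtypes)
print(ListB.dtypes)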

Material    object

Material    object

The two data frames look like this:

ListA

Material 
R 22B 2.0 7.72 11.0 Lo 
X 127 1.5x4.64x4[G16.05.01] CL 
L 431 2x6,96x5.5 Y 
9999 
L 431 2x5,96x5.5 p 
F 631 2x6,96x5.5 a 
N 431 2x6,96x5.5 v 
J 431 2x6,96x5.5 
O 431 2x6,96x5.5 
VM 431 2x6,96x5.5 L 

ListB

Material                        InnerDiameter  OuterDiameter  Length
R 22B 2.0 7.72 11.0 Lo          2              6              8
X 127 1.5x4.64x4[G16.05.01] CL  2              7              12
L 431 2x6,96x5.5 Y              5              8              13
9999                            0              0              0
L 431 2x5,96x5.5 p              6              9              15
F 631 2x6,96x5.5 a              8              5              26
N 431 2x6,96x5.5 v              9              1              3
J 431 2x6,96x5.5                12             6              89
O 431 2x6,96x5.5                5              4              12
VM 431 2x6,96x5.5 L             4              12             7

It returns:

Material                        InnerDiameter  OuterDiameter  Length
R 22B 2.0 7.72 11.0 Lo          2              6              8
NaN                             NaN            NaN            NaN
NaN                             NaN            NaN            NaN
NaN                             NaN            NaN            NaN
NaN                             NaN            NaN            NaN
NaN                             NaN            NaN            NaN
NaN                             NaN            NaN            NaN
NaN                             NaN            NaN            NaN
NaN                             NaN            NaN            NaN
NaN                             NaN            NaN            NaN
NaN                             NaN            NaN            NaN

So what am I doing wrong? I thought the solution was to convert both columns to dtype string, but that does not work.

Thanks for any help!

Answers

I think the data must differ in some way, maybe through trailing whitespace, because .astype(str) correctly converts the data to strings.
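
To illustrate the trailing-whitespace idea, here is a minimal sketch. The frames left and right and their values are made up for the example and are not the asker's data; it only assumes that one key carries an invisible trailing space.

```python
import pandas as pd

# Hypothetical data: the second key in `left` carries a trailing space.
left = pd.DataFrame({"Material": ["R 22B", "L 431 "]})
right = pd.DataFrame({"Material": ["R 22B", "L 431"], "Length": [8, 13]})

# Left join on the raw keys: the padded key finds no partner.
raw = left.merge(right, on="Material", how="left")

# Strip leading/trailing whitespace on both sides, then merge again.
left_clean = left.assign(Material=left["Material"].str.strip())
right_clean = right.assign(Material=right["Material"].str.strip())
clean = left_clean.merge(right_clean, on="Material", how="left")

print(raw)
print(clean)
```

If stripping both key columns makes the NaN rows disappear, whitespace was the likely cause; if not, the difference lies elsewhere.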

If the data are strings, dicts, sets or lists, then the dtype is object and the type of each element is str, dict, and so on.

You can check it with:

print(ListA["Stücklistenkomponente"].apply(type)) 
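
As a rough illustration of why dtype object alone says little, the sketch below builds a hypothetical column (the Series s and its values are invented) that mixes a string, an integer and a float, much like an Excel column containing both text and a bare number such as 9999.

```python
import pandas as pd

# Hypothetical column mixing a string, an integer and a float.
s = pd.Series(["R 22B", 9999, 2.0])

# The column-level dtype is object for all of them.
print(s.dtype)

# Counting the element types reveals the mixture.
type_counts = s.apply(type).value_counts()
print(type_counts)

# After astype(str) every element is a Python str.
as_text = s.astype(str)
print(as_text.apply(type).unique())
```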

To inspect the data, it sometimes helps more to convert the columns to lists:

print(ListA["Stücklistenkomponente"].tolist()) 
print(ListB["Stücklistenkomponente"].tolist()) 
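
Another way to find the keys that fail to match is an outer merge with indicator=True. This is only a sketch on invented frames a and b; the non-breaking space in the third key is an assumption chosen to show a difference that print() hides.

```python
import pandas as pd

# Hypothetical frames; the third key differs only by a non-breaking space.
a = pd.DataFrame({"key": ["A 1", "B 2", "C\u00a03"]})
b = pd.DataFrame({"key": ["A 1", "B 2", "C 3"], "value": [1, 2, 3]})

# indicator=True adds a "_merge" column: both / left_only / right_only.
check = a.merge(b, on="key", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]

# repr() exposes characters that print() renders invisibly.
for key in unmatched["key"]:
    print(repr(key))
```

Rows marked left_only or right_only are the ones worth comparing character by character.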

EDIT:

I tested your data and the result is really interesting:

df1 = pd.read_excel('Mappe3.xlsx', sheet_name="Tabelle2")
df2 = pd.read_excel('Mappe3.xlsx', sheet_name="Tabelle1")

# default inner join - rows are duplicated because the key column contains duplicate values
# the on parameter can be omitted when the frames share only one column name
df = pd.merge(df1, df2)
print(df.head(10))
        Stücklistenkomponente Ritzel_Materialnummer \ 
0 RITZEL 22F 2.0 7.72 11.0 Z17 SCHWEISS   401.4425.13 
1 RITZEL 22F 2.0 7.72 11.0 Z17 SCHWEISS   401.4425.13 
2 RITZEL 22F 2.0 7.72 11.0 Z17 SCHWEISS   401.4425.13 
3 RITZEL 22F 2.0 7.72 11.0 Z17 SCHWEISS   401.4425.13 
4 RITZEL 22F 2.0 7.72 11.0 Z17 SCHWEISS   401.4425.13 
5 RITZEL 22F 2.0 7.72 11.0 Z17 SCHWEISS   401.4425.13 
6 RITZEL 22F 2.0 7.72 11.0 Z17 SCHWEISS   401.4425.13 
7 RITZEL 22F 2.0 7.72 11.0 Z17 SCHWEISS   401.4425.13 
8 RITZEL 22F 2.0 7.72 11.0 Z17 SCHWEISS   401.4425.13 
9 RITZEL 22F 2.0 7.72 11.0 Z17 SCHWEISS   401.4425.13 
... 
... 

#remove duplicates in both df 
df1 = df1.drop_duplicates('Stücklistenkomponente') 
df2 = df2.drop_duplicates('Stücklistenkomponente') 

# default inner join - only 5 values are common to both frames
df = pd.merge(df1, df2) 
print (df) 
        Stücklistenkomponente Ritzel_Materialnummer \ 
0 RITZEL 22F 2.0 7.72 11.0 Z17 SCHWEISS   401.4425.13 
1 RITZEL 22F 3.0 7.72 11.0 Z17 SCHWEISS   401.4425.15 
2  RITZEL 22F 3.0 7.9 6.0 Z17 PRESS   401.4425.11 
3  RITZEL 22F 3.0 6.0 15.0 PRESS Z8   401.4487.01 
4  RITZEL 22F 4.0 7.9 6.0 Z17 PRESS   401.4425.14 

    Innendurchmesser Außendurchmesser Länge   Material1 Material2 \ 
0    2    7.72 11.0   X46Cr13   - 
1    3    7.72 11.0   X46Cr13   - 
2    4    7.90 6.0 42CrMo4 vergütet   - 
3    3    6.00 15.0 42CrMo4 vergütet   - 
4    2    7.90 6.0 42CrMo4 vergütet   - 

    Material3 
0   - 
1   - 
2   - 
3   - 
4   - 
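
The row multiplication seen before drop_duplicates can be reproduced on a small scale. The sketch below uses invented frames (the key and value columns are placeholders) and shows two ways to detect duplicate keys before merging: value_counts() and the validate argument of pd.merge.

```python
import pandas as pd

# Hypothetical frames in which the key "X" occurs 3 and 2 times.
df1 = pd.DataFrame({"key": ["X", "X", "X", "Y"]})
df2 = pd.DataFrame({"key": ["X", "X", "Y"], "value": [1, 2, 3]})

# Every "X" on the left pairs with every "X" on the right.
merged = pd.merge(df1, df2, on="key")
print(len(merged))  # 3 * 2 rows for "X" plus 1 row for "Y"

# Count how often each key repeats before merging.
print(df1["key"].value_counts())

# validate= raises MergeError when the keys are not unique as declared.
try:
    pd.merge(df1, df2, on="key", validate="one_to_one")
except pd.errors.MergeError as exc:
    print("not one-to-one:", exc)
```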

Unfortunately the data are identical, and checking the dtypes did not reveal any difference either. No idea. – 2Obe

Also, why does it work for the first row but then stop? – 2Obe

Are the data the same when compared column-wise? What does print(ListA["Stücklistenkomponente"] == ListB["Stücklistenkomponente"]) return? – jezrael
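
For reference, a small sketch of what such a column-wise comparison reports, on invented Series col_a and col_b. It assumes both columns have the same index; with differently labeled indexes the == comparison raises an error instead.

```python
import pandas as pd

# Hypothetical columns with the same values in a different order.
col_a = pd.Series(["A 1", "B 2", "C 3"])
col_b = pd.Series(["A 1", "C 3", "B 2"])

# == compares position by position (the indexes must be identical).
same_position = col_a == col_b
print(same_position.tolist())

# isin() asks whether each value occurs anywhere in the other column,
# which is closer to what a merge on that column looks for.
found_anywhere = col_a.isin(col_b)
print(found_anywhere.tolist())
```

A False from == therefore does not prove that a merge will fail for that row, only that the two columns differ at that position.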