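Pandas join: joining column not recognized
I have no idea what is going on, and the title is only a first-order approximation. I am trying to join two data frames: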
>>> df_sum.head()
TUCASEID t070101 t070102 t070103 t070104 t070105 t070199 \
0 20030100013280 0 0 0 0 0 0
1 20030100013344 0 0 0 0 0 0
2 20030100013352 60 0 0 0 0 0
3 20030100013848 0 0 0 0 0 0
4 20030100014165 0 0 0 0 0 0
t070201 t070299 shopping year
0 0 0 0 2003
1 0 0 0 2003
2 0 0 60 2003
3 0 0 0 2003
4 0 0 0 2003
>>> emp.head()
TUCASEID status
0 20030100013280 emp
1 20030100013344 emp
2 20030100013352 emp
4 20030100014165 emp
5 20030100014169 emp
These are the data frames. I want to join them on the common column TUCASEID, where they do intersect:
>>> np.intersect1d(emp.TUCASEID, df_sum.TUCASEID)
array([20030100013280, 20030100013344, 20030100013352, ..., 20131212132462,
20131212132469, 20131212132475])
Now...
>>> df_sum.join(emp, on='TUCASEID', how='inner')
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 3829, in join
rsuffix=rsuffix, sort=sort)
File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 3843, in _join_compat
suffixes=(lsuffix, rsuffix), sort=sort)
File "/usr/local/lib/python2.7/site-packages/pandas/tools/merge.py", line 39, in merge
return op.get_result()
File "/usr/local/lib/python2.7/site-packages/pandas/tools/merge.py", line 193, in get_result
rdata.items, rsuf)
File "/usr/local/lib/python2.7/site-packages/pandas/core/internals.py", line 3873, in items_overlap_with_suffix
to_rename)
ValueError: columns overlap but no suffix specified: Index([u'TUCASEID'], dtype='object')
Hmm, that is odd: the only column that appears in both data frames is the one being joined on. But fine, let's go along with it [1]:
>>> df_sum.join(emp, on='TUCASEID', how='inner', rsuffix='r')
Empty DataFrame
Columns: [TUCASEID, t070101, t070102, t070103, t070104, t070105, t070199, t070201, t070299, shopping, year, TUCASEIDr, status]
Index: []
The result is empty despite the huge intersection. What is going on here?
>>> pd.__version__
'0.15.0'
[1]: I actually enforced int as the dtype of the joining column, because the error message says 'object' there. It made no difference:
>>> emp.dtypes
TUCASEID int64
status object
dtype: object
>>> df_sum.dtypes
TUCASEID int64
(...)
shopping int64
year int64
dtype: object
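Your index values don't match. Why not just merge them with 'df_sum.merge(emp, on='TUCASEID', how='outer')'? Also, is the result empty when you call it that way? Or are you only interested in adding the 'status' column for each 'TUCASEID' row? In that case do 'df_sum['status'] = df_sum['TUCASEID'].map(emp.set_index('TUCASEID')['status'])' – EdChum 2015-01-31 22:24:13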
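For reference, the two alternatives from the comment above could look roughly like this. It is only a sketch: the frames are small made-up stand-ins for 'df_sum' and 'emp' (the values are not the real data), and 'how="inner"' is used instead of 'outer' because the question asks for an inner join.

```python
import pandas as pd

# Toy stand-ins for the frames in the question (values are made up).
df_sum = pd.DataFrame({
    "TUCASEID": [20030100013280, 20030100013344, 20030100013848],
    "shopping": [0, 0, 60],
})
emp = pd.DataFrame({
    "TUCASEID": [20030100013280, 20030100013344, 20030100014169],
    "status": ["emp", "emp", "emp"],
})

# Option 1: merge compares the TUCASEID *columns* of both frames.
merged = df_sum.merge(emp, on="TUCASEID", how="inner")

# Option 2: look up the status for each TUCASEID; IDs missing from emp get NaN.
status_by_id = emp.set_index("TUCASEID")["status"]
with_status = df_sum.assign(status=df_sum["TUCASEID"].map(status_by_id))

print(merged)
print(with_status)
```

Option 1 keeps only the IDs present in both frames, while option 2 keeps every row of 'df_sum' and leaves 'status' empty where there is no match.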
@EdChum OK, I will look at the alternatives. The index values don't match? I specified an alternative 'on=' column. – FooBar 2015-01-31 22:25:39
Not sure. 'join' joins on the index. Strangely, I can recreate the behaviour, but the other methods I suggested should work – EdChum 2015-01-31 22:27:04
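The point about 'join' working on the index can be illustrated with a small sketch. With 'on=', 'DataFrame.join' matches the caller's column against the other frame's index, not against its column of the same name. The frames below are made-up stand-ins for the ones in the question, and moving 'TUCASEID' into the index of 'emp' is one possible workaround, not necessarily the only one.

```python
import pandas as pd

# Toy stand-ins for the frames in the question (values are made up).
df_sum = pd.DataFrame({
    "TUCASEID": [20030100013280, 20030100013344, 20030100013848],
    "shopping": [0, 0, 60],
})
emp = pd.DataFrame({
    "TUCASEID": [20030100013280, 20030100013344, 20030100014169],
    "status": ["emp", "emp", "emp"],
})

# join(on=...) compares df_sum["TUCASEID"] with emp's *index* (0, 1, 2 here),
# so no key matches and the inner join comes back empty.
empty = df_sum.join(emp, on="TUCASEID", how="inner", rsuffix="r")

# Moving TUCASEID into emp's index gives join something to match against,
# and removes the overlapping column, so no suffix is needed.
joined = df_sum.join(emp.set_index("TUCASEID"), on="TUCASEID", how="inner")

print(empty.empty)
print(joined)
```

This would also account for the "columns overlap" error in the question: since 'TUCASEID' in 'emp' is treated as an ordinary column rather than the key, it collides with the column of the same name in 'df_sum'.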