2013-03-12 117 views
1

pandas join/merge: 'Reindexing only valid with uniquely valued Index objects'

I am getting an error on a join operation that should be very robust. I also tried merge (with left_index, right_index) and got the same result.

The indexes are identical (by design). I checked both with index.is_unique (True) and index.get_duplicates() (empty).
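The index check described above can be sketched as follows. This is only an illustration: the index values are made up, and it assumes a recent pandas, where, as far as I know, `Index.get_duplicates()` is no longer available and `Index.duplicated()` is used instead.

```python
import pandas as pd

# Hypothetical index with one repeated timestamp, for illustration only.
idx = pd.DatetimeIndex([
    "2013-01-14 17:04:45",
    "2013-01-14 17:04:35",
    "2013-01-14 17:04:35",
])

# True only if every label occurs exactly once.
print(idx.is_unique)

# Labels that occur more than once (rough equivalent of get_duplicates()).
dupes = idx[idx.duplicated()].unique()
print(list(dupes))
```

For the frames in this question, both checks come back clean, so the index itself is not what the exception is complaining about.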

Basic version:

df1.join(series) 
merge(df1, series_as_df, 

print tempres.index 

[2013-01-14 17:04:45, ..., 2013-01-14 16:53:05] Length: 89, Freq: None, Timezone: None


What is strange are the printed values:

print tempres.index.values

[1970-01-16 121:04:45 1970-01-16 121:04:35 1970-01-16 121:04:25 1970-01-16 121:04:15 1970-01-16 121:04:05 1970-01-16 121:03:55 1970-01-16 121:03:45 1970-01-16 121:03:35 1970-01-16 121:03:25 1970-01-16 121:03:15 1970-01-16 121:03:05 1970-01-16 121:02:55 1970-01-16 121:02:45 1970-01-16 121:02:35 1970-01-16 121:02:25 1970-01-16 121:02:15 1970-01-16 121:02:05 1970-01-16 121:01:55 1970-01-16 121:01:45 1970-01-16 121:01:35 1970-01-16 121:01:25 ...]
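One way to tell whether odd-looking values like these are a display problem rather than corrupted data is to rebuild an index from the raw array and compare it with the original. A minimal sketch with made-up timestamps, assuming a recent pandas and numpy:

```python
import pandas as pd

# Hypothetical two-element index standing in for tempres.index.
idx = pd.DatetimeIndex(["2013-01-14 17:04:45", "2013-01-14 17:04:35"])

# The underlying numpy datetime64 array.
raw = idx.values
print(raw.dtype)

# If the round trip reproduces the index, the stored instants are intact
# and only the printed representation differs.
roundtrip = pd.DatetimeIndex(raw)
print(roundtrip.equals(idx))
```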

If needed, I can add the pickled series and df ...

I am using the latest pandas version, 0.10.x.

Thanks,

Luc

My code (cut from a larger program):

XYTparams (existing dataframe) 
prep_functions[funcname] = [list of values, same length as XYTparams] 

iSeries = Series(prep_functions[funcname], index = XYTparams.index, name = funcname) 
XYTparams = XYTparams.join(iSeries) 

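Revisiting my problem:

I use merge and join successively on a base DataFrame. When attempting the next merge/join, I started getting the error. I cannot reproduce it in a simple test, but I saved the dataframes from just before the problem started.

I cannot find what is wrong.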

from pandas import load, merge  # assumed imports; the snippet was cut from a larger script
base_df = load('SPOparams.pic') 
lookup_df = load('lookup.pic') 

print base_df 
print lookup_df 

print base_df.count() 

print base_df['VKCSKEY1'] 
print lookup_df['traf_key'] 

# reset index does not change a thing 
base_df = base_df.reset_index(drop=True) 

print base_df.index 
print base_df.index.get_duplicates() 
print lookup_df.index 
print lookup_df.index.get_duplicates() 


# checking value matches 
for k in lookup_df['traf_key']: 
    print k, k in base_df['VKCSKEY1'].values 

# why is this merge unsuccessful ??? 
# in any combination of the parameters 
df_result =merge(base_df, lookup_df, 
      how='left', 
      #how = 'outer', 
      left_on ='VKCSKEY1', 
      right_on ='traf_key', 
      #left_index=True, 
      #right_index = True, 
      #sort=True, 
      #suffixes=('', '.m'), copy=True 
      ) 
print df_result 

Output:

1.6.1 
0.10.1 
<class 'pandas.core.frame.DataFrame'> 
Int64Index: 89 entries, 0 to 88 
Data columns: 
T      89 non-null values 
X      89 non-null values 
Y      89 non-null values 
precip_quantity_1hour 89 non-null values 
pressure     89 non-null values 
rel_humidity    89 non-null values 
temp      89 non-null values 
temp_max     0 non-null values 
temp_min     0 non-null values 
wind_direction   89 non-null values 
wind_speed    89 non-null values 
BC_TRAF     89 non-null values 
closest     89 non-null values 
closest.m    89 non-null values 
AGGP.P50_ID    89 non-null values 
AGGP.FUNC_CLASS   89 non-null values 
AGGP.SPEED_CAT   89 non-null values 
LINK_ID     89 non-null values 
FUNC_CLASS    89 non-null values 
SPEED_CAT    89 non-null values 
AR_AUTO     89 non-null values 
AR_BUS     89 non-null values 
AR_TAXIS     89 non-null values 
AR_CARPOOL    89 non-null values 
AR_PEDEST    89 non-null values 
AR_TRUCKS    89 non-null values 
STCA20_PCT    89 non-null values 
VKC_LINKNR    89 non-null values 
TRVIC150R1    89 non-null values 
closest.m    89 non-null values 
closest.m.m    89 non-null values 
VKCP.LINK_ID    89 non-null values 
VKCP.FUNC_CLASS   89 non-null values 
VKCP.SPEED    89 non-null values 
VKCP.LINKNR    89 non-null values 
VKCP.TWIN_ID    89 non-null values 
VKCSKEY1     89 non-null values 
dtypes: datetime64[ns](1), float64(13), int64(9), object(14) 
<class 'pandas.core.frame.DataFrame'> 
Index: 30 entries, (60744, 0) to (58314, 0) 
Data columns: 
traf_key  30 non-null values 
weekday_nr 30 non-null values 
linknr  30 non-null values 
weekday  30 non-null values 
vr0   30 non-null values 
vr1   30 non-null values 
vr2   30 non-null values 
vr3   30 non-null values 
vr4   30 non-null values 
vr5   30 non-null values 
vr6   30 non-null values 
vr7   30 non-null values 
vr8   30 non-null values 
vr9   30 non-null values 
vr10   30 non-null values 
vr11   30 non-null values 
vr12   30 non-null values 
vr13   30 non-null values 
vr14   30 non-null values 
vr15   30 non-null values 
vr16   30 non-null values 
vr17   30 non-null values 
vr18   30 non-null values 
vr19   30 non-null values 
vr20   30 non-null values 
vr21   30 non-null values 
vr22   30 non-null values 
vr23   30 non-null values 
au0   30 non-null values 
au1   30 non-null values 
au2   30 non-null values 
au3   30 non-null values 
au4   30 non-null values 
au5   30 non-null values 
au6   30 non-null values 
au7   30 non-null values 
au8   30 non-null values 
au9   30 non-null values 
au10   30 non-null values 
au11   30 non-null values 
au12   30 non-null values 
au13   30 non-null values 
au14   30 non-null values 
au15   30 non-null values 
au16   30 non-null values 
au17   30 non-null values 
au18   30 non-null values 
au19   30 non-null values 
au20   30 non-null values 
au21   30 non-null values 
au22   30 non-null values 
au23   30 non-null values 
sn0   30 non-null values 
sn1   30 non-null values 
sn2   30 non-null values 
sn3   30 non-null values 
sn4   30 non-null values 
sn5   30 non-null values 
sn6   30 non-null values 
sn7   30 non-null values 
sn8   30 non-null values 
sn9   30 non-null values 
sn10   30 non-null values 
sn11   30 non-null values 
sn12   30 non-null values 
sn13   30 non-null values 
sn14   30 non-null values 
sn15   30 non-null values 
sn16   30 non-null values 
sn17   30 non-null values 
sn18   30 non-null values 
sn19   30 non-null values 
sn20   30 non-null values 
sn21   30 non-null values 
sn22   30 non-null values 
sn23   30 non-null values 
dtypes: float64(24), int64(50), object(2) 
T      89 
X      89 
Y      89 
precip_quantity_1hour 89 
pressure     89 
rel_humidity    89 
temp      89 
temp_max     0 
temp_min     0 
wind_direction   89 
wind_speed    89 
BC_TRAF     89 
closest     89 
closest.m    89 
AGGP.P50_ID    89 
AGGP.FUNC_CLASS   89 
AGGP.SPEED_CAT   89 
LINK_ID     89 
FUNC_CLASS    89 
SPEED_CAT    89 
AR_AUTO     89 
AR_BUS     89 
AR_TAXIS     89 
AR_CARPOOL    89 
AR_PEDEST    89 
AR_TRUCKS    89 
STCA20_PCT    89 
VKC_LINKNR    89 
TRVIC150R1    89 
closest.m    89 
closest.m.m    89 
VKCP.LINK_ID    89 
VKCP.FUNC_CLASS   89 
VKCP.SPEED    89 
VKCP.LINKNR    89 
VKCP.TWIN_ID    89 
VKCSKEY1     89 
0  (60744, 0) 
1  (60744, 0) 
2  (60744, 0) 
3  (60750, 0) 
4  (60768, 0) 
5  (60768, 0) 
6  (60758, 0) 
7  (60758, 0) 
8  (69223, 0) 
9  (69223, 0) 
10 (69223, 0) 
11 (64265, 0) 
12 (64265, 0) 
13 (64265, 0) 
14 (64265, 0) 
15 (64265, 0) 
16 (64265, 0) 
17 (64265, 0) 
18 (64265, 0) 
19 (64265, 0) 
20 (64216, 0) 
21 (64216, 0) 
22 (64216, 0) 
23 (64216, 0) 
24 (64216, 0) 
25 (64216, 0) 
26 (64216, 0) 
27 (64216, 0) 
28 (64216, 0) 
29 (57085, 0) 
30 (57085, 0) 
31 (57085, 0) 
32 (57085, 0) 
33 (57085, 0) 
34 (57085, 0) 
35 (57014, 0) 
36 (57033, 0) 
37 (57033, 0) 
38 (64065, 0) 
39 (64065, 0) 
40 (64065, 0) 
41 (64065, 0) 
42 (64065, 0) 
43 (57070, 0) 
44 (64062, 0) 
45 (64062, 0) 
46 (64062, 0) 
47 (64062, 0) 
48 (57070, 0) 
49 (64061, 0) 
50 (64061, 0) 
51 (64061, 0) 
52 (64061, 0) 
53 (59849, 0) 
54 (59415, 0) 
55 (58487, 0) 
56 (58054, 0) 
57 (58054, 0) 
58 (58054, 0) 
59 (52551, 0) 
60 (58054, 0) 
61 (58054, 0) 
62 (58054, 0) 
63 (58054, 0) 
64 (52551, 0) 
65 (58054, 0) 
66 (58488, 0) 
67 (58488, 0) 
68 (58028, 0) 
69 (58464, 0) 
70 (58028, 0) 
71 (57989, 0) 
72 (58595, 0) 
73 (58027, 0) 
74 (57989, 0) 
75 (58595, 0) 
76 (58595, 0) 
77 (58019, 0) 
78 (58595, 0) 
79 (58595, 0) 
80 (58019, 0) 
81 (58595, 0) 
82 (58595, 0) 
83 (66715, 0) 
84 (58595, 0) 
85 (59295, 0) 
86 (67614, 0) 
87 (58314, 0) 
88 (58314, 0) 
Name: VKCSKEY1, Length: 89 
VKCSKEY1 
(60744, 0) (60744, 0) 
(60750, 0) (60750, 0) 
(60768, 0) (60768, 0) 
(60758, 0) (60758, 0) 
(69223, 0) (69223, 0) 
(64265, 0) (64265, 0) 
(64216, 0) (64216, 0) 
(57085, 0) (57085, 0) 
(57014, 0) (57014, 0) 
(57033, 0) (57033, 0) 
(64065, 0) (64065, 0) 
(57070, 0) (57070, 0) 
(64062, 0) (64062, 0) 
(64061, 0) (64061, 0) 
(59849, 0) (59849, 0) 
(59415, 0) (59415, 0) 
(58487, 0) (58487, 0) 
(58054, 0) (58054, 0) 
(52551, 0) (52551, 0) 
(58488, 0) (58488, 0) 
(58028, 0) (58028, 0) 
(58464, 0) (58464, 0) 
(57989, 0) (57989, 0) 
(58595, 0) (58595, 0) 
(58027, 0) (58027, 0) 
(58019, 0) (58019, 0) 
(66715, 0) (66715, 0) 
(59295, 0) (59295, 0) 
(67614, 0) (67614, 0) 
(58314, 0) (58314, 0) 
Name: traf_key 
Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88], dtype=int64) 
[] 
Index([(60744, 0), (60750, 0), (60768, 0), (60758, 0), (69223, 0), (64265, 0), (64216, 0), (57085, 0), (57014, 0), (57033, 0), (64065, 0), (57070, 0), (64062, 0), (64061, 0), (59849, 0), (59415, 0), (58487, 0), (58054, 0), (52551, 0), (58488, 0), (58028, 0), (58464, 0), (57989, 0), (58595, 0), (58027, 0), (58019, 0), (66715, 0), (59295, 0), (67614, 0), (58314, 0)], dtype=object) 
[] 
(60744, 0) True 
(60750, 0) True 
(60768, 0) True 
(60758, 0) True 
(69223, 0) True 
(64265, 0) True 
(64216, 0) True 
(57085, 0) True 
(57014, 0) True 
(57033, 0) True 
(64065, 0) True 
(57070, 0) True 
(64062, 0) True 
(64061, 0) True 
(59849, 0) True 
(59415, 0) True 
(58487, 0) True 
(58054, 0) True 
(52551, 0) True 
(58488, 0) True 
(58028, 0) True 
(58464, 0) True 
(57989, 0) True 
(58595, 0) True 
(58027, 0) True 
(58019, 0) True 
(66715, 0) True 
(59295, 0) True 
(67614, 0) True 
(58314, 0) True 
Traceback (most recent call last): 
    File "L:\temp\pandas_join_bug.py", line 43, in <module> 
    right_on ='traf_key', 
    File "C:\Python27\lib\site-packages\pandas\tools\merge.py", line 36, in merge 
    return op.get_result() 
    File "C:\Python27\lib\site-packages\pandas\tools\merge.py", line 185, in get_result 
    ldata, rdata = self._get_merge_data() 
    File "C:\Python27\lib\site-packages\pandas\tools\merge.py", line 277, in _get_merge_data 
    copydata=False) 
    File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 1194, in _maybe_rename_join 
    to_rename = self.items.intersection(other.items) 
    File "C:\Python27\lib\site-packages\pandas\core\index.py", line 666, in intersection 
    indexer = self.get_indexer(other.values) 
    File "C:\Python27\lib\site-packages\pandas\core\index.py", line 812, in get_indexer 
    raise Exception('Reindexing only valid with uniquely valued Index ' 
Exception: Reindexing only valid with uniquely valued Index objects 

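Once the error appears, I cannot get any merge or join statement to succeed. At first I did not see that the error was linked to repeated merge/join operations. For now, every merge/join in the latest setup works, but as soon as I need one more merge/join, I get the same error. I have been struggling with this for days...

Help!!!

Luc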

+0

Please post your data and code. – HYRY 2013-03-12 12:59:13

+0

Something strange is going on. If the series data is numeric, the code works; if it is a tuple or a string, it fails... – user1708646 2013-03-13 19:32:51

+0

FYI, your print is just showing how numpy 1.6.2 represents dates. What are you doing? – Jeff 2013-03-14 14:48:27

Answers

9

Duplicate column names cause this error. Try eliminating the duplicate column names.
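In the output posted in the question, `closest.m` appears twice in the column list of `base_df`, which is what you would expect after merging repeatedly with the same suffix. Below is a rough sketch of how to find and drop such columns before merging. The small frames and their integer keys are hypothetical stand-ins for the real data (the question uses tuple keys), and it assumes a recent pandas where `Index.duplicated()` is available.

```python
import pandas as pd

# Hypothetical stand-in for base_df: 'closest.m' appears twice,
# as it does in the column list printed in the question.
base_df = pd.DataFrame(
    [[60744, 1.0, 2.0], [60750, 3.0, 4.0]],
    columns=["VKCSKEY1", "closest.m", "closest.m"],
)
# Hypothetical stand-in for lookup_df.
lookup_df = pd.DataFrame({"traf_key": [60744, 60750], "vr0": [10, 20]})

# List the column labels that occur more than once.
dup_labels = base_df.columns[base_df.columns.duplicated()].tolist()
print(dup_labels)

# Keep only the first occurrence of each label, then merge as in the question.
deduped = base_df.loc[:, ~base_df.columns.duplicated()]
result = pd.merge(deduped, lookup_df, how="left",
                  left_on="VKCSKEY1", right_on="traf_key")
print(result.columns.tolist())
```

Dropping the later copy is only one option. If both columns carry data you need, renaming one of them, or using a different suffix for each merge, avoids the collision without losing anything.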

+0

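Shamelessly stolen from the comments above, so that the answer sits where newcomers will look for it. I had this exact problem and almost missed the answer. – 2013-08-29 20:12:17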
