2017-10-05 161 views

Pandas error in Python: Columns must be same length as key

I am scraping some data from a few websites and I am using pandas to modify it.

It works fine on the first chunks of data, but later I get this error message:

Traceback (most recent call last):
  File "data.py", line 394, in <module>
    df2[['STATUS_ID_1','STATUS_ID_2']] = df2['STATUS'].str.split(n=1, expand=True)
  File "/home/web/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2326, in __setitem__
    self._setitem_array(key, value)
  File "/home/web/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2350, in _setitem_array
    raise ValueError('Columns must be same length as key')
ValueError: Columns must be same length as key

Part of my code is here:

df2 = pd.DataFrame(datatable,columns = cols) 
df2['FLIGHT_ID_1'] = df2['FLIGHT'].str[:3] 
df2['FLIGHT_ID_2'] = df2['FLIGHT'].str[3:].str.zfill(4) 
df2[['STATUS_ID_1','STATUS_ID_2']] = df2['STATUS'].str.split(n=1, expand=True) 

EDIT for jezrael: I used your code and printed the result. I hope this helps us find the problem, because the point at which the script hits this split error looks random.

    0   1 
2  Landed 8:33 AM 
3  Landed 9:37 AM 
4  Landed 9:10 AM 
5  Landed 9:57 AM 
6  Landed 9:36 AM 
8  Landed 8:51 AM 
9  Landed 9:18 AM 
11  Landed 8:53 AM 
12  Landed 7:59 AM 
13  Landed 7:52 AM 
14  Landed 8:56 AM 
15  Landed 8:09 AM 
18  Landed 8:42 AM 
19  Landed 9:39 AM 
20  Landed 9:45 AM 
21  Landed 7:44 AM 
23  Landed 8:36 AM 
27  Landed 9:53 AM 
29  Landed 9:26 AM 
30  Landed 8:23 AM 
35  Landed 9:59 AM 
36  Landed 8:38 AM 
37  Landed 9:38 AM 
38  Landed 9:37 AM 
40  Landed 9:27 AM 
43  Landed 9:14 AM 
44  Landed 9:22 AM 
45  Landed 8:18 AM 
46  Landed 10:01 AM 
47  Landed 10:21 AM 
..   ...  ... 
316 Delayed 5:00 PM 
317 Delayed 4:34 PM 
319 Estimated 2:58 PM 
320 Estimated 3:02 PM 
321 Delayed 4:47 PM 
323 Estimated 3:08 PM 
325 Delayed 3:52 PM 
326 Estimated 3:09 PM 
327 Estimated 2:37 PM 
328 Estimated 3:17 PM 
329 Estimated 3:20 PM 
330 Estimated 2:39 PM 
331 Delayed 4:04 PM 
332 Delayed 4:36 PM 
337 Estimated 3:47 PM 
339 Estimated 3:37 PM 
341 Delayed 4:32 PM 
345 Estimated 3:34 PM 
349 Estimated 3:24 PM 
356 Delayed 4:56 PM 
358 Estimated 3:45 PM 
367 Estimated 4:09 PM 
370 Estimated 4:04 PM 
371 Estimated 4:11 PM 
373 Delayed 5:21 PM 
382 Estimated 3:56 PM 
384 Delayed 4:28 PM 
389 Delayed 4:41 PM 
393 Estimated 4:02 PM 
397 Delayed 5:23 PM 

[240 rows x 2 columns] 

Can you add a sample of the data? – jezrael

(https://stackoverflow.com/questions/46522269/how-can-i-split-a-column-into-2-in-the-correct-way) (https://stackoverflow.com/questions/46524461/how-can-i-split-a-column-into-2-in-the-correct-in-python) – Harley

Hmm, really interesting. Can you check `df3 = df2['STATUS'].str.split(n=1, expand=True)` and then `print(df3[df3[df3.columns[-1]].notnull()])`? Can you add the output to the question? – jezrael
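
For anyone following along, the check suggested in the comment above can be sketched roughly like this. The three sample statuses are made up for illustration (they are not the asker's scraped data), and `has_last` and `missing_last` are names introduced here; `missing_last` is an extra step that shows the rows where nothing came after the first word:

```python
import pandas as pd

# Hypothetical sample; the real data comes from the scraped table
df2 = pd.DataFrame({'STATUS': ['Landed 8:33 AM', 'Canceled', 'Delayed 5:00 PM']})

# Split on the first run of whitespace only
df3 = df2['STATUS'].str.split(n=1, expand=True)

last = df3.columns[-1]                     # label of the last column produced
has_last = df3[df3[last].notnull()]        # rows where the split found a second part
missing_last = df3[df3[last].isnull()]     # rows with only a single word

print(df3.shape)
print(has_last)
print(missing_last)
```

If `df3.shape` shows only one column for some batch of rows, that batch is the one where the two-column assignment would fail.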

Answers

You need a slightly modified solution, because the split sometimes returns 2 columns and sometimes only one:

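import pandas as pd

df2 = pd.DataFrame({'STATUS':['Estimated 3:17 PM','Delayed 3:00 PM']})

# split on the first whitespace; this gives 1 or 2 columns depending on the data
df3 = df2['STATUS'].str.split(n=1, expand=True)
# name the columns after however many the split actually produced
df3.columns = ['STATUS_ID{}'.format(x+1) for x in df3.columns]
print (df3)
  STATUS_ID1 STATUS_ID2
0  Estimated    3:17 PM
1    Delayed    3:00 PM

# join adds the new columns by index, whatever their number
df2 = df2.join(df3)
print (df2)
              STATUS STATUS_ID1 STATUS_ID2
0  Estimated 3:17 PM  Estimated    3:17 PM
1    Delayed 3:00 PM    Delayed    3:00 PM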

Another possible case: none of the values contain whitespace, and the solution works nicely for that too:

df2 = pd.DataFrame({'STATUS':['Canceled','Canceled']}) 

and the solution returns:

print (df2)
     STATUS STATUS_ID1
0  Canceled   Canceled
1  Canceled   Canceled
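
This single-column case is most likely what triggers the error in the question. Here is a minimal sketch, assuming a batch in which no status contains whitespace (the two 'Canceled' rows are sample data, and `parts` and `error_message` are names made up here):

```python
import pandas as pd

# Hypothetical batch in which no status contains whitespace
df2 = pd.DataFrame({'STATUS': ['Canceled', 'Canceled']})

# Nothing to split, so expand=True produces a single column
parts = df2['STATUS'].str.split(n=1, expand=True)
print(parts.shape)

# Assigning one column to two column names is what fails
try:
    df2[['STATUS_ID_1', 'STATUS_ID_2']] = parts
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Because the number of columns depends on the data in each scraped batch, the failure can look random from run to run.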

All together:

df3 = df2['STATUS'].str.split(n=1, expand=True) 
df3.columns = ['STATUS_ID{}'.format(x+1) for x in df3.columns] 
df2 = df2.join(df3) 
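
If later code expects both columns to exist every time, one possible variation (not part of the answer above, just a sketch) is to reindex the split result to exactly two columns, so that a missing second part becomes NaN. `split_status` is a hypothetical helper name, and the column names follow the ones used in the question:

```python
import pandas as pd

def split_status(df, column='STATUS'):
    """Hypothetical helper: always add STATUS_ID_1 and STATUS_ID_2."""
    parts = df[column].str.split(n=1, expand=True)
    # With n=1 the split yields at most columns 0 and 1;
    # reindex creates column 1 filled with NaN when it is absent
    parts = parts.reindex(columns=[0, 1])
    parts.columns = ['STATUS_ID_1', 'STATUS_ID_2']
    return df.join(parts)

mixed = split_status(pd.DataFrame({'STATUS': ['Estimated 3:17 PM', 'Delayed 3:00 PM']}))
no_space = split_status(pd.DataFrame({'STATUS': ['Canceled', 'Canceled']}))

print(mixed)
print(no_space)
```

The trade-off is that the second column may be entirely NaN for some batches, so anything reading `STATUS_ID_2` has to tolerate missing values.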

What exactly do I have to put into my code? This one: `df2[['STATUS_ID_1','STATUS_ID_2']] = df2['STATUS'].str.split(n=1, expand=True)` `df2 = pd.DataFrame({'STATUS'})` `df3 = df2['STATUS'].str.split(n=1, expand=True)` `df3.columns = ['STATUS_ID{}'.format(x+1) for x in df3.columns]`? – Harley

Use my code instead of `df2[['STATUS_ID_1','STATUS_ID_2']] = df2['STATUS'].str.split(n=1, expand=True)` – jezrael

OK, what do I have to write instead of [['Canceled'],['Canceled']]? Just delete it and use your first three lines? – Harley
