2016-01-20
0

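I want to load a dataset in an IPython environment and use it. How can I fix the error raised when I call pickle.load()?

In the directory containing the dataset, I have these files: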

  • batches.meta
  • data_batch_1
  • data_batch_2
  • data_batch_3
  • data_batch_4
  • data_batch_5
  • readme
  • test_batch

I wrote this code:

import os
import pickle
import numpy as np
import matplotlib.pyplot as plt

# Load the five training batches and the test batch from ROOT
def load_CIFAR(ROOT):
    xs = []
    ys = []
    # The training files are named data_batch_1 ... data_batch_5
    for b in range(1, 6):
        f = os.path.join(ROOT, "data_batch_%d" % b)
        X, Y = load_CIFAR_batch(f)
        xs.append(X)
        ys.append(Y)
    Xtr = np.concatenate(xs)
    Ytr = np.concatenate(ys)

    del X, Y
    Xte, Yte = load_CIFAR_batch(os.path.join(ROOT, "test_batch"))
    return Xtr, Ytr, Xte, Yte

# Load a single batch file and return its images and labels
def load_CIFAR_batch(filename):
    with open(filename, 'r') as f:
        # ****** Here is where the error occurs
        datadict = pickle.load(f)
        # ******
        X = datadict['data']
        Y = datadict['labels']
        # Reshape the flat rows into 32x32 RGB images, channels last
        X = X.reshape(10000, 3, 32, 32).transpose(0, 2, 3, 1).astype("float")
        Y = np.array(Y)
        return X, Y

However, when I load the dataset with this function using the command below, I get the error "a bytes-like object is required, not 'str'".

# The directory of my dataset on my hard drive
url = 'D:\\OTIWU\\data\\cifar10' 
Xtr, Ytr, Xte, Yte = load_CIFAR(url) 

The command above is the one I used.

The whole error: 
--------------------------------------------------------------------------- 
TypeError Traceback (most recent call last) 
<ipython-input-14-f0576df4fbda> in <module>() 
----> 1 Xtr, Ytr, Xte, Yte = load_CIFAR(url) 

<ipython-input-10-fedf6bd7c144> in load_CIFAR(ROOT) 
     4  for b in range(1,6): 
     5   f=os.path.join(ROOT, "data_batch_%d" % (b,)); 
     ----> 6   X, Y=load_CIFAR_batch(f); 
     7   xs.append(X); 
     8   ys.append(Y); 

     <ipython-input-13-368cd3e9d8d2> in load_CIFAR_batch(filename) 
     1 def load_CIFAR_batch(filename): 
     2  with open(filename, 'r') as f: 
     ----> 3   datadict = pickle.load(f); 
     4 
     5   X = datadict['data']; 

     TypeError: a bytes-like object is required, not 'str' 

How can I solve this problem?

+0

Where are you pickling the data? It looks like you need to use 'pickle.loads(...)' instead of 'pickle.load(...)' – tglaria

+0

Show us how you created the pickle file.

+0

@JohnGordon Sorry, do I need to edit the question to add that context?
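On the comment about pickle.loads(...): as far as I understand it, the two functions differ only in what they read from. pickle.load takes an open binary file object, while pickle.loads takes a bytes object that is already in memory. The short sketch below illustrates this with a made-up dictionary (the payload is purely illustrative and has nothing to do with the dataset files):

```python
import io
import pickle

# Hypothetical payload, used only to illustrate the two entry points.
payload = pickle.dumps({"labels": [0, 1, 2]})

# pickle.loads reads from a bytes object held in memory.
from_bytes = pickle.loads(payload)

# pickle.load reads from a binary file-like object.
from_file = pickle.load(io.BytesIO(payload))
```

Both calls return the same dictionary, so switching from load to loads would not by itself change how the file is opened.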

Answers

0

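I found a solution. This is a Python 3.x issue. When I ran the code with Python 2.x, I could read all the data in the dataset. I should add that I changed the source code slightly: I used the cPickle library instead of pickle. Apart from that, all the source code is the same as before.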
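If the same script has to run under both Python 2 and Python 3, a common pattern is to try cPickle first and fall back to pickle. This is only a sketch of that import pattern, not the exact change made above. In Python 3 the cPickle module no longer exists, and the standard pickle module uses its C implementation automatically when it is available.

```python
try:
    import cPickle as pickle  # Python 2: C-accelerated pickle module
except ImportError:
    import pickle  # Python 3: cPickle was removed, use pickle directly

# Small round trip to confirm the imported module works.
restored = pickle.loads(pickle.dumps([1, 2, 3]))
```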

0

You need to open your file in binary mode:

with open(filename, 'rb') as f:
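Here is a minimal sketch of how that change could look inside a loader under Python 3. The function name load_batch_dict and the stand-in file are hypothetical. The encoding="latin1" argument is an assumption on my part: pickles written by Python 2 often need it (or encoding="bytes") to load under Python 3, and it may or may not be required for your files.

```python
import os
import pickle
import tempfile


def load_batch_dict(filename):
    """Read one pickled batch file and return the stored dictionary."""
    # 'rb' hands pickle the raw bytes it expects under Python 3.
    with open(filename, "rb") as f:
        # Assumption: encoding="latin1" lets Python 3 decode 8-bit
        # strings stored in a pickle that was written by Python 2.
        return pickle.load(f, encoding="latin1")


# Round-trip check on a small stand-in file (not the real dataset).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fake_batch")
    with open(path, "wb") as f:
        pickle.dump({"data": [[0, 1, 2]], "labels": [7]}, f, protocol=2)
    batch = load_batch_dict(path)
```

With the real batch files, the returned dictionary would then go through the same reshape and transpose steps as in the question's load_CIFAR_batch.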