2016-11-27 79 views

I am running the following code in a Jupyter notebook under Anaconda to load a dataset:

# Run some setup code for this notebook. 

import random 
import numpy as np 
from cs231n.data_utils import load_CIFAR10 
import matplotlib.pyplot as plt 

# This is a bit of magic to make matplotlib figures appear inline in the notebook 
# rather than in a new window. 
%matplotlib inline 
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots 
plt.rcParams['image.interpolation'] = 'nearest' 
plt.rcParams['image.cmap'] = 'gray' 

# Some more magic so that the notebook will reload external python modules; 
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython 
%load_ext autoreload 
%autoreload 2 

followed by these statements:

# Load the raw CIFAR-10 data. 
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py' 
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir) 

# As a sanity check, we print out the size of the training and test data. 
print ('Training data shape: ', X_train.shape) 
print ('Training labels shape: ', y_train.shape) 
print ('Test data shape: ', X_test.shape) 
print ('Test labels shape: ', y_test.shape) 

Running the second part gives me the following error:

--------------------------------------------------------------------------- 
UnicodeDecodeError      Traceback (most recent call last) 
<ipython-input-5-9506c06e646a> in <module>() 
     1 # Load the raw CIFAR-10 data. 
     2 cifar10_dir = 'cs231n/datasets/cifar-10-batches-py' 
----> 3 X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir) 
     4 
     5 # As a sanity check, we print out the size of the training and test data. 

C:\Users\lenovo\assignment1\cs231n\data_utils.py in load_CIFAR10(ROOT) 
    20 for b in range(1,6): 
    21  f = os.path.join(ROOT, 'data_batch_%d' % (b,)) 
---> 22  X, Y = load_CIFAR_batch(f) 
    23  xs.append(X) 
    24  ys.append(Y) 

C:\Users\lenovo\assignment1\cs231n\data_utils.py in load_CIFAR_batch(filename) 
     7 """ load single batch of cifar """ 
     8 with open(filename, 'rb') as f: 
----> 9  datadict = pickle.load(f) 
    10  X = datadict['data'] 
    11  Y = datadict['labels'] 

UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 6: ordinal not in range(128) 

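How can I fix this error? I am using Anaconda3 to run this code, and it looks like the code above was written for Anaconda2. Any pointers on resolving this error?

Just for more detail:

I am trying to work through the assignment at this link: http://cs231n.github.io/assignments2016/assignment1/

Edit:

Adding data_utils.py, which contains the definition of load_CIFAR10: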

import _pickle as pickle 
import numpy as np 
import os 
from scipy.misc import imread 

def load_CIFAR_batch(filename):
    """ load single batch of cifar """
    with open(filename, 'rb') as f:
        datadict = pickle.load(f)
        X = datadict['data']
        Y = datadict['labels']
        X = X.reshape(10000, 3, 32, 32).transpose(0,2,3,1).astype("float")
        Y = np.array(Y)
        return X, Y

def load_CIFAR10(ROOT):
    """ load all of cifar """
    xs = []
    ys = []
    for b in range(1,6):
        f = os.path.join(ROOT, 'data_batch_%d' % (b,))
        X, Y = load_CIFAR_batch(f)
        xs.append(X)
        ys.append(Y)
    Xtr = np.concatenate(xs)
    Ytr = np.concatenate(ys)
    del X, Y
    Xte, Yte = load_CIFAR_batch(os.path.join(ROOT, 'test_batch'))
    return Xtr, Ytr, Xte, Yte

Answer


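The pickle file you are loading was most likely generated with Python 2.

Because pickle handles strings differently in Python 2 and Python 3, you can try loading the file with the latin-1 encoding, which maps each byte value 0-255 directly to a character. Pass the encoding to pickle.load, not to open, since the file is opened in binary mode.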
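As a minimal sketch of that suggestion (the helper name `load_py2_pickle` is made up for illustration and is not part of cs231n), the encoding is passed to `pickle.load` while the file itself stays in binary mode:

```python
import pickle


def load_py2_pickle(filename):
    """Load a pickle written by Python 2 from Python 3 (illustrative helper)."""
    # The file is still opened in binary mode; open() gets no encoding here.
    with open(filename, 'rb') as f:
        # encoding='latin1' tells the unpickler how to decode Python 2 str
        # objects: every byte 0-255 becomes the character with that code point.
        return pickle.load(f, encoding='latin1')
```

In `load_CIFAR_batch` this would amount to replacing `pickle.load(f)` with `pickle.load(f, encoding='latin1')`.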

This approach needs some sanity checks afterwards, since it is not guaranteed to produce consistent data.
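One possible form for such a sanity check is sketched below. It assumes the batch has already been reshaped to N x 32 x 32 x 3 as in `load_CIFAR_batch`, that pixel values lie in 0-255, and that labels are integers 0-9; the function name `check_cifar_batch` is hypothetical.

```python
import numpy as np


def check_cifar_batch(X, Y, num_classes=10):
    """Return a list of problems found in one decoded batch (empty if none)."""
    X = np.asarray(X)
    Y = np.asarray(Y)
    problems = []
    # Images are expected as (N, 32, 32, 3) after the reshape/transpose step.
    if X.ndim != 4 or X.shape[1:] != (32, 32, 3):
        problems.append('unexpected image shape: %s' % (X.shape,))
    # There should be exactly one label per image.
    if Y.shape != X.shape[:1]:
        problems.append('labels do not match images: %s' % (Y.shape,))
    # Pixel values come from single bytes, so they must lie in 0-255.
    if X.size and (X.min() < 0 or X.max() > 255):
        problems.append('pixel values outside 0-255')
    # Assumed label range: integers 0 .. num_classes - 1.
    if Y.size and (Y.min() < 0 or Y.max() >= num_classes):
        problems.append('labels outside 0-%d' % (num_classes - 1))
    return problems
```

Calling `check_cifar_batch(X_train, y_train)` right after loading and confirming that it returns an empty list is a cheap way to catch a mis-decoded file early.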


Binary mode doesn't accept an encoding argument.


My bad, I meant adding it to the pickle load call; see my edit.


Now the error has changed to "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 6: invalid start byte" instead of the 'ascii' codec error.