2017-11-25 198 views
0

我想实现一个简单的RNN来预测整数序列中的下一个整数。所以,我有一个数据集是如下:越来越ValueError:“只能使用多元索引的元组索引”

Id Sequence 
1 1,0,0,2,24,552,21280,103760,70299264,5792853248,587159944704 
2 1,1,5,11,35,93,269,747,2115,5933,16717,47003,132291,372157,1047181,2946251,8289731,23323853,65624397,184640891,519507267,1461688413,4112616845,11571284395,32557042499,91602704493,257733967693 
4 0,1,101,2,15,102,73,3,40,16,47,103,51,74,116,4,57,41,125,17,12,48,9,104,30,52,141,75,107,117,69,5,148,58,88,42,33,126,152,18,160,13,38,49,55,10,28,105,146,31,158 
5 1,4,14,23,42,33,35,34,63,66,87,116,84,101,126,164,128,102,135,143,149,155,203,224,186,204,210,237,261,218,219,286,257,266,361,355,336,302,374,339,371,398,340,409,348,388,494,436,407,406 
6 1,1,2,5,4,2,6,13,11,4,10,10,12,6,8,29,16,11,18,20,12,10,22,26,29,12,38,30,28,8,30,61,20,16,24,55,36,18,24,52,40,12,42,50,44,22,46,58,55,29,32,60,52,38,40,78,36,28,58,40,60,30,66,125,48,20,66,80,44,24 
9 0,31,59,90,120,151,181,212,243,273,304,334,365,396,424,455,485,516,546,577,608,638,669,699,730,761,789,820,850,881,911,942,973,1003,1034,1064,1095,1126,1155,1186,1216,1247,1277,1308,1339,1369,1400,1430 
10 1,1,2,5,13,36,111,347,1134,3832,13126,46281,165283,598401,2202404,8168642,30653724,116082962,442503542,1701654889,6580937039,25603715395,100223117080,394001755683,1556876401398,6178202068457,24608353860698,98421159688268,394901524823138,1589722790850089 
12 0,0,0,0,112,40286,5485032,534844548,45066853496,3538771308282,267882021563464,19861835713621616,1453175611052688600,105278656040052332838,7564280930105061931496 

到目前为止我的代码是:

import numpy as np 
import matplotlib.pyplot as plt 
import pandas as pd 
import math 
from keras.models import Sequential 
from keras.layers import Dense 
from keras.layers import SimpleRNN 
from sklearn.preprocessing import MinMaxScaler 
from sklearn.metrics import mean_squared_error 
from keras.preprocessing.sequence import pad_sequences 

def stoarray(data = [], sep = ','): 
    return data.map(lambda x: np.array(x.split(sep), dtype=float)) 

def create_dataset(dataset, window_size=1): 
    dataX, dataY = [], [] 
    for i in range(len(dataset)-window_size-1): 
     a = dataset[i:(i+window_size), 0] 
     dataX.append(a) 
     dataY.append(dataset[i + window_size, 0]) #gives the ValueError : Can only tuple index with multi index 
    return np.array(dataX), np.array(dataY) 

# fix random seed for reproducibility 
np.random.seed(7) 

# loading data 
colna = ['id', 'seq'] 
train_data = pd.read_csv('G:/Python/integer_sequencing/testfile.csv', header=1) 
train_data.columns = colna 
dataset = train_data['seq'] 
#print(dataset) 
window_size = 1 
X_train, Y_train = create_dataset(dataset, window_size) 

print(X_train.head(5)) 
print(Y_train.head(5)) 

我试图用X_train每个序列分割作为输入,它由一套完整的除上一学期和Y_train被视为输出将只包含最后一个数字。 但是当我运行代码时,我得到了ValueError:只能使用带有MultiIndex的元组索引。 任何人都可以解释它对我的代码意味着什么,我必须做些什么来解决它。

回溯电话: enter image description here

PS - 我新的堆栈溢出和深度学习,所以我会很感激,如果你可以建议和帮助我的问题的格式。

+0

始终把完整的错误消息(回溯)的问题 - 还有其他有用inforamtion,即。它显示哪条线出现问题。 – furas

+0

在'dataset [i:(i + window_size),0]'你使用元组'i:(i + window_size),0'来获取数据,但是它不允许这样做。您可以只使用单个整数,如'dataset [0]'或切片'dataset [i:(i + window_size)]'。在你的代码中'dataset'意思是'train_data ['seq']',它是DataFrame的单列 - 它是'Series',而不是'DataFrame'只有一列。使用'print(type(dataset))'来查看它。 – furas

+0

我在问题中添加了错误消息。希望它有帮助。 –

回答

1

您有i:(i+window_size), 0的问题dataset[i:(i+window_size), 0]

在您的代码dataset意味着train_data['seq']这是单柱 - 单维Series - 但是你用i:(i+window_size), 0像二维DataFrame

只能使用单一的整数像dataset[0]或切片dataset[i:(i+window_size)]