0
我想实现一个简单的RNN来预测整数序列中的下一个整数。所以,我有一个数据集是如下:越来越ValueError:“只能使用多元索引的元组索引”
Id Sequence
1 1,0,0,2,24,552,21280,103760,70299264,5792853248,587159944704
2 1,1,5,11,35,93,269,747,2115,5933,16717,47003,132291,372157,1047181,2946251,8289731,23323853,65624397,184640891,519507267,1461688413,4112616845,11571284395,32557042499,91602704493,257733967693
4 0,1,101,2,15,102,73,3,40,16,47,103,51,74,116,4,57,41,125,17,12,48,9,104,30,52,141,75,107,117,69,5,148,58,88,42,33,126,152,18,160,13,38,49,55,10,28,105,146,31,158
5 1,4,14,23,42,33,35,34,63,66,87,116,84,101,126,164,128,102,135,143,149,155,203,224,186,204,210,237,261,218,219,286,257,266,361,355,336,302,374,339,371,398,340,409,348,388,494,436,407,406
6 1,1,2,5,4,2,6,13,11,4,10,10,12,6,8,29,16,11,18,20,12,10,22,26,29,12,38,30,28,8,30,61,20,16,24,55,36,18,24,52,40,12,42,50,44,22,46,58,55,29,32,60,52,38,40,78,36,28,58,40,60,30,66,125,48,20,66,80,44,24
9 0,31,59,90,120,151,181,212,243,273,304,334,365,396,424,455,485,516,546,577,608,638,669,699,730,761,789,820,850,881,911,942,973,1003,1034,1064,1095,1126,1155,1186,1216,1247,1277,1308,1339,1369,1400,1430
10 1,1,2,5,13,36,111,347,1134,3832,13126,46281,165283,598401,2202404,8168642,30653724,116082962,442503542,1701654889,6580937039,25603715395,100223117080,394001755683,1556876401398,6178202068457,24608353860698,98421159688268,394901524823138,1589722790850089
12 0,0,0,0,112,40286,5485032,534844548,45066853496,3538771308282,267882021563464,19861835713621616,1453175611052688600,105278656040052332838,7564280930105061931496
到目前为止我的代码是:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import SimpleRNN
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from keras.preprocessing.sequence import pad_sequences
def stoarray(data = [], sep = ','):
return data.map(lambda x: np.array(x.split(sep), dtype=float))
def create_dataset(dataset, window_size=1):
dataX, dataY = [], []
for i in range(len(dataset)-window_size-1):
a = dataset[i:(i+window_size), 0]
dataX.append(a)
dataY.append(dataset[i + window_size, 0]) #gives the ValueError : Can only tuple index with multi index
return np.array(dataX), np.array(dataY)
# fix random seed for reproducibility
np.random.seed(7)
# loading data
colna = ['id', 'seq']
train_data = pd.read_csv('G:/Python/integer_sequencing/testfile.csv', header=1)
train_data.columns = colna
dataset = train_data['seq']
#print(dataset)
window_size = 1
X_train, Y_train = create_dataset(dataset, window_size)
print(X_train.head(5))
print(Y_train.head(5))
我试图用X_train每个序列分割作为输入,它由一套完整的除上一学期和Y_train被视为输出将只包含最后一个数字。 但是当我运行代码时,我得到了ValueError:只能使用带有MultiIndex的元组索引。 任何人都可以解释它对我的代码意味着什么,我必须做些什么来解决它。
PS - 我新的堆栈溢出和深度学习,所以我会很感激,如果你可以建议和帮助我的问题的格式。
始终把完整的错误消息(回溯)的问题 - 还有其他有用inforamtion,即。它显示哪条线出现问题。 – furas
在'dataset [i:(i + window_size),0]'你使用元组'i:(i + window_size),0'来获取数据,但是它不允许这样做。您可以只使用单个整数,如'dataset [0]'或切片'dataset [i:(i + window_size)]'。在你的代码中'dataset'意思是'train_data ['seq']',它是DataFrame的单列 - 它是'Series',而不是'DataFrame'只有一列。使用'print(type(dataset))'来查看它。 – furas
我在问题中添加了错误消息。希望它有帮助。 –