我不知道你是否能帮助我在这里,但我有一个问题,我不明白。我有一个大的(对我来说)大约450,000个条目的数据集。每个条目是约700〜整数的列表,格式如下:TFLearn - 大数据集去NaN损失
[217088.0, 212992.0, 696.0, 191891.0, 524.0, 320.0, 0.0, 496.0, 0, 0, 364.0, 20.0, 0, 1.0, 0, 0.0, 0, 4.0, 22.0, 0, 672.0, 46.0, 16.0, 0.0, 0.0, 106496.0, 8.0, 0, 4.0, 2.0, 26.0, 640.0, 0.0, 1073741888.0, 624.0, 516.0, 4.0, 3.0, 0, 4319139.0, 0.0, 0, 0.0, 36.0, 8.0, 217088.0, 0.0, 0, 0, 0, 4.0, 5.0, 0, 20.0, 255624.0, 65535.0, 5.10153058443, 396.0, 4319140.0, 552.0, 144.0, 28.0, 5.0, 1048576.0, 217088.0, 350.0, 0.0, 0, 7.0, 1048576.0, 260.0, 0, 116.0, 0, 322.0, 0.0, 0, 4319141.0, 0.0, 10.0, 0.0, 9.0, 4.0, 0, 0, 0, 6.36484131641, 0.0, 0, 11.0, 72.0, 372.0, 45995.0, 217088.0, 0, 4096.0, 12.0, 80.0, 592.0, 264.0, 0, 0, 4096.0, 0.0, 256.0, 0.0, 49152.0, 700.0, 0, 4096.0, 0, 0, 0.0, 336.0, 8.0, 0, 0.0, 0, 4319142.0, 0.0, 60.0, 308.0, 4319143.0, 0, 0, 0, 0, 0, 0.742746270768, 316.0, 420.0, 276.0, 1073741888.0, 0.0, 332.0, 284.0, 0, 1107296320.0, 0.0, 4.0, 13.0, 18.0, 0.0, 632.0, 424.0, 261200.0, 0.0, 299008.0, 0.0, 4096.0, 0, 0.0, 299008.0, 0, 658.0, 0, 4319144.0, 4319145.0, 12.0, 50.0, 292.0, 688.0, 484.0, 70.0, 20.0, 4319146.0, 16.0, 17.0, 0, 0, 0, 0.0, 18.0, 4.0, 330.0, 0.0, 0, 0.0, 42.0, 303104.0, 19.0, 8.0, 20.0, 0.0, 0.0, 544.0, 340.0, 0, 14.0, 0, 209078.0, 0.0, 0.0, 22.0, 0, 209078.0, 0.0, 0.0, 18932.0, 4319147.0, 4.58031739078, 0.0, 376.0, 0.0, 0, 632.0, 4.0, 0, 0, 0, 428.0, 0, 0, 323584.0, 0.0, 24.0, 4.0, 368.0, 12.0, 40.0, 0, 720.0, 4.0, 348.0, 267.0, 20468.0, 32.0, 45995.0, 303104.0, 0.0, 0.0, 0, 0, 224.0, 16.0, 4.0, 44.0, 0.0, 0.0, 444.0, 720.0, 0, 1180.0, 0.0, 16.0, 412.0, 0.0, 4.0, 8462.0, 600.0, 568.0, 16.0, 0, 2.0, 36.0, 0.0, 6.0, 0, 21.0, 0.0, 24.0, 0, 4.0, 652.0, 4319148.0, 92.0, 8.0, 2.0, 0, 0.0, 0, 16.0, 0, 0, 324.0, 4.0, 300.0, 0, 278.0, 400.0, 0, 0.0, 0, 352.0, 0, 0.0, 209078.0, 8.0, 4096.0, 8.0, 36.0, 0.0, 256.0, 268435456.0, 0.0, 48.0, 4319149.0, 6.0, 4319150.0, 0, 416.0, 0, 0, 283.0, 4.0, 0, 0, 0, 8.0, 592.0, 0, 0, 25.0, 0.0, 0, 0, 0.0, 332.0, 212992.0, 540.0, 512.0, 0, 532.0, 20.0, 26.0, 0.0, 0, 52.0, 440.0, 7.0, 488.0, 8.0, 12.0, 0.0, 60.0, 14.0, 3221225536.0, 7.0, 56.0, 432.0, 4.0, 0, 12.0, 0.0, 40.0, 680.0, 16.0, 504.0, 344.0, 576.0, 0.0, 452.0, 266240.0, 290816.0, 578.0, 0, 552.0, 34.0, 0.0, 636.0, 88.0, 698.0, 282.0, 328.0, 38.0, 8.0, 480.0, 64.0, 4319151.0, 0.0, 0.0, 34.0, 460.0, 64.0, 0, 612.0, 0.0, 4319152.0, 0, 604.0, 0, 436.0, 0, 0, 20.0, 0, 4.0, 0, 0, 0, 0, 40.0, 356.0, 584.0, 0, 84.0, 0.0, 0, 0, 0, 294912.0, 7.0, 29.0, 20.0, 0, 60.0, 0.0, 268.0, 536.0, 4319153.0, 0.0, 106.0, 456.0, 24.0, 404.0, 0, 31.0, 0, 380.0, 24.0, 648.0, 0.0, 0, 0, 0.0, 0, 0, 0, 0.0, 0, 0, 0.0, 0.0, 1883.0, 5.85655736551, 34.0, 17744.0, 28680.0, 38.0, 36.0, 0.0, 24576.0, 596.0, 107.0, 33.0, 4.0, 5.0, 0, 0, 45995.0, 384.0, 8.0, 0, 0, 500.0, 20468.0, 34.0, 312.0, 8.0, 660.0, 0.0, 35.0, 608.0, 0, 684.0, 8.0, 68.0, 0.0, 32.0, 34.0, 23117.0, 3.0, 520.0, 0, 4319154.0, 0, 0, 512.0, 8.0, 28.0, 4096.0, 0, 538.0, 0.0, 572.0, 0.0, 2.0, 36.0, 0.0, 0.0, 32.0, 32.0, 4.0, 28.0, 0, 4.0, 38.0, 68.0, 9.0, 0.0, 0, 0.0, 36.0, 39.0, 618.0, 0, 8.0, 266240.0, 4.0, 5.0, 34.0, 304.0, 0, 0.0, 20.0, 40.0, 0.0, 0.0, 0, 580.0, 556.0, 4.0, 8.0, 262.0, 0, 12.0, 32.0, 0, 76.0, 12.0, 184.0, 720.0, 4.0, 16.0, 644.0, 16.0, 28680.0, 4319155.0, 720.0, 0.0, 564.0, 392.0, 672.0, 0.0, 24.0, 492.0, 0, 0.0, 676.0, 0, 0, 0, 12.0, 592.0, 360.0, 8.0, 692.0, 552.0, 4.0, 36.0, 512.0, 7198.0, 42.0, 44.0, 45.0, 4319156.0, 20.0, 388.0, 476.0, 5.0, 36.0, 20480.0, 47.0, 16.0, 326.0, 0.0, 12.0, 0.0, 0.0, 7.0, 272.0, 280.0, 0.0, 0, 288.0, 48.0, 4319157.0, 10.0, 448.0, 4.0, 4.0, 0, 20468.0, 408.0, 2.0, 50.0, 560.0, 0, 1610612768.0, 8.0, 0, 620.0, 656.0, 4.0, 4096.0, 51.0, 0, 0, 0.0, 28.0, 0, 616.0, 0, 296.0, 2.0, 632.0, 468.0, 28.0, 32.0, 52.0, 0, 528.0, 0, 28.0, 0.0, 0, 24.0, 18.0, 4096.0, 0, 8.0, 180.0, 664.0, 4319158.0, 26.0, 0.0, 6.0, 0, 4096.0, 472.0, 0, 28.0, 72.0, 464.0, 672.0, 0, 24.0, 4.0, 0, 28680.0, 0, 0, 18.0, 0, 0, 4319159.0, 24.0, 28.0, 16.0]
我使用Tflearn,试图建立一个分类模型关闭此数据,例如每个条目都有一个0或1的标签,我想训练模型来预测一个未知的条目是0或1。这里是我的代码摘要:
def main():
## Options ##
num_tf_layers = 10 # Number of fully connected layers, ex. softmax layer
num_tf_layer_nodes = 32 # Number of nodes in the fully connected layers
print_test_scores = 1 # Bool to print test set and predictions
use_validation_set = 0 # Bool to use testing set when fitting
num_tf_epochs = 10
tf_batch_size = 1
tf_learn_rate = 0.001
## Opening files
print("Preparing labels...")
trainY = tflearn.data_utils.to_categorical(temp_train_Y, nb_classes=2)
if use_validation_set:
testY = tflearn.data_utils.to_categorical(temp_test_Y, nb_classes=2)
print('Forming input data...')
net = tflearn.input_data(shape=[None, len(trainX[0])])
print('Creating fully connected layers...')
for i in range(num_tf_layers):
net = tflearn.fully_connected(net, num_tf_layer_nodes)
print('Creating softmax layer...')
net = tflearn.fully_connected(net, 2, activation='softmax')
print('Preparing regression...')
net = tflearn.regression(net, learning_rate=tf_learn_rate)
print('Preparing DNN...')
model = tflearn.DNN(net)
print('Fitting...')
if use_validation_set:
model.fit(trainX, trainY, n_epoch=num_tf_epochs, batch_size=tf_batch_size, validation_set=(testX, testY), show_metric=True)
else:
model.fit(trainX, trainY, n_epoch=num_tf_epochs, batch_size=tf_batch_size, show_metric=True)
print('Complete...')
我基于这一关以下TFlearn example。这个程序在一小部分数据,250 0和250 1上运行的很好。我的准确率高达80%,我认为增加更多的数据有助于提高准确度。但是,在添加大量数据之后,NaN的损失会非常快。甚至没有一次通过450,000快速迭代。经过一番研究后,我发现我的学习速度可能太高了,因为我已经把它留给了默认设置。我把它设置在0.1和0.000001之间,没有任何东西能阻止NaN的损失。我也尝试在1到1024之间更改批量,并更改3到20之间的层数。没有任何帮助。有没有人有任何想法改变或如何解决这个问题,以解决它?
谢谢!
消失梯度问题不会导致损失爆炸,而是导致它停滞在一个很差的值。此外,他已经在使用线性激活,因此他不可能真正遭受消失的渐变。 – timleathart
@timleathart由于激活函数的差值小于1(sigmoid和tanh),可能会出现渐变问题。所以,OP有可能使用它。我同意你的第一点,我正在更新我的答案以反映这一点。如果你因此而低估了,我敦促你重新考虑。谢谢。 –
OP尚未指定用于隐藏层的激活函数,因此默认情况下在TFLearn中它们将是线性激活'f(x)= x',而不是sigmoid或tanh。更可能的是,问题在于他的数据点具有非常高的值,即使学习率低,在梯度下降期间也会产生很大的步数。这些应该在训练前按列进行标准化,这是我在datascience.stackexchange.com上就这个问题的原始回答给出的建议。 – timleathart