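Caffe: extremely high loss when learning a simple linear function

I am trying to train a neural network to learn the function y = x1 + x2 + x3. The goal is to play around with Caffe in order to learn and understand it better. The required data is generated synthetically in Python and written out as LMDB database files.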
Code for data generation:
import numpy as np
import lmdb
import caffe
Ntrain = 100
Ntest = 20
K = 3
H = 1
W = 1
Xtrain = np.random.randint(0,1000, size = (Ntrain,K,H,W))
Xtest = np.random.randint(0,1000, size = (Ntest,K,H,W))
ytrain = Xtrain[:,0,0,0] + Xtrain[:,1,0,0] + Xtrain[:,2,0,0]
ytest = Xtest[:,0,0,0] + Xtest[:,1,0,0] + Xtest[:,2,0,0]
env = lmdb.open('expt/expt_train')
for i in range(Ntrain):
    datum = caffe.proto.caffe_pb2.Datum()
    datum.channels = Xtrain.shape[1]
    datum.height = Xtrain.shape[2]
    datum.width = Xtrain.shape[3]
    datum.data = Xtrain[i].tobytes()
    datum.label = int(ytrain[i])
    str_id = '{:08}'.format(i)
    with env.begin(write=True) as txn:
        txn.put(str_id.encode('ascii'), datum.SerializeToString())
env = lmdb.open('expt/expt_test')
for i in range(Ntest):
    datum = caffe.proto.caffe_pb2.Datum()
    datum.channels = Xtest.shape[1]
    datum.height = Xtest.shape[2]
    datum.width = Xtest.shape[3]
    datum.data = Xtest[i].tobytes()
    datum.label = int(ytest[i])
    str_id = '{:08}'.format(i)
    with env.begin(write=True) as txn:
        txn.put(str_id.encode('ascii'), datum.SerializeToString())
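One thing that may be worth checking in the code above is the byte layout of what goes into datum.data. As far as I understand, Caffe's Data layer reads that field as one unsigned byte per element, so it expects channels * height * width bytes. The sketch below uses only NumPy (no Caffe or LMDB) and assumes the array is int64, which is the usual default of np.random.randint on Linux and macOS; the variable names are illustrative only.

```python
import numpy as np

K, H, W = 3, 1, 1
rng = np.random.default_rng(0)

# Assumption: int64 mirrors the default dtype of np.random.randint on most platforms.
sample = rng.integers(0, 1000, size=(K, H, W), dtype=np.int64)

raw = sample.tobytes()
expected_len = K * H * W  # bytes a one-byte-per-element reader would consume

print(len(raw), expected_len)  # prints "24 3"

# What a uint8 reader would see if it only looked at the first K*H*W bytes.
as_uint8 = np.frombuffer(raw, dtype=np.uint8)[:expected_len]
print(sample.ravel(), as_uint8)
```

If the two lengths differ, the values the network sees are probably not the values that were generated. Values up to 999 also do not fit in a single byte, so a float-based route (for example the float_data field of Datum, or an HDF5 data source) may be a better match for this kind of regression data.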
Solver.prototext file:
net: "expt/expt.prototxt"
display: 1
max_iter: 200
test_iter: 20
test_interval: 100
base_lr: 0.000001
momentum: 0.9
# weight_decay: 0.0005
lr_policy: "inv"
# gamma: 0.5
# stepsize: 10
# power: 0.75
snapshot_prefix: "expt/expt"
snapshot_diff: true
solver_mode: CPU
solver_type: SGD
debug_info: true
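A note on the "inv" policy used above: Caffe documents it as base_lr * (1 + gamma * iter) ^ (-power). The small sketch below evaluates that formula in plain Python. It assumes that gamma and power fall back to 0 when they are left commented out, which I have not verified against the solver defaults; inv_lr is a hypothetical helper, not a Caffe function.

```python
def inv_lr(base_lr, gamma, power, iteration):
    """Learning rate under the 'inv' policy: base_lr * (1 + gamma * iter) ** (-power)."""
    return base_lr * (1.0 + gamma * iteration) ** (-power)

base_lr = 0.000001

# Assumption: with gamma and power unset (treated as 0), the rate never decays.
for it in (0, 100, 200):
    print(it, inv_lr(base_lr, gamma=0.0, power=0.0, iteration=it))

# With the commented-out values from the solver file, the rate would shrink over time.
for it in (0, 100, 200):
    print(it, inv_lr(base_lr, gamma=0.5, power=0.75, iteration=it))
```

Under that assumption the solver effectively runs with a fixed learning rate of 1e-6 for all 200 iterations.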
Caffe model:
name: "expt"
layer {
  name: "Expt_Data_Train"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  data_param {
    source: "expt/expt_train"
    backend: LMDB
    batch_size: 1
  }
}
layer {
  name: "Expt_Data_Validate"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  data_param {
    source: "expt/expt_test"
    backend: LMDB
    batch_size: 1
  }
}
layer {
  name: "IP"
  type: "InnerProduct"
  bottom: "data"
  top: "ip"
  inner_product_param {
    num_output: 1
    weight_filler {
      type: 'constant'
    }
    bias_filler {
      type: 'constant'
    }
  }
}
layer {
  name: "Loss"
  type: "EuclideanLoss"
  bottom: "ip"
  bottom: "label"
  top: "loss"
}
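The loss I get on the test data is 233,655. This is shocking, because the loss is three orders of magnitude larger than the numbers in the training and test data sets. Also, the function to be learned is a simple linear function. I can't seem to figure out what is wrong in the code. Any suggestions/inputs are much appreciated.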
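For a sense of scale, it may help to remember that EuclideanLoss is computed as 1/(2N) times the sum of squared differences, so it is in squared units of the label rather than in the units of the data. The NumPy sketch below reproduces that formula on synthetic data of the same kind and compares a perfect predictor with a baseline that always outputs zero. The zero baseline is an assumption chosen for illustration, not a claim about what the network above actually outputs, and euclidean_loss is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same kind of data as in the question, with more samples for a stable estimate.
X = rng.integers(0, 1000, size=(100000, 3))
y = X.sum(axis=1).astype(np.float64)

def euclidean_loss(pred, target):
    """1/(2N) * sum((pred - target)^2), the formula used by EuclideanLoss."""
    diff = pred - target
    return float(np.sum(diff ** 2) / (2 * len(target)))

zero_loss = euclidean_loss(np.zeros_like(y), y)   # baseline: always predict 0
perfect_loss = euclidean_loss(y.copy(), y)        # exact predictions

# Typical absolute error implied by the reported loss of 233,655.
rms_error = float(np.sqrt(2 * 233655))

print(zero_loss, perfect_loss, rms_error)
```

On this data the zero baseline lands at roughly 1.2 million, and a loss of 233,655 corresponds to a typical error of about 684 per sample. So the reported number is large mostly because errors are squared, but an error of that size still suggests the network has not learned the sum.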