2017-07-25

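Normalizing the validation set for a neural network in Keras

So, I understand that normalization is important for training a neural network.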

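I also understand that I have to normalize the validation and test sets with the parameters of the training set (see, for example, this discussion: https://stats.stackexchange.com/questions/77350/perform-feature-normalization-before-or-within-model-validation).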

My question is: how can I do this in Keras?

What I am currently doing is:

import numpy as np 
from keras.models import Sequential 
from keras.layers import Dense 
from keras.callbacks import EarlyStopping 

def Normalize(data): 
    mean_data = np.mean(data) 
    std_data = np.std(data) 
    norm_data = (data-mean_data)/std_data 
    return norm_data 

input_data, targets = np.loadtxt(fname='data', delimiter=';') 
norm_input = Normalize(input_data) 

model = Sequential() 
model.add(Dense(25, input_dim=20, activation='relu')) 
model.add(Dense(1, activation='sigmoid')) 

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy']) 

early_stopping = EarlyStopping(monitor='val_acc', patience=50) 
model.fit(norm_input, targets, validation_split=0.2, batch_size=15, callbacks=[early_stopping], verbose=1) 

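But here I first normalize the data with respect to the whole dataset and only then split off the validation set, which is wrong according to the discussion linked above.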

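Saving the mean and standard deviation of the training set (training_mean and training_std) is not a big problem, but how can I apply training_mean and training_std to normalize the validation set separately?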

Answers

0

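Before fitting the model, you can manually split the data into training and test sets with sklearn.model_selection.train_test_split. After that, normalize the training and test data separately and call model.fit with the validation_data argument.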

Code example

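import numpy as np
from sklearn.model_selection import train_test_split

# Normalize, model and early_stopping are the ones defined in the question.
# Random stand-in data: 20 samples with 20 features, matching input_dim=20.
data = np.random.randint(0, 100, 400).reshape(20, 20)
# The upper bound of randint is exclusive, so 2 yields labels in {0, 1}.
target = np.random.randint(0, 2, 20)

X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.2)

# Each set is normalized separately, with its own mean and standard deviation.
X_train = Normalize(X_train)
X_test = Normalize(X_test)

model.fit(X_train, y_train, validation_data=(X_test, y_test), batch_size=15, callbacks=[early_stopping], verbose=1)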
0

The following code does what you want:

import numpy as np 
def normalize(x_train, x_test): 
    mu = np.mean(x_train, axis=0) 
    std = np.std(x_train, axis=0) 
    x_train_normalized = (x_train - mu)/std 
    x_test_normalized = (x_test - mu)/std 
    return x_train_normalized, x_test_normalized 
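
One practical caveat with the function above: a feature that is constant in the training set has a standard deviation of zero, so the division produces inf or nan. The following is only a sketch of one possible guard. The name normalize_safe and the eps threshold are hypothetical and not part of the original answer.

```python
import numpy as np

def normalize_safe(x_train, x_test, eps=1e-8):
    """Standardize both arrays with the training set's per-feature statistics."""
    mu = np.mean(x_train, axis=0)
    std = np.std(x_train, axis=0)
    # Assumption: treat a (near-)zero standard deviation as 1,
    # so a constant training feature is only centered, not scaled.
    std = np.where(std < eps, 1.0, std)
    return (x_train - mu) / std, (x_test - mu) / std

# Toy data: the second feature is constant in the training set.
x_train = np.array([[1.0, 5.0], [3.0, 5.0], [5.0, 5.0]])
x_test = np.array([[3.0, 5.0], [7.0, 6.0]])

x_train_n, x_test_n = normalize_safe(x_train, x_test)
```

With this guard the constant column becomes all zeros in the training set instead of nan, and the test set is still shifted and scaled by the training statistics only.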

Then you can use it with Keras like this:

from keras.datasets import boston_housing 
(x_train, y_train), (x_test, y_test) = boston_housing.load_data() 
x_train, x_test = normalize(x_train, x_test) 
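
If you would rather keep validation_split as in the question, one option is to compute the statistics only on the rows Keras will train on. Keras documents that validation_split holds out the last fraction of the samples, taken before shuffling. The sketch below relies on that. The helper name normalize_for_validation_split is hypothetical, and the way the split index is computed is an assumption about Keras internals, so splitting manually and passing validation_data is the safer route when in doubt.

```python
import numpy as np

def normalize_for_validation_split(x, validation_split):
    """Standardize x using statistics from the leading (training) rows only."""
    # Assumption: the split index mirrors the one Keras uses internally;
    # the last `validation_split` fraction of rows is the validation set.
    split_at = int(x.shape[0] * (1.0 - validation_split))
    mu = np.mean(x[:split_at], axis=0)
    std = np.std(x[:split_at], axis=0)
    return (x - mu) / std, split_at

# Random stand-in for the question's input_data (100 samples, 20 features).
rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=3.0, size=(100, 20))

x_norm, split_at = normalize_for_validation_split(x, validation_split=0.2)
```

x_norm can then be passed to model.fit together with the same validation_split value. Because the held-out rows are always the last ones, shuffle inputs and targets together beforehand if the file is ordered in some way.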

Wilmar's answer is not correct.
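
To illustrate why normalizing each set with its own statistics, as the first answer does, is a problem, here is a small sketch with made-up numbers. The helper standardize and the toy arrays are hypothetical. The same raw value ends up at different normalized positions depending on which set it belongs to, so the network sees validation inputs on a different scale than the one it was trained on.

```python
import numpy as np

def standardize(x, mu, std):
    """Shift and scale x with the given mean and standard deviation."""
    return (x - mu) / std

# Made-up single-feature data; the raw value 10.0 appears in both sets.
x_train = np.array([[0.0], [10.0], [20.0]])
x_val = np.array([[10.0], [30.0]])

train_mu, train_std = x_train.mean(axis=0), x_train.std(axis=0)
train_n = standardize(x_train, train_mu, train_std)

# Separate normalization: the validation set uses its own statistics.
val_separate = standardize(x_val, x_val.mean(axis=0), x_val.std(axis=0))

# Shared normalization: the validation set reuses the training statistics.
val_shared = standardize(x_val, train_mu, train_std)

print("raw 10.0 in training set:         ", train_n[1, 0])
print("raw 10.0, separate validation fit:", val_separate[0, 0])
print("raw 10.0, training statistics:    ", val_shared[0, 0])
```

With the training statistics, the raw value 10.0 is mapped to the same number in both sets. With separate statistics it is not.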