Keras: class weights (class_weight) for one-hot encoding

I'd like to use the class_weight argument of Keras model.fit to handle imbalanced training data. From reading some of the documentation, I understand that we can pass a dictionary like this:

class_weight = {0 : 1, 
    1: 1, 
    2: 5}

(In this example, class 2 gets a higher penalty in the loss function.)

The problem is that my network's output is one-hot encoded, i.e. class 0 = (1,0,0), class 1 = (0,1,0), and class 2 = (0,0,1).

How can we use class_weight with one-hot encoded output?

Looking at some of the Keras code, it seems that _feed_output_names contains a list of output classes, but in my case model.output_names/model._feed_output_names returns ['dense_1'].

Source

2017-04-18 Naoto Usuyama

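I think we can use sample_weight instead. In fact, inside Keras, class_weight is converted to sample_weight.

sample_weight: optional array of the same length as x, containing weights to apply to the model's loss for each sample. In the case of temporal data, you can pass a 2D array with shape (samples, sequence_length), to apply a different weight to every timestep of every sample. In this case you should make sure to specify sample_weight_mode="temporal" in compile().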

https://github.com/fchollet/keras/blob/d89afdfd82e6e27b850d910890f4a4059ddea331/keras/engine/training.py#L1392
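To illustrate the conversion described above, here is a minimal NumPy sketch. It is not Keras's actual implementation; the helper name class_weight_to_sample_weight is hypothetical, and it assumes each one-hot row belongs to exactly one class.

```python
import numpy as np

def class_weight_to_sample_weight(y_one_hot, class_weight):
    """Return one weight per sample, looked up by the sample's class index.

    Hypothetical helper: assumes y_one_hot has shape (n_samples, n_classes)
    and that class_weight has a key for every column index that occurs.
    """
    y_one_hot = np.asarray(y_one_hot)
    # The class of each row is the column index of its largest entry.
    class_indices = y_one_hot.argmax(axis=1)
    return np.array([class_weight[int(c)] for c in class_indices], dtype=float)

# Four samples: one of class 0, one of class 1, two of class 2.
y = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [0, 0, 1]])
class_weight = {0: 1, 1: 1, 2: 5}

sample_weight = class_weight_to_sample_weight(y, class_weight)
print(sample_weight)  # [1. 1. 5. 5.]
```

The resulting array has the same length as the training data, so it could be passed as the sample_weight argument of model.fit.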

Source

2017-04-18 20:27:52

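How does sample_weight_mode="temporal" help with multi-class one-hot encoded targets? Do you have any idea how to handle the case where each sample may belong to multiple classes? Thanks – olix20

A bit of a convoluted answer, but the best I've found so far. It assumes your data is one-hot encoded and multi-class, and it works only on the label DataFrame df_y: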

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.class_weight import compute_class_weight, compute_sample_weight

# df_y is the one-hot encoded label DataFrame (one column per class).
# Create a pd.Series that holds the categorical class of each one-hot encoded row
y_classes = df_y.idxmax(axis=1, skipna=False)

# Instantiate the label encoder and fit it to our label series
le = LabelEncoder()
le.fit(list(y_classes))

# Create integer-based labels
y_integers = le.transform(list(y_classes))

# Create dict of label : integer representation
labels_and_integers = dict(zip(y_classes, y_integers))

# Recent scikit-learn versions require keyword arguments here
class_weights = compute_class_weight(class_weight='balanced', classes=np.unique(y_integers), y=y_integers)
sample_weights = compute_sample_weight('balanced', y_integers)

class_weights_dict = dict(zip(le.transform(list(le.classes_)), class_weights))

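This produces a sample_weights vector, computed to balance an imbalanced dataset, which can be passed to the Keras sample_weight argument, and a class_weights_dict that can be passed to the class_weight argument of the Keras .fit method. You don't really want to use both; just pick one. I'm using class_weight for now because sample_weight is complicated to get working with fit_generator.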
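For readers who want to see what the 'balanced' option computes, the following is a rough NumPy-only sketch of the same heuristic, n_samples / (n_classes * count_per_class), applied directly to a one-hot array. The function name balanced_class_weights is hypothetical, and the sketch assumes every class appears at least once; otherwise it would divide by zero.

```python
import numpy as np

def balanced_class_weights(y_one_hot):
    """Return {column_index: weight} using the 'balanced' heuristic.

    Hypothetical helper: assumes a one-hot array of shape
    (n_samples, n_classes) in which every class occurs at least once.
    """
    y_one_hot = np.asarray(y_one_hot)
    class_indices = y_one_hot.argmax(axis=1)
    n_samples, n_classes = y_one_hot.shape
    # Number of samples per class, indexed by column.
    counts = np.bincount(class_indices, minlength=n_classes)
    weights = n_samples / (n_classes * counts)
    return {i: float(w) for i, w in enumerate(weights)}

# Imbalanced toy data: 6 samples of class 0, 3 of class 1, 1 of class 2.
y = np.array([[1, 0, 0]] * 6 + [[0, 1, 0]] * 3 + [[0, 0, 1]] * 1)
weights = balanced_class_weights(y)
print(weights)  # rarer classes receive larger weights
```

Because the keys are column indices, a dictionary like this has the shape expected by the class_weight argument discussed above.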

Source

2017-06-30 23:26:40

In _standardize_weights, Keras does:

if y.shape[1] > 1: 
    y_classes = y.argmax(axis=1)

So basically, if you choose to use one-hot encoding, the classes are the column indices.

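You may also ask yourself how to map the column indices back to the original classes of your data. Well, if you use scikit-learn's LabelBinarizer class to perform the one-hot encoding, the column indices follow the order of the unique labels computed by the .fit function. The docs say:

Extract an ordered array of unique labels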

Example:

from sklearn.preprocessing import LabelBinarizer
y = [4, 1, 2, 8]
l = LabelBinarizer()
y_transformed = l.fit_transform(y)
y_transformed
> array([[0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1]])
l.classes_
> array([1, 2, 4, 8])

In conclusion, the keys of the class_weight dictionary should follow the order of the encoder's classes_ attribute.
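As a small illustration of that conclusion, the sketch below re-keys a weight dictionary from original labels to column indices. It uses np.unique as a stand-in for the encoder's classes_ attribute, on the assumption that both give the sorted unique labels (as in the example above). The helper name weights_by_column and the weight values are made up for illustration.

```python
import numpy as np

def weights_by_column(labels, weight_by_label):
    """Re-key {original_label: weight} as {column_index: weight}.

    Hypothetical helper: assumes the one-hot columns are ordered like
    np.unique(labels), i.e. like the sorted unique labels.
    """
    classes = np.unique(labels)  # sorted unique labels, e.g. [1, 2, 4, 8]
    return {idx: weight_by_label[label.item()]
            for idx, label in enumerate(classes)}

y = [4, 1, 2, 8]
# Illustrative weights keyed by the original labels (values are made up).
weight_by_label = {1: 1.0, 2: 1.0, 4: 5.0, 8: 2.0}

class_weight = weights_by_column(y, weight_by_label)
print(class_weight)  # {0: 1.0, 1: 1.0, 2: 5.0, 3: 2.0}
```

Here label 4 sits in column 2, so its weight ends up under key 2, matching what argmax on the one-hot rows would return.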

Source

2018-01-25 14:09:54 pglaser
