Training and prediction with instance keys

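I am able to train my model and get predictions from ML Engine, but my results don't include any identifying information. This works fine when submitting one row at a time for prediction, but when submitting multiple rows I have no way of connecting the predictions back to the original input data. The GCP documentation discusses using instance keys, but I can't find any example code that trains and predicts using an instance key. Taking the GCP census example, how would I update the input functions to pass a unique ID through the graph and ignore it during training, yet return the unique ID with the predictions? Alternatively, if anyone knows of a different example that already uses keys, that would help as well.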

From the Census Estimator Sample:

import multiprocessing

import tensorflow as tf
from tensorflow.contrib.learn.python.learn.utils import input_fn_utils

# INPUT_COLUMNS, CSV_COLUMNS, CSV_COLUMN_DEFAULTS, UNUSED_COLUMNS,
# LABEL_COLUMN and parse_label_column are defined elsewhere in the sample.


def serving_input_fn():
    # One rank-1 placeholder per input column.
    feature_placeholders = {
        column.name: tf.placeholder(column.dtype, [None])
        for column in INPUT_COLUMNS
    }

    # The estimator expects rank-2 feature tensors.
    features = {
        key: tf.expand_dims(tensor, -1)
        for key, tensor in feature_placeholders.items()
    }

    return input_fn_utils.InputFnOps(
        features,
        None,
        feature_placeholders
    )


def generate_input_fn(filenames,
                      num_epochs=None,
                      shuffle=True,
                      skip_header_lines=0,
                      batch_size=40):

    def _input_fn():
        files = tf.concat([
            tf.train.match_filenames_once(filename)
            for filename in filenames
        ], axis=0)

        filename_queue = tf.train.string_input_producer(
            files, num_epochs=num_epochs, shuffle=shuffle)
        reader = tf.TextLineReader(skip_header_lines=skip_header_lines)

        _, rows = reader.read_up_to(filename_queue, num_records=batch_size)

        row_columns = tf.expand_dims(rows, -1)
        columns = tf.decode_csv(row_columns, record_defaults=CSV_COLUMN_DEFAULTS)
        features = dict(zip(CSV_COLUMNS, columns))

        # Remove unused columns
        for col in UNUSED_COLUMNS:
            features.pop(col)

        if shuffle:
            features = tf.train.shuffle_batch(
                features,
                batch_size,
                capacity=batch_size * 10,
                min_after_dequeue=batch_size * 2 + 1,
                num_threads=multiprocessing.cpu_count(),
                enqueue_many=True,
                allow_smaller_final_batch=True
            )
        label_tensor = parse_label_column(features.pop(LABEL_COLUMN))
        return features, label_tensor

    return _input_fn

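Update: I was able to use the code suggested in this answer below. I only needed to alter it slightly so that it updates the output alternatives in model_fn_ops rather than just the predictions dict. However, this only works if my serving input function is coded for JSON inputs similar to this one. My serving input function was previously modeled after the CSV serving input function in the Census Core Sample.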

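I think my problem comes from the build_standardized_signature_def function, and even more so from the is_classification_problem function that it calls. With the CSV serving function the input dict has length 1, so that logic ends up using classification_signature_def, which only exposes the scores (which are actually the probabilities). With the JSON serving input function the input dict has length greater than 1, so predict_signature_def, which includes all of the outputs, is used instead.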
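The selection rule described in the paragraph above can be illustrated with a small framework-free sketch. This is not TensorFlow's implementation: `choose_signature`, the input names, and the output names are all hypothetical, and the sketch assumes a classifier, encoding only the rule as the question states it (one input means a classification signature exposing scores only, more than one means a predict signature exposing everything).

```python
def choose_signature(input_names, output_names):
    """Pick a signature kind using only the rule described in the question.

    Assumption: the model is a classifier, so the single-input case is
    treated as a classification export that exposes only the scores.
    """
    if len(input_names) == 1:
        exposed = [name for name in output_names if name == "scores"]
        return "classification_signature_def", exposed
    return "predict_signature_def", list(output_names)


outputs = ["classes", "scores", "key"]

# CSV serving function: a single string input holding the whole row.
csv_choice = choose_signature(["csv_row"], outputs)

# JSON serving function: one input per feature, plus the key.
json_choice = choose_signature(["age", "workclass", "key"], outputs)
```

Under this rule the key only survives in the JSON case, which matches the behavior reported above.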


2017-06-06 dobbysock1002

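This is a known issue with how ModelServer (which CMLE uses for inference) handles classification signatures. In 1.2, EstimatorSpec lets you choose your own export method, so hopefully that will solve the problem for you, but it requires a rewrite to use tf.estimator.Estimator instead of tf.contrib.learn.Estimator.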

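Update: In version 1.3 the contrib estimators (tf.contrib.learn.DNNClassifier, for example) were changed to inherit from the core estimator class tf.estimator.Estimator. Unlike its predecessor, that class hides the model function as a private class member, so you will need to replace estimator.model_fn with estimator._model_fn in the solution below.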

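Josh's answer points you to the flowers example, which is a good solution if you want to use a custom estimator. If you want to stick with a canned estimator (e.g. tf.contrib.learn.DNNClassifier), you can wrap it in a custom estimator that adds support for keys. (Note: I think it is likely that canned estimators will gain key support when they move into core.)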

KEY = 'key'

def key_model_fn_gen(estimator):
    def _model_fn(features, labels, mode, params):
        # Pull the key out so the wrapped estimator never sees it as a feature.
        key = features.pop(KEY, None)
        model_fn_ops = estimator.model_fn(
            features=features, labels=labels, mode=mode, params=params)
        # Compare against None: a tf.Tensor cannot be used as a Python bool.
        if key is not None:
            model_fn_ops.predictions[KEY] = key
            # This line makes it so the exported SavedModel will also require a key
            model_fn_ops.output_alternatives[None][1][KEY] = key
        return model_fn_ops
    return _model_fn

my_key_estimator = tf.contrib.learn.Estimator(
    model_fn=key_model_fn_gen(
        tf.contrib.learn.DNNClassifier(model_dir=model_dir...)
    ),
    model_dir=model_dir
)

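my_key_estimator can then be used exactly as your DNNClassifier would have been used, except that it expects a feature named 'key' from the input_fns (prediction, evaluation, and training).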
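The wrapping pattern itself does not depend on TensorFlow, so it can be tried in isolation. The following is a minimal sketch using plain dicts and lists in place of tensors; `with_key_passthrough` and `toy_model_fn` are hypothetical names invented for this illustration, not part of any library.

```python
KEY = "key"


def with_key_passthrough(model_fn):
    """Wrap model_fn so a KEY feature bypasses the model and reappears in the output."""

    def wrapped(features):
        features = dict(features)        # copy so the caller's dict is not mutated
        key = features.pop(KEY, None)    # the wrapped model never sees the key
        predictions = dict(model_fn(features))
        if key is not None:              # tolerate inputs that carry no key
            predictions[KEY] = key
        return predictions

    return wrapped


def toy_model_fn(features):
    """Hypothetical stand-in for a canned estimator's model function."""
    assert KEY not in features           # the key must have been stripped already
    return {"scores": [age / 100.0 for age in features["age"]]}


keyed_model_fn = with_key_passthrough(toy_model_fn)
result = keyed_model_fn({"key": [101, 102], "age": [30, 40]})
no_key_result = keyed_model_fn({"age": [30]})
```

The explicit `is not None` check matters in the real estimator as well, because a tensor cannot be evaluated as a Python boolean.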

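EDIT2: You will also need to add the corresponding input tensor to the prediction input function of your choice. For example, a new JSON serving input fn would look like this: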

def json_serving_input_fn():
    inputs = # ... input_dict as before
    # tf.placeholder takes the dtype first; the shape is passed separately.
    inputs[KEY] = tf.placeholder(dtype=tf.int64, shape=[None])
    features = # .. feature dict made from input_dict as before
    return tf.contrib.learn.InputFnOps(features, None, inputs)

(There are slight differences between 1.2 and 1.3: tf.contrib.learn.InputFnOps is replaced by tf.estimator.export.ServingInputReceiver, and padding tensors to rank 2 is no longer necessary in 1.3.)

ML Engine will then send a tensor named "key" with your prediction request, which is passed to your model and passed through with your predictions.
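Once the key comes back with each prediction, reconnecting predictions to the original rows is a dictionary lookup. Here is a minimal sketch, assuming each returned prediction is a dict that carries the same "key" value as its input row; `join_predictions` and the sample rows and scores are made up for illustration.

```python
def join_predictions(rows, predictions, key_field="key"):
    """Attach each prediction to the input row that carries the same key."""
    rows_by_key = {row[key_field]: row for row in rows}
    joined = []
    for prediction in predictions:
        merged = dict(rows_by_key[prediction[key_field]])
        # Copy every prediction field except the key, which the row already has.
        merged.update(
            {name: value for name, value in prediction.items() if name != key_field}
        )
        joined.append(merged)
    return joined


# Hypothetical input rows and a response that arrives in a different order.
rows = [{"key": 1, "age": 25}, {"key": 2, "age": 52}]
predictions = [
    {"key": 2, "scores": [0.2, 0.8]},
    {"key": 1, "scores": [0.9, 0.1]},
]
joined = join_predictions(rows, predictions)
```

Because the join is by key rather than by position, it does not matter whether results come back in the order they were submitted.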

EDIT3: Modified key_model_fn_gen to support ignoring missing key values. EDIT4: Added the key to the predictions.


2017-06-08 18:44:22

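I tried implementing this and the code runs successfully, but when I submit my prediction request through ML Engine I only get back the prediction scores, not even what is in the predictions dict? Looking at the predict() method of the estimator class, it seems it should return everything in the model_fn_ops predictions dict when no outputs are specified: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/learn/python/learn/estimators/estimator.py#L935 Do you know what additional changes I need to make? – dobbysock1002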

I assume you are using Experiment to train? If so, what 'export_strategy' did you use?

I am using Experiment. I have added more details about the export strategy to my original question. – dobbysock1002

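Great question. The Cloud ML Engine flowers sample does this by using a tf.identity operation to pass a string straight through from input to output. Here are the relevant lines from graph construction.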

keys_placeholder = tf.placeholder(tf.string, shape=[None]) 
inputs = { 
    'key': keys_placeholder, 
    'image_bytes': tensors.input_jpeg 
} 

# To extract the id, we need to add the identity function. 
keys = tf.identity(keys_placeholder) 
outputs = { 
    'key': keys, 
    'prediction': tensors.predictions[0], 
    'scores': tensors.predictions[1] 
}

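For batch prediction you need to insert "key": "some_key_value" into your instance records. For online prediction you would query the above graph with a JSON request like: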

{"instances": [
    {"key": "first_key", "image_bytes": {"b64": ...}},
    {"key": "second_key", "image_bytes": {"b64": ...}}
    ]
}
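Both request shapes can be produced from the same list of instances. The sketch below uses only the standard library; `make_instance` and the image payloads are hypothetical, and it assumes that batch prediction input is a text file with one JSON instance per line.

```python
import base64
import json


def make_instance(key, image_bytes):
    """Build one prediction instance that carries its own key."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"key": key, "image_bytes": {"b64": encoded}}


# Hypothetical payloads; a real request would read the bytes of JPEG files.
images = {"first_key": b"\xff\xd8first", "second_key": b"\xff\xd8second"}
instances = [make_instance(key, data) for key, data in images.items()]

# Online prediction: a single JSON body with an "instances" list.
online_body = json.dumps({"instances": instances})

# Batch prediction (assumed format): one JSON instance per line.
batch_lines = "\n".join(json.dumps(instance) for instance in instances)
```

Binary fields are base64-encoded and wrapped in a "b64" object, as in the request shown above; the key travels as an ordinary string field.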


2017-06-06 15:02:36 JoshGC

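Thanks! I have been going through the flowers code, and the key seems to be specifying the input and output formats and creating a signature. If you are using a pre-canned estimator and the Experiment class, is there still a way to get this kind of customization, or does it only work if I build the graph myself? Would the export strategy in Experiment be the place to do this? I also noticed that the iris example does something similar in the train monitors argument. [code link](https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/iris/trainer/task.py#L113) – dobbysock1002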
