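Tensorflow Serving - "No versions available" message for a model trained using tf.contrib.learn.Experiment
I have trained a model using the Google Cloud ML Engine getting-started tutorial as a reference. I can deploy and serve this model on Google Cloud ML without any problems.
Now I want to serve it with Tensorflow Serving, but I get the following message: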
2017-03-17 19:20:17.064146: W tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:204] No versions of servable default found under base path /serving/tf_models/extrato/output/
The command line I use to start Tensorflow Serving is:
[email protected]:/serving# bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_base_path=/serving/tf_models/extrato/output/
The contents of the output folder are:
[email protected]:/serving# ls -la tf_models/extrato/output
total 119740
drwxr-xr-x 4 root root 4096 Mar 17 17:02 .
drwxr-xr-x 3 root root 4096 Mar 17 17:02 ..
-rw-r--r-- 1 root root 184 Mar 17 17:02 checkpoint
drwxr-xr-x 2 root root 4096 Mar 17 17:02 eval
-rw-r--r-- 1 root root 96390060 Mar 17 17:02 events.out.tfevents.1489705843.elio-MS-7A66
drwxr-xr-x 3 root root 4096 Mar 17 17:02 export
-rw-r--r-- 1 root root 1362798 Mar 17 17:02 graph.pbtxt
-rw-r--r-- 1 root root 7633781 Mar 17 17:02 model.ckpt-1000001.data-00000-of-00001
-rw-r--r-- 1 root root 1975 Mar 17 17:02 model.ckpt-1000001.index
-rw-r--r-- 1 root root 637623 Mar 17 17:02 model.ckpt-1000001.meta
-rw-r--r-- 1 root root 7633781 Mar 17 17:02 model.ckpt-2.data-00000-of-00001
-rw-r--r-- 1 root root 1975 Mar 17 17:02 model.ckpt-2.index
-rw-r--r-- 1 root root 637623 Mar 17 17:02 model.ckpt-2.meta
-rw-r--r-- 1 root root 7633781 Mar 17 17:02 model.ckpt-566170.data-00000-of-00001
-rw-r--r-- 1 root root 1975 Mar 17 17:02 model.ckpt-566170.index
-rw-r--r-- 1 root root 637623 Mar 17 17:02 model.ckpt-566170.meta
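UPDATE: I also tried using the frozen model (the .pb file and the variables folder), which is the folder I actually used to deploy the model on Google Cloud ML Engine, but I got the same error message.
These files are located in the folder below: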
[email protected]:/serving# ls -la tf_models/extrato/output/export/Servo/1489706933289/
total 356
drwxr-xr-x 3 root root 4096 Mar 17 17:02 .
drwxr-xr-x 3 root root 4096 Mar 17 17:02 ..
-rw-r--r-- 1 root root 348848 Mar 17 17:02 saved_model.pb
drwxr-xr-x 2 root root 4096 Mar 17 17:02 variables
The code I used to train and export the model is:
import argparse

import model

import tensorflow as tf
from tensorflow.contrib.learn.python.learn import learn_runner
from tensorflow.contrib.learn.python.learn.utils import (
    saved_model_export_utils)
def generate_experiment_fn(train_files,
                           eval_files,
                           num_epochs=None,
                           train_batch_size=40,
                           eval_batch_size=40,
                           embedding_size=8,
                           first_layer_size=100,
                           num_layers=4,
                           scale_factor=0.7,
                           **experiment_args):
    """Create an experiment function given hyperparameters.

    See command line help text for description of args.

    Returns:
      A function (output_dir) -> Experiment where output_dir is a string
      representing the location of summaries, checkpoints, and exports.
      This function is used by learn_runner to create an Experiment which
      executes model code provided in the form of an Estimator and
      input functions.

      All listed arguments in the outer function are used to create an
      Estimator, and input functions (training, evaluation, serving).
      Unlisted args are passed through to Experiment.
    """
    # Check verbose logging flag
    verbose_logging = experiment_args.pop('verbose_logging')
    model.set_verbose_logging(verbose_logging)

    def _experiment_fn(output_dir):
        # num_epochs can control duration if train_steps isn't
        # passed to Experiment
        train_input = model.generate_input_fn(
            train_files,
            num_epochs=num_epochs,
            batch_size=train_batch_size,
        )
        # Don't shuffle evaluation data
        eval_input = model.generate_input_fn(
            eval_files,
            batch_size=eval_batch_size,
            shuffle=False
        )
        return tf.contrib.learn.Experiment(
            model.build_estimator(
                output_dir,
                embedding_size=embedding_size,
                # Construct layer sizes with exponential decay
                hidden_units=[
                    max(2, int(first_layer_size * scale_factor**i))
                    for i in range(num_layers)
                ]
            ),
            train_input_fn=train_input,
            eval_input_fn=eval_input,
            # export strategies control the prediction graph structure
            # of exported binaries.
            export_strategies=[saved_model_export_utils.make_export_strategy(
                model.serving_input_fn,
                default_output_alternative_key=None,
                exports_to_keep=1
            )],
            **experiment_args
        )
    return _experiment_fn
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # Input Arguments
    parser.add_argument(
        '--train-files',
        help='GCS or local paths to training data',
        nargs='+',
        required=True
    )
    parser.add_argument(
        '--num-epochs',
        help="""\
        Maximum number of training data epochs on which to train.
        If both --max-steps and --num-epochs are specified,
        the training job will run for --max-steps or --num-epochs,
        whichever occurs first. If unspecified will run for --max-steps.\
        """,
        type=int,
    )
    parser.add_argument(
        '--train-batch-size',
        help='Batch size for training steps',
        type=int,
        default=40
    )
    parser.add_argument(
        '--eval-batch-size',
        help='Batch size for evaluation steps',
        type=int,
        default=40
    )
    parser.add_argument(
        '--train-steps',
        help="""\
        Steps to run the training job for. If --num-epochs is not specified,
        this must be. Otherwise the training job will run indefinitely.\
        """,
        type=int
    )
    parser.add_argument(
        '--eval-steps',
        help='Number of steps to run evaluation for at each checkpoint',
        default=100,
        type=int
    )
    parser.add_argument(
        '--eval-files',
        help='GCS or local paths to evaluation data',
        nargs='+',
        required=True
    )
    # Training arguments
    parser.add_argument(
        '--embedding-size',
        help='Number of embedding dimensions for categorical columns',
        default=8,
        type=int
    )
    parser.add_argument(
        '--first-layer-size',
        help='Number of nodes in the first layer of the DNN',
        default=100,
        type=int
    )
    parser.add_argument(
        '--num-layers',
        help='Number of layers in the DNN',
        default=4,
        type=int
    )
    parser.add_argument(
        '--scale-factor',
        help='How quickly should the size of the layers in the DNN decay',
        default=0.7,
        type=float
    )
    parser.add_argument(
        '--job-dir',
        help='GCS location to write checkpoints and export models',
        required=True
    )
    # Argument to turn on all logging
    parser.add_argument(
        '--verbose-logging',
        default=False,
        type=bool,
        help='Switch to turn on or off verbose logging and warnings'
    )
    # Experiment arguments
    parser.add_argument(
        '--eval-delay-secs',
        help='How long to wait before running first evaluation',
        default=10,
        type=int
    )
    parser.add_argument(
        '--min-eval-frequency',
        help='Minimum number of training steps between evaluations',
        default=1,
        type=int
    )
    args = parser.parse_args()
    arguments = args.__dict__
    job_dir = arguments.pop('job_dir')
    print('Starting Census: Please launch tensorboard to see results: tensorboard --logdir=$MODEL_DIR')
    # Run the training job
    # learn_runner pulls configuration information from environment
    # variables using tf.learn.RunConfig and uses this configuration
    # to conditionally execute Experiment, or param server code
    learn_runner.run(generate_experiment_fn(**arguments), job_dir)
Does anyone have any hints about what I am doing wrong?
Best regards!
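I also tried serving the frozen model located in the export folder (the .pb file and the variables folder), but still without success: the same error message is shown. I have updated the question with this information. – ElioMarcolino
Can you clarify whether you tried --model_base_path=/serving/tf_models/extrato/output/Servo [note the subdirectory at the end]? – rhaertel80
Yes, I tried the full path to the .pb file and the variables directory. In my case it is --model_base_path=/serving/tf_models/extrato/output/export/Servo/1489706933289/ – ElioMarcolino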
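One thing that may be worth checking, given the two paths tried above: as far as I understand, tensorflow_model_server expects --model_base_path to be the directory whose immediate children are version directories named with integers, rather than the checkpoint directory or a version directory itself. The sketch below is not part of Tensorflow Serving; find_version_dirs is a hypothetical helper that mimics that lookup under this assumption, run against a scratch copy of the folder layout shown in the question.

```python
import os
import tempfile


def find_version_dirs(base_path):
    """Return the integer-named subdirectories of base_path, sorted ascending.

    Hypothetical helper: it only approximates how a version lookup under
    a base path might behave (assumption, not Tensorflow Serving code).
    """
    if not os.path.isdir(base_path):
        return []
    versions = []
    for name in os.listdir(base_path):
        is_number = name.isascii() and name.isdigit()
        if is_number and os.path.isdir(os.path.join(base_path, name)):
            versions.append(int(name))
    return sorted(versions)


# Recreate the directory layout from the question in a scratch location.
root = tempfile.mkdtemp()
output_dir = os.path.join(root, "output")
servo_dir = os.path.join(output_dir, "export", "Servo")
version_dir = os.path.join(servo_dir, "1489706933289")
os.makedirs(os.path.join(version_dir, "variables"))
os.makedirs(os.path.join(output_dir, "eval"))

# Report which candidate base paths contain a numeric version directory.
for candidate in (output_dir, servo_dir, version_dir):
    print(os.path.relpath(candidate, root), find_version_dirs(candidate))
```

Under that assumption, only the export/Servo level contains a numeric version directory, so it might be worth trying --model_base_path=/serving/tf_models/extrato/output/export/Servo/ (without the timestamp).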