2017-10-20

Here is my import code. Google Cloud Dataflow cannot import `google.cloud.datastore`:

from __future__ import absolute_import 

import datetime 
import json 
import logging 
import re 

import apache_beam as beam 
from apache_beam import combiners 
from apache_beam.io.gcp.bigquery import parse_table_schema_from_json 
from apache_beam.io.gcp.datastore.v1.datastoreio import ReadFromDatastore 
from apache_beam.pvalue import AsDict 
from apache_beam.pvalue import AsSingleton 
from apache_beam.options.pipeline_options import PipelineOptions 

from google.cloud.proto.datastore.v1 import query_pb2 
from google.cloud import datastore 
from googledatastore import helper as datastore_helper, PropertyFilter 

# datastore entities that we need to perform the mapping computations 
#from models import UserPlan, UploadIntervalCount, RollingMonthlyCount 

Here is what my requirements.txt file looks like:

$ cat requirements.txt 
Flask==0.12.2 
apache-beam[gcp]==2.1.1 
gunicorn==19.7.1 
google-cloud-dataflow==2.1.1 
six==1.10.0 
google-cloud-datastore==1.3.0 
google-cloud 

And here is the /lib directory, which contains the following:

$ ls -1 lib/google/cloud 
__init__.py 
_helpers.py 
_helpers.pyc 
_http.py 
_http.pyc 
_testing.py 
_testing.pyc 
bigquery 
bigtable 
client.py 
client.pyc 
datastore 
dns 
environment_vars.py 
environment_vars.pyc 
error_reporting 
exceptions.py 
exceptions.pyc 
gapic 
iam.py 
iam.pyc 
language 
language_v1 
language_v1beta2 
logging 
monitoring 
obselete.py 
obselete.pyc 
operation.py 
operation.pyc 
proto 
pubsub 
resource_manager 
runtimeconfig 
spanner 
speech 
speech_v1 
storage 
translate.py 
translate.pyc 
translate_v2 
videointelligence.py 
videointelligence.pyc 
videointelligence_v1beta1 
vision 
vision_v1 

Note that both google.cloud.datastore and google.cloud.proto are present in the /lib folder. This import line works fine:

from google.cloud.proto.datastore.v1 import query_pb2

but this one fails:

from google.cloud import datastore

Here is the exception (taken from the Google Cloud Dataflow console):

(9b49615f4d91c1fb): Traceback (most recent call last): 
    File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 582, in do_work 
    work_executor.execute() 
    File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 166, in execute 
    op.start() 
    File "apache_beam/runners/worker/operations.py", line 294, in apache_beam.runners.worker.operations.DoOperation.start (apache_beam/runners/worker/operations.c:10607) 
    def start(self): 
    File "apache_beam/runners/worker/operations.py", line 295, in apache_beam.runners.worker.operations.DoOperation.start (apache_beam/runners/worker/operations.c:10501) 
    with self.scoped_start_state: 
    File "apache_beam/runners/worker/operations.py", line 300, in apache_beam.runners.worker.operations.DoOperation.start (apache_beam/runners/worker/operations.c:9702) 
    pickler.loads(self.spec.serialized_fn)) 
    File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 225, in loads 
    return dill.loads(s) 
    File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 277, in loads 
    return load(file) 
    File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 266, in load 
    obj = pik.load() 
    File "/usr/lib/python2.7/pickle.py", line 858, in load 
    dispatch[key](self) 
    File "/usr/lib/python2.7/pickle.py", line 1133, in load_reduce 
    value = func(*args) 
    File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 767, in _import_module 
    return getattr(__import__(module, None, None, [obj]), obj) 
    File "/usr/local/lib/python2.7/dist-packages/dataflow_pipeline/counters_pipeline.py", line 25, in <module> 
    from google.cloud import datastore 
ImportError: No module named datastore 

Why can't it find the package?

+1

External dependencies must be installed in `setup.py`, and this file should be specified in the pipeline parameters. –

+0

Is google.cloud an external dependency? What makes it different from the google.cloud.proto.v1.datastore dependency installed in the /lib directory, which I can access? In any case, I'll give it a try. Do I just need to add 'google-cloud' to the 'REQUIRED_PACKAGES' list in 'setup.py'? –

+0

Hey @MarcinZablocki, your answer is correct, thank you very much! I'm still confused about why some things can go into the /lib directory and be linked in the appengine_config.py file, while other things have to go into the setup.py file. Maybe you could expand on that and write it up as a full answer? –

Answer

2

External dependencies must be installed in setup.py, and this file should be specified in the pipeline parameters as --setup_file:

REQUIRED_PACKAGES = ["google-cloud-datastore==1.3.0"] 
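For reference, a minimal setup.py along these lines might look like the sketch below; the package name and version string are assumptions, and the dependency pin mirrors the requirements.txt shown in the question:

```python
# Hypothetical minimal setup.py for a Dataflow pipeline package.
# Packages in REQUIRED_PACKAGES are installed on every Dataflow worker.
import setuptools

REQUIRED_PACKAGES = ["google-cloud-datastore==1.3.0"]

setuptools.setup(
    name="dataflow-pipeline",   # assumed package name
    version="0.0.1",            # assumed version
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages(),
)
```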

The reason you need to specify the dependency in setup.py is the following. In setup.py you can install packages by listing them in REQUIRED_PACKAGES, or by running a custom command such as

pip install google-cloud-datastore==1.3.0 

or by otherwise bundling your package. The libraries you use in appengine_config are not used during Dataflow execution. App Engine acts only as a scheduler: it merely deploys the job to the Dataflow engine. Dataflow then creates worker machines to execute your pipeline, and those workers are not connected to App Engine in any way. The Dataflow workers must have every package the pipeline needs, which is why you have to specify the required packages in the setup.py file; the workers use this file to "set themselves up".
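Putting this together, the launch side might pass the file through pipeline options as sketched below. `--runner`, `--project`, `--temp_location`, and `--setup_file` are real Beam/Dataflow options, but the project id and bucket name here are hypothetical placeholders:

```python
# Sketch: pointing Beam/Dataflow at setup.py so that workers install
# the declared dependencies before executing the pipeline.
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-gcp-project",             # hypothetical project id
    "--temp_location=gs://my-bucket/temp",  # hypothetical staging bucket
    "--setup_file=./setup.py",              # ships setup.py to the workers
])
```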