我有一个包含Spark的Docker镜像。这里是我的Dockerfile:PyCharm覆盖正在用作解释器的码头集装箱中的PYTHONPATH
FROM docker-dev.artifactory.company.com/centos:7.3.1611
# set proxy
ENV http_proxy http://proxyaddr.co.uk:8080
ENV HTTPS_PROXY http://proxyaddr.co.uk:8080
ENV https_proxy http://proxyaddr.co.uk:8080
RUN yum install -y epel-release
RUN yum install -y gcc
RUN yum install -y krb5-devel
RUN yum install -y python-devel
RUN yum install -y krb5-workstation
RUN yum install -y python-setuptools
RUN yum install -y python-pip
RUN yum install -y xmlstarlet
RUN yum install -y wget java-1.8.0-openjdk
RUN pip install kerberos
RUN pip install numpy
RUN pip install pandas
RUN pip install coverage
RUN pip install tensorflow
RUN wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.0-bin-hadoop2.6.tgz
RUN tar -xvzf spark-1.6.0-bin-hadoop2.6.tgz -C /opt
RUN ln -s spark-1.6.0-bin-hadoop2.6 /opt/spark
ENV VERSION_NUMBER $(cat VERSION)
ENV JAVA_HOME /etc/alternatives/jre/
ENV SPARK_HOME /opt/spark
ENV PYTHONPATH $SPARK_HOME/python/:$PYTHONPATH
ENV PYTHONPATH $SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH
我可以建立然后运行搬运工图像,连接到它,并成功导入pyspark库:
$ docker run -d -it sse_spark_build:1.0
09e8aac622d7500e147a6e6db69f806fe093b0399b98605c5da2ff5e0feca07c
$ docker exec -it 09e8aac622d7 python
Python 2.7.5 (default, Nov 6 2016, 00:28:07)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from pyspark import SparkContext
>>>import os
>>> os.environ['PYTHONPATH']
'/opt/spark/python/lib/py4j-0.9-src.zip:/opt/spark/python/:'
>>>
注PYTHONPATH
价值!
问题是,如果我使用与解释器相同的泊坞窗图像,PyCharm中的行为是不同的。以下是我已经设置了解释:
如果我再在PyCharm运行Python控制台出现这种情况:
bec0b9189066:python /opt/.pycharm_helpers/pydev/pydevconsole.py 0 0
PyDev console: starting.
import sys; print('Python %s on %s' % (sys.version, sys.platform))
sys.path.extend(['/home/cengadmin/git/dhgitlab/sse/engine/fs/programs/pyspark', '/home/cengadmin/git/dhgitlab/sse/engine/fs/programs/pyspark'])
Python 2.7.5 (default, Nov 6 2016, 00:28:07)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
import os
os.environ['PYTHONPATH']
'/opt/.pycharm_helpers/pydev'
正如你可以看到PyCharm改变PYTHONPATH这意味着我再也不能使用我想使用pyspark库:
from pyspark import SparkContext
Traceback (most recent call last):
File "<input>", line 1, in <module>
ImportError: No module named pyspark
OK,我可以从控制台更改路径,使其工作:
import sys
sys.path.append('/opt/spark/python/')
sys.path.append('/opt/spark/python/lib/py4j-0.9-src.zip')
但每次打开控制台时都必须这么做。我不相信没有办法告诉PyCharm追加到PYTHONPATH而不是覆盖它,但如果有的话我找不到它。任何人都可以提供建议吗?我如何使用Docker镜像作为PyCharm的远程解释器并保留PYTHONPATH的值?
哦非常酷,非常感谢你 – jamiet