Bluemix: Apache Spark: configuring driver memory with spark-submit

I use the spark-submit script to submit my Python script to a Spark cluster on Bluemix, but I get the following error:
Traceback (most recent call last):
File "/gpfs/fs01/user/sf6d-7c3a9c08343577-05540e1c503a/data/workdir/spark-driver-cc30d6d8-1518-45b1-a4a7-8421deaa3482/2_do_extract.py", line 139, in do_extraction
r = resRDD.collect()
File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/pyspark.zip/pyspark/rdd.py", line 771, in collect
port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/pyspark.zip/pyspark/traceback_utils.py", line 78, in __exit__
self._context._jsc.setCallSite(None)
File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 811, in __call__
answer = self.gateway_client.send_command(command)
File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 624, in send_command
connection = self._get_connection()
File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 579, in _get_connection
connection = self._create_connection()
File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 585, in _create_connection
connection.start()
File "/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 697, in start
raise Py4JNetworkError(msg, e)
Py4JNetworkError: An error occurred while trying to connect to the Java server
I am fairly sure this error is caused by the driver running out of memory while executing the script: it completes successfully on smaller datasets, but fails with this error on larger ones.
After reading the spark-submit documentation, I have tried all the configuration options for increasing driver memory, executor memory, and so on, like the following:
/bin/sh spark-submit.sh --vcap vcap.json my_python_script.py --master https://169.54.219.20 --deploy-mode cluster --driver-memory 5g --executor-memory 5g --driver-maxResultSize 5g --worker-memory 5g
But none of them seem to actually change the memory settings.
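For reference, my reading of the standard Spark documentation (this is an assumption on my part; the Bluemix spark-submit.sh wrapper may accept a different set of flags) is that `--driver-maxResultSize` and `--worker-memory` are not standard spark-submit flags, and that such settings can instead be passed as generic `--conf key=value` properties, e.g.:

```shell
# Sketch of an alternative invocation: the memory settings are passed
# as generic --conf properties rather than dedicated flags.
# spark.driver.memory, spark.executor.memory and spark.driver.maxResultSize
# are the standard Spark property names for these settings.
/bin/sh spark-submit.sh --vcap vcap.json \
  --deploy-mode cluster \
  --conf spark.driver.memory=5g \
  --conf spark.executor.memory=5g \
  --conf spark.driver.maxResultSize=5g \
  my_python_script.py
```

I have not confirmed whether the Bluemix wrapper forwards these `--conf` properties to the driver in cluster mode.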
Please explain how to set these variables, because the script fails even with only moderate memory usage.