2017-10-18 51 views
0

I set up a Spark on YARN cluster environment and am trying Spark SQL from spark-shell; saveAsTable ends in failure.

spark-shell --master yarn --deploy-mode client --conf spark.yarn.archive=hdfs://hadoop_273_namenode_ip:namenode_port/spark-archive.zip 

One thing to mention is that Spark runs on Windows 7. After spark-shell started successfully, I executed the following commands:

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc) 
scala> val df_mysql_address = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://mysql_db_ip/db").option("driver", "com.mysql.jdbc.Driver").option("dbtable", "ADDRESS").option("user", "root").option("password", "root").load() 
scala> df_mysql_address.show 
scala> df_mysql_address.write.format("parquet").saveAsTable("address_local") 

The "show" command returns the correct result set, but "saveAsTable" ends in failure. The error message says:

java.io.IOException: Mkdirs failed to create file:/C:/jshen.workspace/programs/spark-2.2.0-bin-hadoop2.7/spark-warehouse/address_local/_temporary/0/_temporary/attempt_20171018104423_0001_m_000000_0 (exists=false, cwd=file:/tmp/hadoop/nm-local-dir/usercache/hduser/appcache/application_1508319604173_0005/container_1508319604173_0005_01_000003) 

I expected and guessed that the table would be saved in the Hadoop cluster, but as you can see, the directory (C:/jshen.workspace/programs/spark-2.2.0-bin-hadoop2.7/spark-warehouse) is a folder on my Windows 7 machine, not in HDFS, and not even on the Hadoop Ubuntu machines.
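For context, a minimal sketch of why this happens, assuming Spark 2.x defaults: an unqualified `saveAsTable` resolves its location against `spark.sql.warehouse.dir`, which defaults to a `spark-warehouse` folder under the driver's working directory (here, the Windows client). Pointing the warehouse at HDFS when launching spark-shell avoids the local path entirely; the NameNode host/port below are the placeholders from the question, not real values:

```scala
// Launch spark-shell with the warehouse rooted in HDFS instead of the
// driver's local filesystem (placeholder NameNode address):
//
//   spark-shell --master yarn --deploy-mode client \
//     --conf spark.sql.warehouse.dir=hdfs://hadoop_273_namenode_ip:namenode_port/spark-warehouse

// With the warehouse in HDFS, an unqualified saveAsTable lands there:
df_mysql_address.write.format("parquet").saveAsTable("address_local")
```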

What should I do? Please advise, thanks.

+1

Have you tried giving the absolute HDFS path to saveAsTable? Like 'saveAsTable("hdfs://nn1/user/cloudera/address_local")' – philantrovert

+0

Thanks @philantrovert, inspired by your suggestion I figured out the right way, which is to provide the _"path"_ option before the _"save"_ operation: _scala> df_mysql_address.write.option("path", "/spark-warehouse").format("parquet").saveAsTable("address_local")_ –

Answers

0

The way to get rid of this problem is to provide the "path" option before the "save" operation, as shown below:

scala> df_mysql_address.write.option("path", "/spark-warehouse").format("parquet").saveAsTable("address_local") 
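Equivalently, a fully qualified HDFS URI can be passed in the `path` option so there is no ambiguity about which filesystem receives the data. This is a sketch of that variant; the NameNode host/port are placeholders carried over from the question, not a real address:

```scala
// Hypothetical fully-qualified variant of the fix above; replace the
// NameNode host/port with your cluster's actual values:
df_mysql_address.write
  .option("path", "hdfs://hadoop_273_namenode_ip:namenode_port/spark-warehouse")
  .format("parquet")
  .saveAsTable("address_local")
```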

感谢@philantrovert。