
I want to set up a two-machine Spark cluster. I am using two VirtualBox Ubuntu 16.04 guest machines on Windows hosts: the master is an Ubuntu 16.04 guest on a Windows 10 host, and the slave is an Ubuntu 16.04 guest on a Windows 7 host. The connection to the Spark master is being closed with one request still pending from the Spark slave.

I have done the following things:

  • Set up passwordless SSH on both machines
  • Installed Java and Spark on both machines
  • Set up the PATH variables on both machines
  • Added both machines' IPs to the /etc/hosts file on both machines
  • Added SPARK_MASTER_IP="master_ip" to the conf/spark-env.sh file on the slave machine (a sketch of these files follows this list)
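For reference, this is roughly what the two files mentioned above look like; master_ip and slave_ip stand in for the actual addresses, and the hostnames are illustrative:

    # /etc/hosts on both machines (hostnames are placeholders)
    master_ip    master
    slave_ip     slave

    # conf/spark-env.sh on the slave machine
    export SPARK_MASTER_IP="master_ip"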

Now, when I start the master, it starts correctly, and I can access the Spark master web UI at master_ip:8080.
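For completeness, the master is started with the standard standalone script, roughly as below; the install path is an assumption based on the slave's layout shown in the log and may differ on the master machine:

    # on the master machine
    cd ~/spark/spark-2.2.0-bin-hadoop2.7
    ./sbin/start-master.sh
    # the web UI then comes up at http://master_ip:8080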

However, when I try to start the slave on the slave machine using sudo ./start-slave.sh master_ip:8080, the following happens:

The slave worker starts, and I can access its web UI at slave_ip:8081, but the worker is unable to connect to the master: it does not show up in the Spark master web UI, and the following error appears in the worker log file:

Slave Log 1 (I cannot post more than two links, so I cannot post the full slave log.)

The following also work fine:

  • Passwordless SSH to and from both machines
  • nc -v ip port to each machine's port (results listed after the log below)

Slave log:

    Spark Command: /usr/lib/jvm/java-8-oracle/jre//bin/java -cp /home/clusterslave/spark/spark-2.2.0-bin-hadoop2.7/conf/:/home/clusterslave/spark$ 
        ======================================== 
        Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 
        17/08/29 00:41:32 INFO Worker: Started daemon with process name: [email protected] 
        17/08/29 00:41:32 INFO SignalUtils: Registered signal handler for TERM 
        17/08/29 00:41:32 INFO SignalUtils: Registered signal handler for HUP 
        17/08/29 00:41:32 INFO SignalUtils: Registered signal handler for INT 
        17/08/29 00:41:32 WARN Utils: Your hostname, clusterslave-VirtualBox resolves to a loopback address: 127.0.0.1; using master_ip instead ($ 
        17/08/29 00:41:32 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address 
        17/08/29 00:41:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
        17/08/29 00:41:33 INFO SecurityManager: Changing view acls to: root 
        17/08/29 00:41:33 INFO SecurityManager: Changing modify acls to: root 
        17/08/29 00:41:33 INFO SecurityManager: Changing view acls groups to: 
        17/08/29 00:41:33 INFO SecurityManager: Changing modify acls groups to: 
        17/08/29 00:41:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); $ 
        17/08/29 00:41:34 INFO Utils: Successfully started service 'sparkWorker' on port 38929. 
        17/08/29 00:41:34 INFO Worker: Starting Spark worker slave_ip:38929 with 4 cores, 5.8 GB RAM 
        17/08/29 00:41:34 INFO Worker: Running Spark version 2.2.0 
        17/08/29 00:41:34 INFO Worker: Spark home: /home/clusterslave/spark/spark-2.2.0-bin-hadoop2.7 
        17/08/29 00:41:34 INFO Utils: Successfully started service 'WorkerUI' on port 8081. 
        17/08/29 00:41:34 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://slave_ip:8081 
        17/08/29 00:41:34 INFO Worker: Connecting to master master_ip:8080... 
        17/08/29 00:41:34 INFO TransportClientFactory: Successfully created connection to /master_ip:8080 after 105 ms (0 ms spent in bootstraps) 
        17/08/29 00:41:34 ERROR TransportResponseHandler: Still have 1 requests outstanding when connection from /master_ip:8080 is closed 
        17/08/29 00:41:34 WARN Worker: Failed to connect to master master_ip:8080 
        org.apache.spark.SparkException: Exception thrown in awaitResult: 
          at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205) 
          at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) 
          at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100) 
          at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108) 
          at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.s$ 
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
          at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
          at java.lang.Thread.run(Thread.java:748) 
        Caused by: java.io.IOException: Connection from /master_ip:8080 closed 
          at org.apache.spark.network.client.TransportResponseHandler.channelInactive(TransportResponseHandler.java:146) 
          at org.apache.spark.network.server.TransportChannelHandler.channelInactive(TransportChannelHandler.java:108) 
          at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241) 
          at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227) 
          at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:220) 
          at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75) 
          at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:278) 
          at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241) 
          at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227) 
          at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:220) 
          at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75) 
          at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241) 
          at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227) 
          at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:220) 
          at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75) 
          at org.apache.spark.network.util.TransportFrameDecoder.channelInactive(TransportFrameDecoder.java:182) 
          at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241) 
          at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227) 
          at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:220) 
          at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1289) 
          at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:241) 
          at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:227) 
          at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:893) 
          at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:691) 
          at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399) 
          at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446) 
          at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131) 
          at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144) 
          ... 1 more 
    17/08/29 00:41:42 INFO Worker: Retrying connection to master (attempt # 1) 
    

The nc -v results (the exact commands are sketched after this list):

    • From slave to master: Connecting to master_ip:8080 port [tcp/http-alt] succeeded!
    • From master to slave: Connecting to slave_ip:8081 port [tcp/tproxy] succeeded!
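Concretely, the checks were of the form nc -v ip port, i.e. something like:

    nc -v master_ip 8080    # run on the slave
    nc -v slave_ip 8081     # run on the master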

I have also tried disabling the firewalls on both machines at the same time. The problem still persists.

Please help me resolve this issue. Thank you.


The Spark master host is Windows 10 and the Spark slave host is Windows 7, though both are Ubuntu 16.04 running as the guest OS in VirtualBox. –


You should post the relevant log text in the question body. Off-site links will rot, making this question less useful to others. – jdv


Please do not use comments to update the question. Edit the question body instead to add the additional information. – jdv

Answers


You should allow all TCP traffic between the two machines, disable the firewalls, and try without DNS names, giving the IP addresses directly. I would suggest first trying the two VMs on the same host, to rule out network problems. By the way, why don't you try it in Docker?
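On Ubuntu 16.04 one way to open up all traffic between the VMs is with ufw; a minimal sketch, with placeholder IPs:

    # on each VM, allow everything from the other VM
    sudo ufw allow from master_ip    # run this on the slave
    sudo ufw allow from slave_ip     # run this on the master
    # or simply turn the firewall off while testing
    sudo ufw disable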

For your use case, follow these steps:

On the master:

    bin/spark-class org.apache.spark.deploy.master.Master --ip 10.10.10.01

as in https://github.com/2dmitrypavlov/sparkDocker/blob/master/master_ip.sh

Then start a slave on the master machine as well, since you want to use its resources; make sure a slave on one VM can connect to it:

    bin/spark-class org.apache.spark.deploy.worker.Worker spark://10.10.10.01:7077 --webui-port 8081

After that works, do the same on the second VM:

    bin/spark-class org.apache.spark.deploy.worker.Worker spark://10.10.10.01:7077 --webui-port 8081

Take it one step at a time like this; it will be easier to find the problem. If you decide to use Docker, here is an image with instructions: https://github.com/2dmitrypavlov/sparkDocker


Thank you for the response, sir. How do I allow all TCP traffic between the two machines? –


I tried using the Docker image at https://github.com/2dmitrypavlov/sparkDocker, but after executing the slave command I do not see any slave connected to the master. –


I tried the commands mentioned above after replacing 10.10.10.01 with master_ip, and I got the same result. Same error. –


You need to export SPARK_MASTER_HOST=(master ip) instead of SPARK_MASTER_IP in the spark-env.sh file on both the master and the slave machines, and also export SPARK_LOCAL_IP on both, as sketched below.
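A minimal sketch of that change; the addresses are placeholders, and SPARK_LOCAL_IP should be each machine's own address:

    # conf/spark-env.sh on the master
    export SPARK_MASTER_HOST=master_ip
    export SPARK_LOCAL_IP=master_ip

    # conf/spark-env.sh on the slave
    export SPARK_MASTER_HOST=master_ip
    export SPARK_LOCAL_IP=slave_ip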