Cluster mode Spark refuses to run more than two jobs concurrently

My Spark cluster refuses to run more than two jobs at the same time. One of the three always stays stuck in the ACCEPTED state forever.
Hardware
4 data nodes with Spark clients, 24 GB RAM and 4 processors each.
The cluster metrics show there should be plenty of cores:
Apps Submitted 3
Apps Pending 1
Apps Running 2
Apps Completed 0
Containers Running 4
Memory Used 8GB
Memory Total 32GB
Memory Reserved 0B
VCores Used 4
VCores Total 8
VCores Reserved 0
Active Nodes 2
Decommissioned Nodes 0
Lost Nodes 0
Unhealthy Nodes 0
Rebooted Nodes 0
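Note that Active Nodes is 2 even though there are 4 data nodes, so only two NodeManagers are currently registered with the ResourceManager, and VCores Total 8 matches 2 nodes × 4 cores. The registered NodeManagers and their states can be listed from any client node:
yarn node -list -all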
In the application manager you can see that the only way to ever get the third application running is to kill one of the running ones:
ID                              User  Name        Type   Queue    Pri  Start   Finish  State     FinalStatus  Containers  VCores  Memory(MB)  %Queue  %Cluster
application_1504018580976_0002  adm   com.x.app1  SPARK  default  0    [date]  N/A     RUNNING   UNDEFINED    2           2       5120        25.0    25.0
application_1500031233020_0090  adm   com.x.app2  SPARK  default  0    [date]  N/A     RUNNING   UNDEFINED    2           2       3072        25.0    25.0
application_1504024737012_0001  adm   com.x.app3  SPARK  default  0    [date]  N/A     ACCEPTED  UNDEFINED    0           0       0           0.0     0.0
Each running application holds 2 containers and 2 allocated vcores, i.e. 25% of the queue and 25% of the cluster.
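In cluster mode the driver runs inside the YARN Application Master, so the 2 containers per running app are 1 AM/driver plus 1 executor. The diagnostics for the stuck application (the third ID in the table above) usually state why its AM container cannot be allocated:
yarn application -status application_1504024737012_0001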
The submit command is the same for all three applications:
/usr/hdp/current/spark2-client/bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-cores 1 \
  --driver-memory 512m \
  --num-executors 1 \
  --executor-cores 1 \
  --executor-memory 1G \
  --class com.x.appx ../lib/foo.jar
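For reference, each of these submissions asks YARN for one 512 MB driver container and one 1 GB executor container. Assuming the Spark 2 default memory overhead of max(384 MB, 10% of the container), the rough per-app footprint is:
driver/AM container:  512 MB + 384 MB overhead ≈ 896 MB, rounded up to yarn.scheduler.minimum-allocation-mb
executor container:  1024 MB + 384 MB overhead ≈ 1408 MB, rounded up likewise
which would explain why the UI reports a few GB allocated per application rather than 1.5 GB.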
Capacity scheduler settings:
yarn.scheduler.capacity.default.minimum-user-limit-percent = 100
yarn.scheduler.capacity.maximum-am-resource-percent = 0.2
yarn.scheduler.capacity.maximum-applications = 10000
yarn.scheduler.capacity.node-locality-delay = 40
yarn.scheduler.capacity.root.accessible-node-labels = *
yarn.scheduler.capacity.root.acl_administer_queue = *
yarn.scheduler.capacity.root.capacity = 100
yarn.scheduler.capacity.root.default.acl_administer_jobs = *
yarn.scheduler.capacity.root.default.acl_submit_applications = *
yarn.scheduler.capacity.root.default.capacity = 100
yarn.scheduler.capacity.root.default.maximum-capacity = 100
yarn.scheduler.capacity.root.default.state = RUNNING
yarn.scheduler.capacity.root.default.user-limit-factor = 1
yarn.scheduler.capacity.root.queues = default
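Applications that sit in ACCEPTED while the queue still has free memory and vcores are very often hitting yarn.scheduler.capacity.maximum-am-resource-percent: at 0.2, only 20% of the queue's resources may be held by Application Masters, and in cluster mode every job's driver is an AM. A hedged sketch of a possible change (0.5 is an illustrative value; rm-host is a placeholder for the ResourceManager address):
# raise the AM share in capacity-scheduler.xml (or via Ambari), then refresh the queues
yarn.scheduler.capacity.maximum-am-resource-percent = 0.5
yarn rmadmin -refreshQueues
# the effective per-queue "Max Application Master Resources" can be inspected via
curl http://rm-host:8088/ws/v1/cluster/scheduler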
You said 4 data nodes, but something is wrong. It says: VCores Total 8, and: Active Nodes 2 – Venkat
Hi Venkat, I have 6 boxes, 4 of which are Spark clients. Each box has 4 cores. So what am I missing? –