
I am facing a problem with Hive on Tez: I am unable to execute MapReduce jobs from Hive on Tez.

I can run a SELECT against an existing table in Hive without any problem:

SELECT * FROM Transactions;

But it fails when I try to use an aggregate function such as COUNT(*) on this table (unlike the plain SELECT *, this has to launch a Tez job):

SELECT COUNT(*) FROM Transactions;

I get the following in Hive.log:

2017-08-13T10:04:27,892 INFO [4a5b6a0c-9edb-45ea-8d49-b2f4b0d2b636 main] conf.HiveConf: Using the default value passed in for log id: 4a5b6a0c-9edb-45ea-8d49-b2f4b0d2b636
2017-08-13T10:04:27,910 INFO [4a5b6a0c-9edb-45ea-8d49-b2f4b0d2b636 main] session.SessionState: Error closing tez session
java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1498057873641_0017 failed 2 times due to AM Container for appattempt_1498057873641_0017_000002 exited with exitCode: -1000
Failing this attempt. Diagnostics: java.io.FileNotFoundException: File /tmp/hadoop-hadoop/nm-local-dir/filecache does not exist
For more detailed output, check the application tracking page: http://hadoop-master:8090/cluster/app/application_1498057873641_0017 Then click on links to logs of each attempt. Failing the application.
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.isOpen(TezSessionState.java:173) ~[hive-exec-2.1.1.jar:2.1.1]
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.toString(TezSessionState.java:135) ~[hive-exec-2.1.1.jar:2.1.1]
    at java.lang.String.valueOf(String.java:2994) ~[?:1.8.0_131]
    at java.lang.StringBuilder.append(StringBuilder.java:131) ~[?:1.8.0_131]
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.closeIfNotDefault(TezSessionPoolManager.java:346) ~[hive-exec-2.1.1.jar:2.1.1]
    at org.apache.hadoop.hive.ql.session.SessionState.close(SessionState.java:1524) [hive-exec-2.1.1.jar:2.1.1]
    at org.apache.hadoop.hive.cli.CliSessionState.close(CliSessionState.java:66) [hive-cli-2.1.1.jar:2.1.1]
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:133) [hive-cli-2.1.1.jar:2.1.1]
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399) [hive-cli-2.1.1.jar:2.1.1]
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776) [hive-cli-2.1.1.jar:2.1.1]
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714) [hive-cli-2.1.1.jar:2.1.1]
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641) [hive-cli-2.1.1.jar:2.1.1]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_131]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_131]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_131]
    at org.apache.hadoop.util.RunJar.run(RunJar.java:234) [hadoop-common-2.8.0.jar:?]
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148) [hadoop-common-2.8.0.jar:?]
Caused by: java.util.concurrent.ExecutionException: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1498057873641_0017 failed 2 times due to AM Container for appattempt_1498057873641_0017_000002 exited with exitCode: -1000
Failing this attempt. Diagnostics: java.io.FileNotFoundException: File /tmp/hadoop-hadoop/nm-local-dir/filecache does not exist
For more detailed output, check the application tracking page: http://hadoop-master:8090/cluster/app/application_1498057873641_0017 Then click on links to logs of each attempt. Failing the application.
    at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:1.8.0_131]
    at java.util.concurrent.FutureTask.get(FutureTask.java:206) ~[?:1.8.0_131]
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.isOpen(TezSessionState.java:168) ~[hive-exec-2.1.1.jar:2.1.1]
    ... 17 more
Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1498057873641_0017 failed 2 times due to AM Container for appattempt_1498057873641_0017_000002 exited with exitCode: -1000
Failing this attempt. Diagnostics: java.io.FileNotFoundException: File /tmp/hadoop-hadoop/nm-local-dir/filecache does not exist
For more detailed output, check the application tracking page: http://hadoop-master:8090/cluster/app/application_1498057873641_0017 Then click on links to logs of each attempt. Failing the application.
    at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:914) ~[tez-api-0.8.4.jar:0.8.4]
    at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:883) ~[tez-api-0.8.4.jar:0.8.4]
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.startSessionAndContainers(TezSessionState.java:416) ~[hive-exec-2.1.1.jar:2.1.1]
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.access$000(TezSessionState.java:97) ~[hive-exec-2.1.1.jar:2.1.1]
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState$1.call(TezSessionState.java:333) ~[hive-exec-2.1.1.jar:2.1.1]
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState$1.call(TezSessionState.java:329) ~[hive-exec-2.1.1.jar:2.1.1]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_131]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]

I resolved that error by creating the missing directory /tmp/hadoop-hadoop/nm-local-dir/filecache on all of my cluster nodes.
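
A minimal sketch of that step, assuming passwordless SSH from the master and taking the node names from the hive.zookeeper.quorum list further down:

# Create the filecache dir the NodeManager expects, on every node
# (sketch: assumes passwordless SSH; node names taken from hive.zookeeper.quorum)
for node in hadoop-master hadoop-slave1 hadoop-slave2 hadoop-slave3 hadoop-slave4 hadoop-slave5; do
  ssh "$node" 'mkdir -p /tmp/hadoop-hadoop/nm-local-dir/filecache'
done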

Then I got another error when trying to do SELECT COUNT(*) FROM Transactions; as below in Hive.log:

2017-08-13T10:06:35,567 INFO [main] optimizer.ColumnPrunerProcFactory: RS 3 oldColExprMap: {VALUE._col0=Column[_col0]}
2017-08-13T10:06:35,568 INFO [main] optimizer.ColumnPrunerProcFactory: RS 3 newColExprMap: {VALUE._col0=Column[_col0]}
2017-08-13T10:06:35,604 INFO [213ea036-8245-4042-a5a1-ccd686ea2465 main] Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
2017-08-13T10:06:35,658 INFO [main] annotation.StatsRulesProcFactory: STATS-GBY[2]: Equals 0 in number of rows. 0 rows will be set to 1
2017-08-13T10:06:35,679 INFO [main] optimizer.SetReducerParallelism: Number of reducers determined to be: 1
2017-08-13T10:06:35,680 INFO [main] parse.TezCompiler: Cycle free: true
2017-08-13T10:06:35,689 INFO [213ea036-8245-4042-a5a1-ccd686ea2465 main] Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
2017-08-13T10:06:35,741 INFO [main] parse.CalcitePlanner: Completed plan generation
2017-08-13T10:06:35,742 INFO [main] ql.Driver: Semantic Analysis Completed
2017-08-13T10:06:35,742 INFO [main] ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:c0, type:bigint, comment:null)], properties:null)
2017-08-13T10:06:35,744 INFO [main] exec.ListSinkOperator: Initializing operator LIST_SINK[7]
2017-08-13T10:06:35,745 INFO [main] ql.Driver: Completed compiling command(queryId=hadoop_20170813100633_31ca0425-6aca-434c-8039-48bc0e761095); Time taken: 2.131 seconds
2017-08-13T10:06:35,768 INFO [main] ql.Driver: Executing command(queryId=hadoop_20170813100633_31ca0425-6aca-434c-8039-48bc0e761095): SELECT COUNT(*) FROM Transactions
2017-08-13T10:06:35,768 INFO [main] ql.Driver: Query ID = hadoop_20170813100633_31ca0425-6aca-434c-8039-48bc0e761095
2017-08-13T10:06:35,768 INFO [main] ql.Driver: Total jobs = 1
2017-08-13T10:06:35,784 INFO [main] ql.Driver: Launching Job 1 out of 1
2017-08-13T10:06:35,784 INFO [main] ql.Driver: Starting task [Stage-1:MAPRED] in serial mode
2017-08-13T10:06:35,789 INFO [main] tez.TezSessionPoolManager: Current user: hadoop, session user: hadoop
2017-08-13T10:06:35,789 INFO [main] tez.TezSessionPoolManager: Current queue name is null incoming queue name is null
2017-08-13T10:06:35,838 INFO [213ea036-8245-4042-a5a1-ccd686ea2465 main] Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
2017-08-13T10:06:35,840 INFO [main] ql.Context: New scratch dir is hdfs://hadoop-master:8020/tmp/hive/hadoop/213ea036-8245-4042-a5a1-ccd686ea2465/hive_2017-08-13_10-06-33_614_5648783469307420794-1
2017-08-13T10:06:35,845 INFO [main] exec.Task: Session is already open
2017-08-13T10:06:35,847 INFO [main] tez.DagUtils: Localizing resource because it does not exist: file:/opt/apache-tez-0.8.4-bin to dest: hdfs://hadoop-master:8020/tmp/hive/hadoop/_tez_session_dir/213ea036-8245-4042-a5a1-ccd686ea2465/apache-tez-0.8.4-bin
2017-08-13T10:06:35,850 INFO [main] tez.DagUtils: Looks like another thread or process is writing the same file
2017-08-13T10:06:35,851 INFO [main] tez.DagUtils: Waiting for the file hdfs://hadoop-master:8020/tmp/hive/hadoop/_tez_session_dir/213ea036-8245-4042-a5a1-ccd686ea2465/apache-tez-0.8.4-bin (5 attempts, with 5000ms interval)
2017-08-13T10:07:00,860 ERROR [main] tez.DagUtils: Could not find the jar that was being uploaded
2017-08-13T10:07:00,861 ERROR [main] exec.Task: Failed to execute tez graph.
java.io.IOException: Previous writer likely failed to write hdfs://hadoop-master:8020/tmp/hive/hadoop/_tez_session_dir/213ea036-8245-4042-a5a1-ccd686ea2465/apache-tez-0.8.4-bin. Failing because I am unlikely to write too.
    at org.apache.hadoop.hive.ql.exec.tez.DagUtils.localizeResource(DagUtils.java:1022)
    at org.apache.hadoop.hive.ql.exec.tez.DagUtils.addTempResources(DagUtils.java:902)
    at org.apache.hadoop.hive.ql.exec.tez.DagUtils.localizeTempFilesFromConf(DagUtils.java:845)
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.refreshLocalResourcesFromConf(TezSessionState.java:466)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.updateSession(TezTask.java:294)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
2017-08-13T10:07:00,880 ERROR [main] ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask

I went through this JIRA issue for Hive, https://issues.apache.org/jira/browse/AMBARI-9821, but I am still facing this "return code 1" error when trying to do COUNT(*) on this table.

Tez conf file:

<configuration> 
    <property> 
     <name>tez.lib.uris</name> 
     <value>hdfs://hadoop-master:8020/user/tez/apache-tez-0.8.4-bin/share/tez.tar.gz</value> 
     <type>string</type> 
    </property> 
</configuration> 
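
Since the Hive.log above shows failures while localizing the Tez binaries, it is worth confirming that the archive tez.lib.uris points to actually exists in HDFS. A quick sanity check (a sketch, using the exact path from the config above):

# Confirm the Tez tarball referenced by tez.lib.uris is present in HDFS
hadoop fs -ls hdfs://hadoop-master:8020/user/tez/apache-tez-0.8.4-bin/share/tez.tar.gz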

Hive conf file:

<configuration> 
    <property> 
       <name>hive.server2.thrift.http.port</name> 
       <value>10001</value> 
     </property> 
     <property> 
       <name>hive.server2.thrift.http.min.worker.threads</name> 
       <value>5</value> 
     </property> 
     <property> 
       <name>hive.server2.thrift.http.max.worker.threads</name> 
       <value>500</value> 
     </property> 
     <property> 
       <name>hive.server2.thrift.http.path</name> 
       <value>cliservice</value> 
     </property> 
    <property> 
     <name>hive.server2.thrift.min.worker.threads</name> 
     <value>5</value> 
    </property> 
     <property> 
       <name>hive.server2.thrift.max.worker.threads</name> 
       <value>500</value> 
     </property> 
    <property> 
     <name>hive.server2.transport.mode</name> 
     <value>http</value> 
     <description>Server transport mode. "binary" or "http".</description> 
    </property> 
    <property> 
     <name>hive.server2.allow.user.substitution</name> 
     <value>true</value> 
    </property> 
    <property> 
     <name>hive.server2.authentication</name> 
     <value>NONE</value> 
    </property> 
    <property> 
     <name>hive.server2.thrift.bind.host</name> 
     <value>10.100.38.136</value> 
    </property> 
    <property> 
     <name>hive.support.concurrency</name> 
     <description>Enable Hive's Table Lock Manager Service</description> 
     <value>true</value> 
    </property> 
    <property> 
     <name>hive.zookeeper.quorum</name> 
     <description>Zookeeper quorum used by Hive's Table Lock Manager</description> 
     <value>hadoop-master,hadoop-slave1,hadoop-slave2,hadoop-slave3,hadoop-slave4,hadoop-slave5</value> 
    </property> 
    <property> 
     <name>hive.zookeeper.client.port</name> 
     <value>2181</value> 
     <description>The port at which the clients will connect.</description> 
    </property> 
    <property> 
     <name>javax.jdo.option.ConnectionURL</name> 
     <value>jdbc:derby://hadoop-master:1527/metastore_db2</value> 
     <description> 
      JDBC connect string for a JDBC metastore. 
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL. 
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database. 
     </description> 
    </property> 
    <property> 
     <name>hive.metastore.warehouse.dir</name> 
     <value>/user/hive/warehouse</value> 
     <description>location of default database for the warehouse</description> 
    </property> 
    <property> 
       <name>hive.server2.webui.host</name> 
       <value>10.100.38.136</value> 
     </property> 
     <property> 
       <name>hive.server2.webui.port</name> 
       <value>10010</value> 
     </property> 
    <!--<property> 
     <name>hive.metastore.local</name> 
     <value>true</value> 
    </property> 
    <property> 
     <name>hive.metastore.uris</name> 
     <value/> 
     <value>thrift://hadoop-master:9083</value> 
     <value>file:///source/apache-hive-2.1.1-bin/bin/metastore_db/</value> 
     <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description> 
    </property>--> 
    <property> 
     <name>javax.jdo.option.ConnectionDriverName</name> 
     <value>org.apache.derby.jdbc.ClientDriver</value> 
     <description>Driver class name for a JDBC metastore</description> 
    </property> 
    <property> 
     <name>javax.jdo.PersistenceManagerFactoryClass</name> 
     <value>org.datanucleus.api.jdo.JDOPersistenceManagerFactory</value> 
     <description>class implementing the jdo persistence</description> 
    </property> 
    <property> 
     <name>datanucleus.autoStartMechanism</name> 
     <value>SchemaTable</value> 
    </property> 
    <property> 
     <name>hive.execution.engine</name> 
     <value>tez</value> 
    </property> 
    <property> 
     <name>javax.jdo.option.ConnectionUserName</name> 
     <value>APP</value> 
    </property> 
    <property> 
     <name>javax.jdo.option.ConnectionPassword</name> 
     <value>mine</value> 
    </property> 
    <!--<property> 
     <name>datanucleus.autoCreateSchema</name> 
      <value>false</value> 
      <description>Creates necessary schema on a startup if one doesn't exist</description> 
    </property> --> 
</configuration> 

And these are the diagnostics from YARN:

Application application_1498057873641_0018 failed 2 times due to AM Container for appattempt_1498057873641_0018_000002 exited with exitCode: -103
Failing this attempt. Diagnostics: Container [pid=31779,containerID=container_1498057873641_0018_02_000001] is running beyond virtual memory limits. Current usage: 169.3 MB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1498057873641_0018_02_000001:
    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
    |- 31786 31779 31779 31779 (java) 587 61 2710179840 43031 /opt/jdk-8u131/jdk1.8.0_131/bin/java -Xmx819m -Djava.io.tmpdir=/tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1498057873641_0018/container_1498057873641_0018_02_000001/tmp -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/opt/hadoop/hadoop-2.8.0/logs/userlogs/application_1498057873641_0018/container_1498057873641_0018_02_000001 -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel= org.apache.tez.dag.app.DAGAppMaster --session
    |- 31779 31777 31779 31779 (bash) 0 0 115838976 306 /bin/bash -c /opt/jdk-8u131/jdk1.8.0_131/bin/java -Xmx819m -Djava.io.tmpdir=/tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1498057873641_0018/container_1498057873641_0018_02_000001/tmp -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/opt/hadoop/hadoop-2.8.0/logs/userlogs/application_1498057873641_0018/container_1498057873641_0018_02_000001 -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel='' org.apache.tez.dag.app.DAGAppMaster --session 1>/opt/hadoop/hadoop-2.8.0/logs/userlogs/application_1498057873641_0018/container_1498057873641_0018_02_000001/stdout 2>/opt/hadoop/hadoop-2.8.0/logs/userlogs/application_1498057873641_0018/container_1498057873641_0018_02_000001/stderr
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
For more detailed output, check the application tracking page: http://hadoop-master:8090/cluster/app/application_1498057873641_0018 Then click on links to logs of each attempt. Failing the application.
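
These diagnostics show the Tez AM container being killed for exceeding its virtual-memory cap (1 GB physical memory times the default yarn.nodemanager.vmem-pmem-ratio of 2.1, i.e. 2.1 GB). A common workaround, sketched below as a yarn-site.xml fragment, is to disable the NodeManager's virtual-memory check (or, alternatively, raise the ratio); whether that is appropriate depends on the cluster:

<!-- Sketch for yarn-site.xml on all NodeManagers (restart YARN afterwards).
     Disables the vmem check that killed the AM container; raising
     yarn.nodemanager.vmem-pmem-ratio above 2.1 is the gentler alternative. -->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>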

Answer


Most likely you hit https://issues.apache.org/jira/browse/HIVE-16398. As a workaround, you have to add the following in /usr/hdp//hive/conf/hive-env.sh:

# Folder containing extra libraries required for hive compilation/execution can be controlled by:
if [ "${HIVE_AUX_JARS_PATH}" != "" ]; then
  if [ -f "${HIVE_AUX_JARS_PATH}" ]; then
    export HIVE_AUX_JARS_PATH=${HIVE_AUX_JARS_PATH}
  elif [ -d "/usr/hdp/current/hive-webhcat/share/hcatalog" ]; then
    export HIVE_AUX_JARS_PATH=/usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar
  fi
elif [ -d "/usr/hdp/current/hive-webhcat/share/hcatalog" ]; then
  export HIVE_AUX_JARS_PATH=/usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar
fi
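
Note that hive-env.sh is only sourced at startup, so restart HiveServer2 (or start a fresh Hive CLI session) after making the change.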