2010-08-04 56 views
4

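Bundling jars when submitting Map/Reduce jobs through Pig?

I'm trying to combine Hadoop, Pig and Cassandra so that I can process data stored in Cassandra with simple Pig queries. The problem is that I can't get Pig to create Map/Reduce jobs that actually work with CassandraStorage.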

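What I did was copy the storage-conf.xml file from one of my cluster machines over the one in contrib/pig (in the Cassandra source distribution), which I then compiled into the cassandra_loadfunc.jar file.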

Next, I adapted the example script, example-script.pig, to register all the jars:

register /opt/pig/pig-0.7.0-core.jar; 
register /tmp/apache-cassandra-0.6.3-src/lib/libthrift-r917130.jar; 
REGISTER /tmp/apache-cassandra-0.6.3-src/contrib/pig/build/cassandra_loadfunc.jar; 
rows = LOAD 'cassandra://Keyspace1/Standard1' USING org.apache.cassandra.hadoop.pig.CassandraStorage(); 
cols = FOREACH rows GENERATE flatten($1); 
colnames = FOREACH cols GENERATE $0; 
namegroups = GROUP colnames BY $0; 
namecounts = FOREACH namegroups GENERATE COUNT($1), group; 
orderednames = ORDER namecounts BY $0; 
topnames = LIMIT orderednames 50; 
dump topnames; 

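So, if I'm not mistaken, the jars should be bundled into the job that is submitted to Hadoop. But when I run the job, it just throws this exception at me: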

2010-08-04 22:11:46,395 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2117: Unexpected error when launching map reduce job. 
2010-08-04 22:11:46,395 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias topnames 
    at org.apache.pig.PigServer.openIterator(PigServer.java:521) 
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:544) 
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241) 
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162) 
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138) 
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89) 
    at org.apache.pig.Main.main(Main.java:391) 
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias topnames 
    at org.apache.pig.PigServer.store(PigServer.java:577) 
    at org.apache.pig.PigServer.openIterator(PigServer.java:504) 
    ... 6 more 
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117: Unexpected error when launching map reduce job. 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:209) 
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308) 
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:835) 
    at org.apache.pig.PigServer.store(PigServer.java:569) 
    ... 7 more 
Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.NoClassDefFoundError: org/apache/thrift/TBase 
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:510) 
    at java.lang.Thread.dispatchUncaughtException(Thread.java:1845) 

I don't understand this, since the Thrift library is explicitly registered and should be bundled with the job, shouldn't it?

+0

For anyone who found this post while looking for [ERROR 1066: Unable to open iterator for alias](http://stackoverflow.com/questions/34495085/error-1066-unable-to-open-iterator-for-alias-in-pig-generic-solution), here is a [generic solution](http://stackoverflow.com/a/34495086/983722). – 2015-12-28 14:43:07

Answers

2

The exception clearly says that it is unable to find the TBase class:

java.lang.NoClassDefFoundError: org/apache/thrift/TBase

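Explode the bundled jar and check that the Thrift library jar is actually present at the correct location. The Thrift jar may have been bundled at a different location.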
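As a rough sketch of that check (not part of the original answer), the snippet below treats a jar as the zip archive it is and looks for the class entry named in the stack trace, so nothing has to be unpacked by hand. The function name `jar_contains_class` is hypothetical, and the path in the usage comment is simply the one registered in the question.

```python
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar has a .class entry for class_name.

    class_name may use slashes (as in the stack trace) or dots,
    e.g. "org/apache/thrift/TBase" or "org.apache.thrift.TBase".
    """
    # Jar entries always use forward slashes and end in ".class".
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


# Example call, assuming the jar path from the question exists locally:
# jar_contains_class(
#     "/tmp/apache-cassandra-0.6.3-src/lib/libthrift-r917130.jar",
#     "org/apache/thrift/TBase",
# )
```

This only checks for a top-level entry in one jar. It does not look inside jars nested in a lib folder, and it says nothing about whether the jar actually reaches the classpath of the process that fails.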

You can also try putting the jar into the lib folder of the bundled jar. Another option is to add the jar to the classpath explicitly.

+0

The classes are all present in the generated jar file, so that isn't really the problem. – cdecker 2010-08-17 12:28:20

+0

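Either there are multiple jars containing the same class org/apache/thrift/TBase, which causes a conflict, or the jar isn't registered properly. Those are the only causes I can think of based on the exception. – 2010-08-17 15:52:32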
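One way to test the duplicate-class theory from the last comment is to scan every registered jar and list those that define the class. The sketch below is illustrative only: the helper name `jars_defining_class` is hypothetical, and the jar list in the usage comment is assumed to be the three jars registered in the question's script.

```python
import zipfile


def jars_defining_class(jar_paths, class_name):
    """Return the subset of jar_paths that contain class_name.

    An empty result means none of the jars provides the class;
    more than one hit means several jars ship their own copy.
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for path in jar_paths:
        with zipfile.ZipFile(path) as jar:
            if entry in jar.namelist():
                hits.append(path)
    return hits


# Example call, assuming the jars registered in the question:
# jars_defining_class(
#     [
#         "/opt/pig/pig-0.7.0-core.jar",
#         "/tmp/apache-cassandra-0.6.3-src/lib/libthrift-r917130.jar",
#         "/tmp/apache-cassandra-0.6.3-src/contrib/pig/build/cassandra_loadfunc.jar",
#     ],
#     "org/apache/thrift/TBase",
# )
```

More than one hit does not prove a conflict, since identical copies are usually harmless, but it narrows down which jars are worth comparing.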