2013-04-09 55 views

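Hadoop streaming with Python using mongo-hadoop

I am trying to get map-reduce functionality with Python using mongo-hadoop. Hadoop is working, Hadoop streaming is working with Python, and the mongo-hadoop adapter is working. However, the mongo-hadoop streaming examples with Python are not working. When I try to run the example in streaming/examples/treasury, I get the following error: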

[email protected]: ~/git/mongo-hadoop/streaming$ hadoop jar target/mongo-hadoop-streaming-assembly-1.0.1.jar -mapper examples/treasury/mapper.py -reducer examples/treasury/reducer.py -inputformat com.mongodb.hadoop.mapred.MongoInputFormat -outputformat com.mongodb.hadoop.mapred.MongoOutputFormat -inputURI mongodb://127.0.0.1/mongo_hadoop.yield_historical.in -outputURI mongodb://127.0.0.1/mongo_hadoop.yield_historical.streaming.out

13/04/09 11:54:34 INFO streaming.MongoStreamJob: Running
13/04/09 11:54:34 INFO streaming.MongoStreamJob: Init
13/04/09 11:54:34 INFO streaming.MongoStreamJob: Process Args
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Setup Options'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: PreProcess Args
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Parse Options
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: '-mapper'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: 'examples/treasury/mapper.py'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: '-reducer'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: 'examples/treasury/reducer.py'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: '-inputformat'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: 'com.mongodb.hadoop.mapred.MongoInputFormat'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: '-outputformat'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: 'com.mongodb.hadoop.mapred.MongoOutputFormat'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: '-inputURI'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: 'mongodb://127.0.0.1/mongo_hadoop.yield_historical.in'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: '-outputURI'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Arg: 'mongodb://127.0.0.1/mongo_hadoop.yield_historical.streaming.out'
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Add InputSpecs
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Setup output_
13/04/09 11:54:34 INFO streaming.StreamJobPatch: Post Process Args
13/04/09 11:54:34 INFO streaming.MongoStreamJob: Args processed.
13/04/09 11:54:36 INFO io.MongoIdentifierResolver: Resolving: bson
13/04/09 11:54:36 INFO io.MongoIdentifierResolver: Resolving: bson
13/04/09 11:54:36 INFO io.MongoIdentifierResolver: Resolving: bson
13/04/09 11:54:36 INFO io.MongoIdentifierResolver: Resolving: bson

**Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/filecache/DistributedCache** 
    at org.apache.hadoop.streaming.StreamJob.setJobConf(StreamJob.java:959) 
    at com.mongodb.hadoop.streaming.MongoStreamJob.run(MongoStreamJob.java:36) 
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) 
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) 
    at com.mongodb.hadoop.streaming.MongoStreamJob.main(MongoStreamJob.java:63) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 
    at java.lang.reflect.Method.invoke(Method.java:597) 
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208) 
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.filecache.DistributedCache 
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306) 
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247) 
    ... 10 more 
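
A NoClassDefFoundError means the JVM could not find a class at run time that the calling code was compiled against. One plausible reading, offered as an assumption rather than a confirmed diagnosis, is that the streaming assembly was built against a different Hadoop version than the one installed. The sketch below only helps check that: it scans a directory for jars that contain the missing class. The function names and the search directory are hypothetical.

```python
import zipfile
from pathlib import Path

# The class named in the stack trace above.
CLASS_NAME = "org.apache.hadoop.mapreduce.filecache.DistributedCache"


def class_entry(class_name):
    """Convert a dotted Java class name to its path inside a jar."""
    return class_name.replace(".", "/") + ".class"


def jars_containing(class_name, search_dir):
    """Return every *.jar under search_dir that contains class_name."""
    entry = class_entry(class_name)
    hits = []
    for jar in sorted(Path(search_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar)
        except (zipfile.BadZipFile, OSError):
            # Skip unreadable files or files that are not valid archives.
            continue
    return hits
```

Calling `jars_containing(CLASS_NAME, some_dir)` on the Hadoop library directory (its location depends on the installation) returns an empty list if no installed jar ships the class, which would be consistent with a version mismatch.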

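If anyone could shed some light on this, it would be a great help.

Full info:

As far as I can tell, I need to get the following five things working:

  1. Install and test Hadoop
  2. Install and test Python
  3. Test Hadoop streaming
  4. Install and test mongo-hadoop
  5. Install and test mongo-hadoop streaming with Python

The short of it is that I have everything working up to the fourth step. Using (https://github.com/danielpoe/cloudera) I have got Cloudera 4 installed.

  1. Using the Chef cookbook, Cloudera 4 has been installed, is up and running, and has been tested
  2. Using Michael Noll's blog tutorial, tested Hadoop streaming with Python successfully
  3. Using the docs at mongodb.org, was able to run both the treasury and UFO examples (with build.sbt set to CDH4)
  4. Downloaded 1.5 hours' worth of Twitter data using the README in streaming/examples for the twitter example, and also tried the treasury example.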
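
For readers unfamiliar with the plain Hadoop streaming step mentioned in the list above, here is a minimal local sketch of the map, sort, reduce pattern that a streaming word count follows. It is not the code from the tutorial or from mongo-hadoop; the function names are made up for illustration, and the sort stands in for Hadoop's shuffle phase.

```python
from itertools import groupby


def map_words(lines):
    """Mapper: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_counts(pairs):
    """Reducer: sum the counts of consecutive pairs that share a key.

    Like a streaming reducer, this relies on the input being sorted by key.
    """
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> shuffle (sort) -> reduce in a single process."""
    shuffled = sorted(map_words(lines))
    return list(reduce_counts(shuffled))
```

Testing the mapper and reducer logic locally like this separates bugs in the Python scripts from problems in the Hadoop or mongo-hadoop setup.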

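Solved: to get it working we needed to install Cloudera 4, then use version CDH4, then use version CDH3, at which point the mongo-hadoop streaming driver is created, and install the mongo-hadoop adapter. Rather than following the instructions and installing pymongo-hadoop from the repository, the best solution was 'sudo pip install pymongo_hadoop' – Conor 2013-04-17 08:50:05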

Answers


Do you have the latest pymongo_hadoop connector installed? What versions of the other software are you running?
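
One quick way to answer the first question is to ask the Python interpreter itself. The sketch below is an illustration only: it assumes a Python 3.8+ interpreter (for `importlib.metadata`) and assumes the distribution is registered under the same name as the module, `pymongo_hadoop`. The helper name `connector_status` is hypothetical.

```python
import importlib.util
from importlib import metadata


def connector_status(module_name="pymongo_hadoop", dist_name="pymongo_hadoop"):
    """Return (importable, version) for the given module.

    version is None when the module is missing or when no installed
    distribution metadata is found under dist_name.
    """
    if importlib.util.find_spec(module_name) is None:
        return (False, None)
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return (True, version)
```

Run it with the same interpreter that the streaming job's mapper and reducer scripts use, since a package installed for one interpreter is not visible to another.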


Hi Ross, thanks for your answer. The pymongo_hadoop connector was indeed the problem. We had already worked part of it out before you gave us the hint. – Conor 2013-04-17 08:47:13