2015-08-15 103 views

I downloaded Hadoop 2.6.0 (since I don't have the space to run CDH or a sandbox) and the Hadoop streaming jar from here, but the Hadoop MapReduce streaming job does not run.

For streaming, I ran the following command:

bin/hadoop jar contrib/hadoop-streaming-2.6.0.jar \ 
-file ${HADOOP_HOME}/py_mapred/mapper.py -mapper ${HADOOP_HOME}/py_mapred/mapper.py \ 
-file ${HADOOP_HOME}/py_mapred/reducer.py -reducer ${HADOOP_HOME}/py_mapred/reducer.py \ 
-input /input/davinci/* -output /input/davinci-output 

where I stored the downloaded streaming jar in ${HADOOP_HOME}/contrib and the other files in py_mapred. I also used copyFromLocal to copy the input into the /input directory on HDFS. Now, when I run the command, the following lines appear:

15/08/14 17:35:45 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead. 
15/08/14 17:35:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
packageJobJar: [/usr/local/cellar/hadoop/2.6.0/py_mapred/mapper.py, /usr/local/cellar/hadoop/2.6.0/py_mapred/reducer.py, /var/folders/c5/4xfj65v15g91f71c_b9whnpr0000gn/T/hadoop-unjar3313567263260134566/] [] /var/folders/c5/4xfj65v15g91f71c_b9whnpr0000gn/T/streamjob9165494241574343777.jar tmpDir=null 
15/08/14 17:35:47 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 
15/08/14 17:35:47 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 
15/08/14 17:35:48 INFO mapred.FileInputFormat: Total input paths to process : 1 
15/08/14 17:35:48 INFO mapreduce.JobSubmitter: number of splits:2 
15/08/14 17:35:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1439538212023_0002 
15/08/14 17:35:49 INFO impl.YarnClientImpl: Submitted application application_1439538212023_0002 
15/08/14 17:35:49 INFO mapreduce.Job: The url to track the job: http://Jonathans-MacBook-Pro.local:8088/proxy/application_1439538212023_0002/ 
15/08/14 17:35:49 INFO mapreduce.Job: Running job: job_1439538212023_0002 

It looks like the command was accepted. I checked localhost:8088 and the job is indeed registered. However, it does not run, even though it says Running job: job_1439538212023_0002. Is there a problem with my command? Is it due to permission settings? Why isn't the job running?

Thanks


I had a similar problem happen. – Will

Answer


This is the correct way to run the streaming job:

bin/hadoop jar contrib/hadoop-streaming-2.6.0.jar \ 
-file ${HADOOP_HOME}/py_mapred/mapper.py -mapper '/usr/bin/python mapper.py' \ 
-file ${HADOOP_HOME}/py_mapred/reducer.py -reducer '/usr/bin/python reducer.py' \ 
-input /input/davinci/* -output /input/davinci-output