Hadoop Streaming simple job fails with an error (Python)

I'm new to Hadoop and MapReduce, and I'm trying to write a MapReduce job that finds the top 10 counts in a word-count txt file.

My txt file 'q2_result.txt' looks like this:

yourself  268 
yourselves  73 
yoursnot  1 
youst 1 
youth 270 
youthat 1 
youthful  31 
youths 9 
youtli 1 
youwell 1 
youwondrous  1 
youyou 1 
zanies 1 
zany 1 
zeal 32 
zealous 6 
zeals 1

Mapper:

#!/usr/bin/env python 

import sys 

for line in sys.stdin: 
    line = line.strip() 
    word, count = line.split() 
    print "%s\t%s" % (word, count)

Reducer:

#!usr/bin/env/ python 

import sys 

top_n = 0 
for line in sys.stdin: 
    line = line.strip() 
    word, count = line.split() 

    top_n += 1 
    if top_n == 11: 
        break 
    print '%s\t%s' % (word, count)

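I know you can pass a -D option to the hadoop jar command so that it sorts on the key you want (in my case the count, which is k2,2), but here I'm just using a simple command first: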

hadoop jar /usr/hdp/2.5.0.0-1245/hadoop-mapreduce/hadoop-streaming-2.7.3.2.5.0.0-1245.jar -file /root/LAB3/mapper.py -mapper mapper.py -file /root/LAB3/reducer.py -reducer reducer.py -input /user/root/lab3/q2_result.txt -output /user/root/lab3/test_out

I didn't think such a simple mapper and reducer would give me an error, but they do, and I can't figure out why. The error is here: http://pastebin.com/PvY4d89c

(I'm using the Hortonworks HDP Sandbox, running in VirtualBox on Ubuntu 16.04.)
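The reducer in the question keeps whichever ten lines arrive first, so its result depends on the order in which Hadoop delivers them. As a minimal sketch of one alternative that does not rely on the sort options, the snippet below picks the largest counts itself with heapq.nlargest. The function names parse and top_n are hypothetical, and the sketch assumes that every line is "word count" separated by whitespace and that the job runs with a single reducer (with several reducers, each one would emit its own top 10).

```python
import heapq


def parse(lines):
    """Yield (word, count) pairs from 'word count' lines, skipping malformed ones."""
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # blank line or unexpected number of fields
        yield parts[0], int(parts[1])


def top_n(lines, n=10):
    """Return the n pairs with the largest counts, largest first."""
    return heapq.nlargest(n, parse(lines), key=lambda pair: pair[1])


# A few lines taken from q2_result.txt in the question.
sample = [
    "yourself  268",
    "yourselves  73",
    "yoursnot  1",
    "youth 270",
    "youthful  31",
    "zeal 32",
]

for word, count in top_n(sample, n=3):
    print("%s\t%d" % (word, count))
```

In a streaming reducer, sys.stdin would take the place of sample. heapq.nlargest keeps only n items in memory at a time, which matters when the input is much larger than the sample here.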


2016-09-30 Sam

Please check this: http://stackoverflow.com/questions/4339788/hadoop-streaming-无法找到文件错误 – Rahmath

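I know that a "file not found" error means something completely different from "file cannot be executed". In this case, the problem is that the file cannot be executed.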

In reducer.py:

Wrong:

#!usr/bin/env/ python

Correct:

#!/usr/bin/env python


2016-09-30 21:37:46 ozw1z5rd

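I can't believe I missed that... Can you explain why this difference causes the Hadoop Streaming error? I sort of understand that including #! tells Hadoop you're executing a Python file. – Sam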

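env is a program located in /usr/bin. By writing 'usr/bin/env/' you are actually trying to run a directory. This program lets you use python without giving its absolute path. With #! you are telling the system which program should execute the script; that program must exist and be runnable. – ozw1z5rd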
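The point about the #! line can be checked before a job is submitted. The following is a rough sketch, not part of Hadoop: the helper names shebang_interpreter and is_runnable are made up for illustration. It pulls the interpreter path out of a script's first line and tests whether that path is an existing executable file. Treating a relative path as unusable is a choice made in this sketch, because such a path would only resolve from one particular working directory.

```python
import os


def shebang_interpreter(first_line):
    """Return the interpreter path named on a '#!' line, or None if there is none."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    return parts[0] if parts else None


def is_runnable(path):
    """True if path is an absolute path to an existing, executable regular file."""
    return (
        path is not None
        and os.path.isabs(path)
        and os.path.isfile(path)
        and os.access(path, os.X_OK)
    )


# The two first lines from the question: reducer.py (broken) and mapper.py (correct).
for line in ("#!usr/bin/env/ python", "#!/usr/bin/env python"):
    interpreter = shebang_interpreter(line)
    print("%-24s -> %-14s runnable: %s" % (line, interpreter, is_runnable(interpreter)))
```

For the broken line the extracted path is 'usr/bin/env/', which is neither absolute nor a regular file, so the check fails. For the corrected line the result depends on whether /usr/bin/env exists on the machine running the check.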
