2016-11-16 80 views

Map Reduce: why do I need to specify "python" before a .py file in a pipe?

I am running MapReduce locally.

My command line is as follows:

cat testfile | python ./mapper.py | python ./reducer.py 

This works fine. However, when my command is as follows:

cat testfile | ./mapper.py | ./reducer.py 

I get the following errors:

./mapper.py: line 1: import: command not found 
./mapper.py: line 3: syntax error near unexpected token `(' 
./mapper.py: line 3: `def mapper(): 

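This makes sense: the shell is reading my Python file as a bash script and is confused by the Python syntax.

But all the online examples I have seen (for example http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/) do not put python before the .py files. How can I configure my machine so that mapper.py and reducer.py run in the pipe without specifying python in front of them?

In case it helps, here is my mapper code: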

import sys

def mapper():
    for line in sys.stdin:
        data = line.strip().split('\t')
        if len(data) == 6:
            category = data[3]
            sales = data[4]
            print('{0}\t{1}'.format(category, sales))

if __name__ == "__main__":
    mapper()

Here is my reducer code:

import sys

def reducer():
    current_total = 0
    old_key = None

    for line in sys.stdin:
        data = line.strip().split('\t')
        if len(data) == 2:
            current_key, sales = data
            sales = float(sales)

            # A new key means the previous key's total is complete.
            if old_key and current_key != old_key:
                print("{0}\t{1}".format(old_key, current_total))
                current_total = 0
            old_key = current_key
            current_total += sales

    # Emit the last key, unless no valid input line was read.
    if old_key is not None:
        print("{0}\t{1}".format(old_key, current_total))

if __name__ == "__main__":
    reducer()

My data looks like this:

2012-01-01  09:01 Anchorage  DVDs 6.38 Amex 
2012-01-01  09:01 Aurora Electronics 117.81 MasterCard 
2012-01-01  09:01 Philadelphia DVDs 351.31 Cash 
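
For reference, the mapper and reducer logic above can be exercised without a shell pipe at all. The following is a minimal sketch, assuming the six fields in the data are tab-separated (the tabs are not visible in the sample above) and that equal keys reach the reducer on adjacent lines. Hadoop streaming arranges that with its sort step; a local pipe would need an explicit sort between the two scripts. The names SAMPLE, map_lines, reduce_pairs and run_pipeline are hypothetical and are not part of the question.

```python
# Assumed reconstruction of the three sample rows with real tab separators.
SAMPLE = (
    "2012-01-01\t09:01\tAnchorage\tDVDs\t6.38\tAmex\n"
    "2012-01-01\t09:01\tAurora\tElectronics\t117.81\tMasterCard\n"
    "2012-01-01\t09:01\tPhiladelphia\tDVDs\t351.31\tCash\n"
)


def map_lines(lines):
    """Yield a tab-joined category and sales value for each six-field record."""
    for line in lines:
        data = line.strip().split("\t")
        if len(data) == 6:
            yield "{0}\t{1}".format(data[3], data[4])


def reduce_pairs(lines):
    """Sum sales per key; assumes equal keys arrive on adjacent lines."""
    old_key = None
    current_total = 0.0
    for line in lines:
        data = line.strip().split("\t")
        if len(data) != 2:
            continue
        current_key, sales = data
        if old_key is not None and current_key != old_key:
            yield old_key, current_total
            current_total = 0.0
        old_key = current_key
        current_total += float(sales)
    # Emit the final key only if at least one valid line was seen.
    if old_key is not None:
        yield old_key, current_total


def run_pipeline(text):
    """Mimic mapper, then sort, then reducer, all in one process."""
    mapped = sorted(map_lines(text.splitlines()))
    return list(reduce_pairs(mapped))


if __name__ == "__main__":
    for key, total in run_pipeline(SAMPLE):
        print("{0}\t{1}".format(key, total))
```

Without the sorted() call, the two DVDs rows in the sample would be reported as two separate totals, because they are not adjacent in the input.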

Add the hashbang line '#!/usr/bin/env python' at the beginning of your Python script –

Add a shebang and set the execute attribute using 'chmod +x script.py' – furas

Answers

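Because your file does not know its interpreter. With python ./myfile you are specifying it explicitly. If you do not want to specify it explicitly, you can put a shebang on the first line of the file, which is basically the path to the interpreter. For Python, the shebang looks like this: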

#!/usr/bin/env python

or, with an absolute path to the interpreter (the location varies between systems):

#!/usr/local/bin/python
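
To check whether a given script already carries such a line, something like the following could be used. This is only a sketch: the helper names shebang_interpreter and can_run_directly are made up for illustration, and the executable check relies on os.access, which reports permissions for the current user on Unix-like systems.

```python
import os


def shebang_interpreter(path):
    """Return the interpreter command from a script's first line, or None."""
    with open(path, "rb") as handle:
        first_line = handle.readline()
    # A shebang must be the very first two bytes of the file.
    if not first_line.startswith(b"#!"):
        return None
    return first_line[2:].decode("utf-8", "replace").strip()


def can_run_directly(path):
    """True if the file has a shebang and the current user may execute it."""
    return shebang_interpreter(path) is not None and os.access(path, os.X_OK)
```

For the mapper.py in the question, shebang_interpreter("mapper.py") would return None because the file starts with import sys, which is why the shell fell back to interpreting it as a shell script.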

For more details, as per the shebang wiki:

Under Unix-like operating systems, when a script with a shebang is run as a program, the program loader parses the rest of the script's initial line as an interpreter directive; the specified interpreter program is run instead, passing to it as an argument the path that was initially used when attempting to run the script
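
The mechanism described in that quote can be tried end to end with a small sketch. It assumes a Unix-like system and a temporary directory that allows execution. It writes a throwaway script (the name upper.py and the function run_with_shebang are hypothetical), uses the absolute-path form of the shebang pointing at the currently running interpreter, sets the execute bits much as chmod +x would, and then runs the file by its path alone.

```python
import os
import stat
import subprocess
import sys
import tempfile

# Body of the throwaway script: upper-case everything read from stdin.
SCRIPT_BODY = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
)


def run_with_shebang(input_text):
    """Write a script with a shebang, mark it executable, and run it by path."""
    with tempfile.TemporaryDirectory() as workdir:
        script = os.path.join(workdir, "upper.py")
        with open(script, "w") as handle:
            # Absolute-path shebang pointing at the interpreter running this code.
            handle.write("#!{0}\n".format(sys.executable))
            handle.write(SCRIPT_BODY)
        # Roughly what `chmod +x upper.py` does: add the execute bits.
        mode = os.stat(script).st_mode
        os.chmod(script, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        # No "python" in the command: the program loader reads the shebang.
        result = subprocess.run(
            [script],
            input=input_text,
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
        return result.stdout


if __name__ == "__main__":
    sys.stdout.write(run_with_shebang("dvds\t6.38\n"))
```

If the shebang line is left out of the generated file, the same subprocess call fails with an exec format error instead, since nothing tells the loader which interpreter to start.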

Awesome! Works perfectly, thanks. – bigmacboy78
