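Hadoop Streaming job fails in Python

I have a MapReduce job written in Python. The program tested successfully in a Linux environment, but it fails when I run it under Hadoop.
Here is the job command: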
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.20.1+169.127-streaming.jar \
-input /data/omni/20110115/exp6-10122 -output /home/yan/visitorpy.out \
-mapper SessionMap.py -reducer SessionRed.py -file SessionMap.py \
-file SessionRed.py
The mode of Session*.py is 755, and #!/usr/bin/env python is the first line of each *.py file. The mapper, SessionMap.py, is:
#!/usr/bin/env python
import sys
for line in sys.stdin:
    val=line.split("\t")
    (visidH,visidL,sessionID)=(val[4],val[5],val[108])
    print "%s%s\t%s" % (visidH,visidL,sessionID)
Error from the log:
java.io.IOException: Broken pipe
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:260)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at org.apache.hadoop.streaming.io.TextInputWriter.writeUTF8(TextInputWriter.java:72)
at org.apache.hadoop.streaming.io.TextInputWriter.writeValue(TextInputWriter.java:51)
at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:110)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:126)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
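
A "Broken pipe" on the Java side generally means the streaming subprocess closed its stdin, usually because it exited before reading all of its input. One possible cause, which is an assumption and not something the log above confirms, is an IndexError raised by val[108] on a line with fewer than 109 tab-separated fields. Below is a minimal sketch of a mapper that skips such lines instead of crashing. The names map_line and run_mapper are hypothetical, and unlike the original it strips the trailing newline before splitting.

```python
#!/usr/bin/env python
import sys

# Field positions taken from the mapper above: val[4], val[5], val[108].
VISID_HIGH, VISID_LOW, SESSION_ID = 4, 5, 108


def map_line(line):
    """Return the mapper output for one log line, or None if the line is too short."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) <= SESSION_ID:
        return None
    return "%s%s\t%s" % (fields[VISID_HIGH], fields[VISID_LOW], fields[SESSION_ID])


def run_mapper(stdin, stdout, stderr):
    """Map every input line; count short lines instead of raising IndexError."""
    skipped = 0
    for line in stdin:
        out = map_line(line)
        if out is None:
            skipped += 1
            continue
        stdout.write(out + "\n")
    if skipped:
        # Diagnostics go to stderr so they do not mix with the key/value output.
        stderr.write("skipped %d short lines\n" % skipped)
    return skipped
```

In an actual script the last line would be run_mapper(sys.stdin, sys.stdout, sys.stderr). If the skipped count is nonzero on the real input, malformed lines are a likely culprit. If it is zero, the cause is probably elsewhere, for example the interpreter or script not being found on the task nodes.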
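If you think this answers the question, you should select it as the answer. That will make things easier for others who may face a similar problem. – Kasisnu 2014-01-22 16:47:16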