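I am working with the matrix multiplication example for MapReduce in Hadoop. Should spilled records always be equal to the map input records or the map output records? In my run, the spilled record count differs from both the map input and the map output record counts.
Here is the output from one test run: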
Three by three test
IB = 1
KB = 2
JB = 1
11/12/14 13:16:22 INFO input.FileInputFormat: Total input paths to process : 2
11/12/14 13:16:22 INFO mapred.JobClient: Running job: job_201112141153_0003
11/12/14 13:16:23 INFO mapred.JobClient: map 0% reduce 0%
11/12/14 13:16:32 INFO mapred.JobClient: map 100% reduce 0%
11/12/14 13:16:44 INFO mapred.JobClient: map 100% reduce 100%
11/12/14 13:16:46 INFO mapred.JobClient: Job complete: job_201112141153_0003
11/12/14 13:16:46 INFO mapred.JobClient: Counters: 17
11/12/14 13:16:46 INFO mapred.JobClient: Job Counters
11/12/14 13:16:46 INFO mapred.JobClient: Launched reduce tasks=1
11/12/14 13:16:46 INFO mapred.JobClient: Launched map tasks=2
11/12/14 13:16:46 INFO mapred.JobClient: Data-local map tasks=2
11/12/14 13:16:46 INFO mapred.JobClient: FileSystemCounters
11/12/14 13:16:46 INFO mapred.JobClient: FILE_BYTES_READ=1464
11/12/14 13:16:46 INFO mapred.JobClient: HDFS_BYTES_READ=528
11/12/14 13:16:46 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2998
11/12/14 13:16:46 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=384
11/12/14 13:16:46 INFO mapred.JobClient: Map-Reduce Framework
11/12/14 13:16:46 INFO mapred.JobClient: Reduce input groups=36
11/12/14 13:16:46 INFO mapred.JobClient: Combine output records=0
11/12/14 13:16:46 INFO mapred.JobClient: Map input records=18
11/12/14 13:16:46 INFO mapred.JobClient: Reduce shuffle bytes=735
11/12/14 13:16:46 INFO mapred.JobClient: Reduce output records=15
11/12/14 13:16:46 INFO mapred.JobClient: Spilled Records=108
11/12/14 13:16:46 INFO mapred.JobClient: Map output bytes=1350
11/12/14 13:16:46 INFO mapred.JobClient: Combine input records=0
11/12/14 13:16:46 INFO mapred.JobClient: Map output records=54
11/12/14 13:16:46 INFO mapred.JobClient: Reduce input records=54
11/12/14 13:16:46 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/12/14 13:16:46 INFO input.FileInputFormat: Total input paths to process : 1
11/12/14 13:16:46 INFO mapred.JobClient: Running job: job_local_0001
11/12/14 13:16:46 INFO input.FileInputFormat: Total input paths to process : 1
11/12/14 13:16:46 INFO mapred.MapTask: io.sort.mb = 100
11/12/14 13:16:46 INFO mapred.MapTask: data buffer = 79691776/99614720
11/12/14 13:16:46 INFO mapred.MapTask: record buffer = 262144/327680
11/12/14 13:16:46 INFO mapred.MapTask: Starting flush of map output
11/12/14 13:16:46 INFO mapred.MapTask: Finished spill 0
11/12/14 13:16:46 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
11/12/14 13:16:46 INFO mapred.LocalJobRunner:
11/12/14 13:16:46 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
11/12/14 13:16:46 INFO mapred.LocalJobRunner:
11/12/14 13:16:46 INFO mapred.Merger: Merging 1 sorted segments
11/12/14 13:16:46 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 128 bytes
11/12/14 13:16:46 INFO mapred.LocalJobRunner:
11/12/14 13:16:46 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
11/12/14 13:16:46 INFO mapred.LocalJobRunner:
11/12/14 13:16:46 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
11/12/14 13:16:46 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://localhost:9000/tmp/MatrixMultiply/out
11/12/14 13:16:46 INFO mapred.LocalJobRunner: reduce > reduce
11/12/14 13:16:46 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
11/12/14 13:16:47 INFO mapred.JobClient: map 100% reduce 100%
11/12/14 13:16:47 INFO mapred.JobClient: Job complete: job_local_0001
11/12/14 13:16:47 INFO mapred.JobClient: Counters: 14
11/12/14 13:16:47 INFO mapred.JobClient: FileSystemCounters
11/12/14 13:16:47 INFO mapred.JobClient: FILE_BYTES_READ=89412
11/12/14 13:16:47 INFO mapred.JobClient: HDFS_BYTES_READ=37206
11/12/14 13:16:47 INFO mapred.JobClient: FILE_BYTES_WRITTEN=37390
11/12/14 13:16:47 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=164756
11/12/14 13:16:47 INFO mapred.JobClient: Map-Reduce Framework
11/12/14 13:16:47 INFO mapred.JobClient: Reduce input groups=9
11/12/14 13:16:47 INFO mapred.JobClient: Combine output records=9
11/12/14 13:16:47 INFO mapred.JobClient: Map input records=15
11/12/14 13:16:47 INFO mapred.JobClient: Reduce shuffle bytes=0
11/12/14 13:16:47 INFO mapred.JobClient: Reduce output records=9
11/12/14 13:16:47 INFO mapred.JobClient: Spilled Records=18
11/12/14 13:16:47 INFO mapred.JobClient: Map output bytes=180
11/12/14 13:16:47 INFO mapred.JobClient: Combine input records=15
11/12/14 13:16:47 INFO mapred.JobClient: Map output records=15
11/12/14 13:16:47 INFO mapred.JobClient: Reduce input records=9
...........X[0][0]=30, Y[0][0]=9
Bad Answer
...........X[0][1]=36, Y[0][1]=36
...........X[0][2]=42, Y[0][2]=42
...........X[1][0]=66, Y[1][0]=24
Bad Answer
...........X[1][1]=81, Y[1][1]=81
...........X[1][2]=96, Y[1][2]=96
...........X[2][0]=102, Y[2][0]=39
Bad Answer
...........X[2][1]=126, Y[2][1]=126
...........X[2][2]=150, Y[2][2]=150
The example, along with its code, is explained here:
http://www.norstad.org/matrix-multiply/index.html
Could you please tell me where the problem is and how to get it working correctly? Thanks,
WL
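I would also like to mention that it works fine when run in standalone mode, where the spilled records equal the map input and map output records (which is 18). In pseudo-distributed mode it does not work, and the spilled records do not equal the map input and map output records. – waqas 2011-12-14 12:48:14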
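Spilled means the records had to be written to disk because there was not enough RAM during the sort/shuffle phase. So ideally this should be zero or very low. – 2011-12-14 12:58:40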
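To compare the counters discussed above without reading the log by eye, a small script can pull them out of the JobClient output. This is only a minimal sketch: the helper names `parse_counters` and `spill_ratio` are hypothetical, the regular expression assumes the exact `mapred.JobClient: Name=value` line layout shown in the question, and the sample lines are copied from the first job's log. It reports the raw ratio only; how that ratio should be interpreted may depend on the Hadoop version.

```python
import re

# Counter lines copied from the first job's log output in the question.
LOG = """\
11/12/14 13:16:46 INFO mapred.JobClient: Map input records=18
11/12/14 13:16:46 INFO mapred.JobClient: Spilled Records=108
11/12/14 13:16:46 INFO mapred.JobClient: Map output records=54
11/12/14 13:16:46 INFO mapred.JobClient: Reduce input records=54
"""

# Matches lines of the form "... mapred.JobClient: <counter name>=<integer>".
# Progress lines such as "map 100% reduce 0%" contain no "=" and are skipped.
COUNTER_RE = re.compile(r"mapred\.JobClient:\s*(?P<name>[^=]+?)=(?P<value>\d+)\s*$")


def parse_counters(log_text):
    """Return a dict mapping counter names to integer values (hypothetical helper)."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group("name").strip()] = int(match.group("value"))
    return counters


def spill_ratio(counters):
    """Spilled records divided by map output records (hypothetical helper)."""
    return counters["Spilled Records"] / counters["Map output records"]


counters = parse_counters(LOG)
print(counters["Spilled Records"], counters["Map output records"], spill_ratio(counters))
# prints: 108 54 2.0
```

For the first job in the question this gives 108 spilled records against 54 map output records, a ratio of 2.0, which makes the mismatch described in the question easy to see at a glance.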