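Setting the number of Hadoop mappers via the input split size does not work
I want to run a Hadoop job several times with different numbers of mappers and reducers. I have set the following configuration properties: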
- mapreduce.input.fileinputformat.split.maxsize
- mapreduce.input.fileinputformat.split.minsize
- mapreduce.job.maps
My total input size is 1160421275 bytes. I try to configure 4 mappers and 3 reducers with this code:
Configuration conf = new Configuration();
FileSystem hdfs = FileSystem.get(conf);
// Total input size in bytes across both input paths
long size = hdfs.getContentSummary(new Path("input/filea")).getLength();
size += hdfs.getContentSummary(new Path("input/fileb")).getLength();
// Pin both the minimum and maximum split size to a quarter of the total input
conf.set("mapreduce.input.fileinputformat.split.maxsize", String.valueOf(size / 4));
conf.set("mapreduce.input.fileinputformat.split.minsize", String.valueOf(size / 4));
conf.setInt("mapreduce.job.maps", 4);
....
job.setNumReduceTasks(3);
size / 4 evaluates to 290105318. Running the job produces the following output:
2016-11-19 12:30:36,426 INFO [main] input.FileInputFormat (FileInputFormat.java:listStatus(287)) - Total input paths to process : 1
2016-11-19 12:30:36,535 INFO [main] input.FileInputFormat (FileInputFormat.java:listStatus(287)) - Total input paths to process : 4
2016-11-19 12:30:36,572 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(396)) - number of splits:7
The number of splits is 7, not 4. The counters of the successful job are:
File System Counters
    FILE: Number of bytes read=18855390277
    FILE: Number of bytes written=14653469965
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
Map-Reduce Framework
    Map input records=39184416
    Map output records=36751473
    Map output bytes=787022241
    Map output materialized bytes=860525313
    Input split bytes=1801
    Combine input records=0
    Combine output records=0
    Reduce input groups=25064998
    Reduce shuffle bytes=860525313
    Reduce input records=36751473
    Reduce output records=1953960
    Spilled Records=110254419
    Shuffled Maps =21
    Failed Shuffles=0
    Merged Map outputs=21
    GC time elapsed (ms)=1124
    CPU time spent (ms)=0
    Physical memory (bytes) snapshot=0
    Virtual memory (bytes) snapshot=0
    Total committed heap usage (bytes)=6126829568
Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
File Input Format Counters
    Bytes Read=0
File Output Format Counters
    Bytes Written=77643084
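The counters report 21 shuffled maps, but I expected the job to use only 4 mappers. The reducers are fine: the job produces 3 output files, as configured. Is my split-size setting for the mappers wrong?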
AFAIK those confs are fine. How many files are in the input location? – mrsrinivas
There is 1 file for file A and 4 files for file B. – mkvem
When I use 9, it comes out with 10 splits. – mkvem
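The numbers in the comments above are consistent with splits being computed per file rather than over the total input. Below is a minimal, plain-Java sketch of that arithmetic, not the actual Hadoop code. It mirrors the rule FileInputFormat applies as far as I know it: splitSize = max(minSize, min(maxSize, blockSize)), with a 1.1 slop factor on the last chunk of each file. The individual file sizes are hypothetical, since the thread only gives the total (1160421275 bytes) and the file counts (1 + 4). They were picked to sum to that total. The class name, helper names, and the 128 MB block size are also assumptions made for illustration.

```java
// Sketch of per-file split counting. The file sizes used in main() are hypothetical.
class SplitCountSketch {
    // FileInputFormat lets the last split of a file be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;
    // Assumed block size (128 MB). It has no effect here because minsize == maxsize.
    static final long BLOCK_SIZE = 128L * 1024 * 1024;

    // splitSize = max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits for one non-empty, splittable file.
    static int splitsForFile(long fileLength, long splitSize) {
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the tail of the file becomes its own split
        }
        return splits;
    }

    // Total splits when min and max split size are both set to totalSize / desiredMappers.
    static int totalSplits(long[] fileLengths, int desiredMappers) {
        long total = 0;
        for (long len : fileLengths) {
            total += len;
        }
        long target = total / desiredMappers;
        long splitSize = computeSplitSize(BLOCK_SIZE, target, target);
        int splits = 0;
        for (long len : fileLengths) {
            splits += splitsForFile(len, splitSize); // a split never spans two files
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical sizes: one file for "filea", four for "fileb"; sum is 1160421275.
        long[] sizes = {700_000_000L, 115_105_318L, 115_105_318L, 115_105_318L, 115_105_321L};
        System.out.println(totalSplits(sizes, 4));     // prints 7
        System.out.println(totalSplits(sizes, 9));     // prints 10
        System.out.println(totalSplits(sizes, 4) * 3); // prints 21 (7 map tasks x 3 reducers)
    }
}
```

With these made-up sizes, the large file is cut into 3 splits and each of the 4 small files gets its own split, giving 7 rather than 4. The Shuffled Maps=21 counter then matches 7 map outputs fetched by each of the 3 reducers, so it is not the number of mappers. As far as I know, mapreduce.job.maps is only a hint when splits come from FileInputFormat. Getting exactly 4 map tasks over several small files would likely need an input format that packs multiple files into one split, such as CombineTextInputFormat.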