Mahout in Action, Chapter 06: Wikipedia job fails with java.lang.ArrayIndexOutOfBoundsException
The Hadoop version I am using is
$ hadoop version
Hadoop 2.5.0-cdh5.2.0
Subversion http://github.com/cloudera/hadoop -r e1f20a08bde76a33b79df026d00a0c91b2298387
Compiled by jenkins on 2014-10-11T21:00Z
Compiled with protoc 2.5.0
From source with checksum 309bccd135b199bdfdd6df5f3f4153d
This command was run using /DCNFS/applications/cdh/5.2/app/hadoop-2.5.0-cdh5.2.0/share/hadoop/common/hadoop-common-2.5.0-cdh5.2.0.jar
My input.txt looks like
$ hadoop dfs -cat input/input.txt | head -5
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
1: 1664968
2: 3 747213 1664968 1691047 4095634 5535664
3: 9 77935 79583 84707 564578 594898 681805 681886 835470 880698 1109091 1125108 1279972 1463445 1497566 1783284 1997564 2006526 2070954 2250217 2268713 2276203 2374802 2571397 2640902 2647217 2732378 2821237 3088028 3092827 3211549 3283735 3491412 3492254 3498305 3505664 3547201 3603437 3617913 3793767 3907547 4021634 4025897 4086017 4183126 4184025 4189168 4192731 4395141 4899940 4987592 4999120 5017477 5149173 5149311 5158741 5223097 5302153 5474252 5535280
4: 145
5: 8 57544 58089 60048 65880 284186 313376 564578 717529 729993 1097284 1204280 1204407 1255317 1670218 1720928 1850305 2269887 2333350 2359764 2640693 2743982 3303009 3322952 3492254 3573013 3721693 3797343 3797349 3797359 3849461 4033556 4173124 4189215 4207986 4669945 4817900 4901416 5010479 5062062 5072938 5098953 5292042 5429924 5599862 5599863 5689049
and my users.txt looks like
$ hadoop dfs -cat input/users.txt
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
3: 9 77935 79583 84707 564578 594898 681805 681886 835470 880698 1109091
1125108 1279972 1463445 1497566 1783284 1997564 2006526 2070954 2250217
2268713 2276203 2374802 2571397 2640902 2647217 2732378 2821237 3088028
3092827 3211549 3283735 3491412 3492254 3498305 3505664 3547201 3603437
3617913 3793767 3907547 4021634 4025897 4086017 4183126 4184025 4189168
4192731 4395141 4899940 4987592 4999120 5017477 5149173 5149311 5158741
5223097 5302153 5474252 5535280
I run my job as
$ hadoop jar mahout-core-0.9-cdh5.2.0-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output --usersFile input/users.txt --booleanData -s SIMILARITY_COOCCURRENCE
and it fails with the following trace
15/02/07 16:48:44 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --maxPrefsInItemSimilarity=[500], --maxPrefsPerUser=[10], --maxSimilaritiesPerItem=[100], --minPrefsPerUser=[1], --numRecommendations=[10], --similarityClassname=[SIMILARITY_COOCCURRENCE], --startPhase=[0], --tempDir=[temp], --usersFile=[input/users.txt]}
15/02/07 16:48:44 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[input/input.txt], --minPrefsPerUser=[1], --output=[temp/preparePreferenceMatrix], --ratingShift=[0.0], --startPhase=[0], --tempDir=[temp]}
15/02/07 16:48:44 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
15/02/07 16:48:44 INFO Configuration.deprecation: mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress
15/02/07 16:48:44 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
15/02/07 16:48:44 INFO client.RMProxy: Connecting to ResourceManager at name1.hadoop.dc.engr.scu.edu/10.128.0.201:8032
15/02/07 16:48:45 INFO input.FileInputFormat: Total input paths to process : 1
15/02/07 16:48:45 INFO mapreduce.JobSubmitter: number of splits:8
15/02/07 16:48:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1422500076160_0023
15/02/07 16:48:46 INFO impl.YarnClientImpl: Submitted application application_1422500076160_0023
15/02/07 16:48:46 INFO mapreduce.Job: The url to track the job: http://name1.hadoop.dc.engr.scu.edu:8088/proxy/application_1422500076160_0023/
15/02/07 16:48:46 INFO mapreduce.Job: Running job: job_1422500076160_0023
15/02/07 16:48:56 INFO mapreduce.Job: Job job_1422500076160_0023 running in uber mode : false
15/02/07 16:48:56 INFO mapreduce.Job: map 0% reduce 0%
15/02/07 16:49:02 INFO mapreduce.Job: Task Id : attempt_1422500076160_0023_m_000006_0, Status : FAILED
Error: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50)
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
15/02/07 16:49:02 INFO mapreduce.Job: Task Id : attempt_1422500076160_0023_m_000001_0, Status : FAILED
Error: java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50)
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
I believe the data format is incorrect. Could someone please help me fix this? I am new to MapReduce and Hadoop.
Thanks a lot.
The stack trace mentions an array, but without a code snippet it is hard to say why the error occurs. – 2015-02-08 04:59:58
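One possible explanation, offered as an assumption rather than a confirmed diagnosis: RecommenderJob appears to expect one preference per line in the form userID,itemID (comma- or tab-separated), while input.txt uses the "userID: item item ..." layout of the Wikipedia links file. A line with no comma or tab would split into a single token, which would be consistent with ArrayIndexOutOfBoundsException: 1 in ItemIDIndexMapper. The sketch below converts the links layout into userID,itemID lines. The function names links_to_prefs and convert are made up for illustration and are not part of Mahout.

```python
def links_to_prefs(line):
    """Turn one 'user: item1 item2 ...' line into a list of 'user,item' strings.

    Assumes the layout shown in input.txt: a user ID, a colon, then
    whitespace-separated item IDs. Blank lines yield an empty list.
    """
    line = line.strip()
    if not line:
        return []
    user, _, items = line.partition(":")
    return [f"{user.strip()},{item}" for item in items.split()]


def convert(lines):
    """Yield 'user,item' lines for every input line (hypothetical helper)."""
    for line in lines:
        for pref in links_to_prefs(line):
            yield pref


# Example with the first two lines of input.txt
sample = ["1: 1664968", "2: 3 747213 1664968"]
converted = list(convert(sample))
```

The converted lines could be written to a new file and uploaded to HDFS in place of input.txt. It may also be worth checking users.txt: as far as I know, --usersFile is meant to list one user ID per line (for example just 3), not the full links line for that user.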