Asked 2015-02-08

I'm working through *Mahout in Action*, Chapter 6, and the Wikipedia job fails with `java.lang.ArrayIndexOutOfBoundsException`. The Hadoop version I'm using is:

$ hadoop version 
Hadoop 2.5.0-cdh5.2.0 
Subversion http://github.com/cloudera/hadoop -r e1f20a08bde76a33b79df026d00a0c91b2298387 
Compiled by jenkins on 2014-10-11T21:00Z 
Compiled with protoc 2.5.0 
From source with checksum 309bccd135b199bdfdd6df5f3f4153d 
This command was run using /DCNFS/applications/cdh/5.2/app/hadoop-2.5.0-cdh5.2.0/share/hadoop/common/hadoop-common-2.5.0-cdh5.2.0.jar 

My input.txt looks like:

$ hadoop dfs -cat input/input.txt | head -5 
DEPRECATED: Use of this script to execute hdfs command is deprecated. 
Instead use the hdfs command for it. 

1: 1664968 
2: 3 747213 1664968 1691047 4095634 5535664 
3: 9 77935 79583 84707 564578 594898 681805 681886 835470 880698 1109091 1125108 1279972 1463445 1497566 1783284 1997564 2006526 2070954 2250217 2268713 2276203 2374802 2571397 2640902 2647217 2732378 2821237 3088028 3092827 3211549 3283735 3491412 3492254 3498305 3505664 3547201 3603437 3617913 3793767 3907547 4021634 4025897 4086017 4183126 4184025 4189168 4192731 4395141 4899940 4987592 4999120 5017477 5149173 5149311 5158741 5223097 5302153 5474252 5535280 
4: 145 
5: 8 57544 58089 60048 65880 284186 313376 564578 717529 729993 1097284 1204280 1204407 1255317 1670218 1720928 1850305 2269887 2333350 2359764 2640693 2743982 3303009 3322952 3492254 3573013 3721693 3797343 3797349 3797359 3849461 4033556 4173124 4189215 4207986 4669945 4817900 4901416 5010479 5062062 5072938 5098953 5292042 5429924 5599862 5599863 5689049 

and my users.txt looks like:

$ hadoop dfs -cat input/users.txt 
DEPRECATED: Use of this script to execute hdfs command is deprecated. 
Instead use the hdfs command for it. 

3: 9 77935 79583 84707 564578 594898 681805 681886 835470 880698 1109091 
1125108 1279972 1463445 1497566 1783284 1997564 2006526 2070954 2250217 
2268713 2276203 2374802 2571397 2640902 2647217 2732378 2821237 3088028 
3092827 3211549 3283735 3491412 3492254 3498305 3505664 3547201 3603437 
3617913 3793767 3907547 4021634 4025897 4086017 4183126 4184025 4189168 
4192731 4395141 4899940 4987592 4999120 5017477 5149173 5149311 5158741 
5223097 5302153 5474252 5535280 

I run my job as:

$ hadoop jar mahout-core-0.9-cdh5.2.0-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output --usersFile input/users.txt --booleanData -s SIMILARITY_COOCCURRENCE 

and it fails with the following trace:

15/02/07 16:48:44 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --maxPrefsInItemSimilarity=[500], --maxPrefsPerUser=[10], --maxSimilaritiesPerItem=[100], --minPrefsPerUser=[1], --numRecommendations=[10], --similarityClassname=[SIMILARITY_COOCCURRENCE], --startPhase=[0], --tempDir=[temp], --usersFile=[input/users.txt]} 
15/02/07 16:48:44 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[input/input.txt], --minPrefsPerUser=[1], --output=[temp/preparePreferenceMatrix], --ratingShift=[0.0], --startPhase=[0], --tempDir=[temp]} 
15/02/07 16:48:44 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir 
15/02/07 16:48:44 INFO Configuration.deprecation: mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress 
15/02/07 16:48:44 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir 
15/02/07 16:48:44 INFO client.RMProxy: Connecting to ResourceManager at name1.hadoop.dc.engr.scu.edu/10.128.0.201:8032 
15/02/07 16:48:45 INFO input.FileInputFormat: Total input paths to process : 1 
15/02/07 16:48:45 INFO mapreduce.JobSubmitter: number of splits:8 
15/02/07 16:48:46 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1422500076160_0023 
15/02/07 16:48:46 INFO impl.YarnClientImpl: Submitted application application_1422500076160_0023 
15/02/07 16:48:46 INFO mapreduce.Job: The url to track the job: http://name1.hadoop.dc.engr.scu.edu:8088/proxy/application_1422500076160_0023/ 
15/02/07 16:48:46 INFO mapreduce.Job: Running job: job_1422500076160_0023 
15/02/07 16:48:56 INFO mapreduce.Job: Job job_1422500076160_0023 running in uber mode : false 
15/02/07 16:48:56 INFO mapreduce.Job: map 0% reduce 0% 
15/02/07 16:49:02 INFO mapreduce.Job: Task Id : attempt_1422500076160_0023_m_000006_0, Status : FAILED 
Error: java.lang.ArrayIndexOutOfBoundsException: 1 
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50) 
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31) 
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) 
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) 
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) 

15/02/07 16:49:02 INFO mapreduce.Job: Task Id : attempt_1422500076160_0023_m_000001_0, Status : FAILED 
Error: java.lang.ArrayIndexOutOfBoundsException: 1 
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50) 
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31) 
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) 
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) 
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) 
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) 

I believe the data is not formatted correctly. Could someone please help me fix this? I'm new to MapReduce/Hadoop.

Many thanks.

[image: figure from the book, attached by the asker]


The stack trace mentions an array, but without a code snippet it's hard to say why the error occurs. – 2015-02-08 04:59:58

Answer


I don't work on this project anymore, and the book is unsupported at this stage. However, it looks like you are running this job on the raw input, rather than after parsing that format into the standard form with the custom mapper you saw in the book.


I thought `RecommenderJob` was doing that. – daydreamer 2015-02-08 20:09:41


No, it expects input in user,item,rating format. That's not the Wikipedia data. The code in 6.3.2 does that initial translation. – 2015-02-08 20:33:36


I'm confused: according to the figure in the book (image attached), `RecommenderJob` seems to have all the required mappers and reducers. Since that is apparently not the case, do I need to run `WikipediaToItemPrefsMapper` and `WikipediaToUserVectorReducer` first, and feed their output to `RecommenderJob`? Please help. – daydreamer 2015-02-09 21:57:10
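For what it's worth, the translation step the answer and comments describe can be sketched outside Hadoop. The book does this with a custom Java mapper; the Python below is only a rough local stand-in, not the book's code, and it assumes every token after the colon in a line like `2: 3 747213 …` is an article ID:

```python
def wikipedia_to_prefs(lines):
    """Turn 'userID: itemID itemID ...' lines into 'userID,itemID'
    pairs -- the boolean-preference CSV that RecommenderJob reads
    when --booleanData is set (no rating column needed)."""
    prefs = []
    for line in lines:
        user, sep, items = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        for item in items.split():
            prefs.append(f"{user.strip()},{item}")
    return prefs

# First two lines of the input.txt shown above:
sample = ["1: 1664968", "2: 3 747213 1664968 1691047 4095634 5535664"]
print(wikipedia_to_prefs(sample)[:3])  # ['1,1664968', '2,3', '2,747213']
```

Writing that output to a file and pointing `-Dmapred.input.dir` at it should at least get past the `ArrayIndexOutOfBoundsException`, since each line then has the comma-separated fields the mapper expects.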