
I am new to the big data and Hadoop world, and I am trying to run some code I found through Google. It involves four steps: putting the data into the Hadoop file system, building an index on the data, and then the main step of producing the reduced data with map and reduce. I am running the Hadoop program on Ubuntu 14.04 with Hadoop 2.6, on a single-node cluster setup.

I was able to run the first two steps. The code uses an XML file to specify the paths to process.

The code I am using is from http://asterixdb.ics.uci.edu/fuzzyjoin/
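Roughly, the steps up to this point look like the following (the put commands are only a sketch of how I staged the sample records; the paths and the jar command match the log below):

hdfs dfs -mkdir -p /user/hduser/dblp-small
hdfs dfs -put dblp-small/records-000 /user/hduser/dblp-small/records-000
cd /home/midhu/fuzzyjoin/fuzzyjoin-hadoop
hadoop jar target/fuzzyjoin-hadoop-0.0.2-SNAPSHOT.jar fuzzyjoin -conf src/main/resources/fuzzyjoin/dblp.quickstart.xml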

When I run the last step, the fuzzy join itself, it gives me a series of errors:

Here is the trace:

[email protected]:/home/midhu/fuzzyjoin$ cd fuzzyjoin-hadoop 
[email protected]:/home/midhu/fuzzyjoin/fuzzyjoin-hadoop$ hadoop jar target/fuzzyjoin-hadoop-0.0.2-SNAPSHOT.jar fuzzyjoin -conf src/main/resources/fuzzyjoin/dblp.quickstart.xml 
16/04/03 13:55:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
Complete-Job started: Sun Apr 03 13:55:42 IST 2016 
Multi-Job started: Sun Apr 03 13:55:42 IST 2016 
FuzzyJoinDriver(TokensBasic.phase1) 
    Input Path: {hdfs://localhost:54310/user/hduser/dblp-small/records-000} 
    Output Path: hdfs://localhost:54310/user/hduser/dblp-small/tokens.phase1-000 
    Map Jobs: 2 
    Reduce Jobs: 1 
    Properties: {fuzzyjoin.similarity.name=Jaccard 
       fuzzyjoin.similarity.threshold=.5 
       fuzzyjoin.tokenizer=Word 
       fuzzyjoin.tokens.package=Scalar 
       fuzzyjoin.tokens.lengthstats=false 
       fuzzyjoin.ridpairs.group.class=TokenIdentity 
       fuzzyjoin.ridpairs.group.factor=1 
       fuzzyjoin.data.tokens= 
       fuzzyjoin.data.joinindex=} 
Job started: Sun Apr 03 13:55:42 IST 2016 
16/04/03 13:55:42 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id 
16/04/03 13:55:42 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId= 
16/04/03 13:55:42 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 
16/04/03 13:55:43 INFO mapred.FileInputFormat: Total input paths to process : 1 
16/04/03 13:55:43 INFO mapreduce.JobSubmitter: number of splits:1 
16/04/03 13:55:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1780986358_0001 
16/04/03 13:55:44 INFO mapreduce.Job: The url to track the job: http://localhost:8080/ 
16/04/03 13:55:44 INFO mapreduce.Job: Running job: job_local1780986358_0001 
16/04/03 13:55:44 INFO mapred.LocalJobRunner: OutputCommitter set in config null 
16/04/03 13:55:44 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter 
16/04/03 13:55:45 INFO mapred.LocalJobRunner: Waiting for map tasks 
16/04/03 13:55:45 INFO mapred.LocalJobRunner: Starting task: attempt_local1780986358_0001_m_000000_0 
16/04/03 13:55:46 INFO mapreduce.Job: Job job_local1780986358_0001 running in uber mode : false 
16/04/03 13:55:46 INFO mapreduce.Job: map 0% reduce 0% 
16/04/03 13:55:46 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ] 
16/04/03 13:55:46 INFO mapred.MapTask: Processing split: hdfs://localhost:54310/user/hduser/dblp-small/records-000/part-00000:0+36687 
16/04/03 13:55:46 INFO mapred.MapTask: numReduceTasks: 1 
16/04/03 13:55:49 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584) 
16/04/03 13:55:49 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100 
16/04/03 13:55:49 INFO mapred.MapTask: soft limit at 83886080 
16/04/03 13:55:49 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600 
16/04/03 13:55:49 INFO mapred.MapTask: kvstart = 26214396; length = 6553600 
16/04/03 13:55:49 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer 
16/04/03 13:55:52 INFO mapred.LocalJobRunner: hdfs://localhost:54310/user/hduser/dblp-small/records-000/part-00000:0+36687 > map 
16/04/03 13:55:54 INFO mapred.LocalJobRunner: hdfs://localhost:54310/user/hduser/dblp-small/records-000/part-00000:0+36687 > map 
16/04/03 13:55:54 INFO mapred.MapTask: Starting flush of map output 
16/04/03 13:55:54 INFO mapred.MapTask: Spilling map output 
16/04/03 13:55:54 INFO mapred.MapTask: bufstart = 0; bufend = 15588; bufvoid = 104857600 
16/04/03 13:55:54 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26209408(104837632); length = 4989/6553600 
16/04/03 13:55:54 INFO mapred.MapTask: Finished spill 0 
16/04/03 13:55:54 INFO mapred.Task: Task:attempt_local1780986358_0001_m_000000_0 is done. And is in the process of committing 
16/04/03 13:55:54 INFO mapred.LocalJobRunner: hdfs://localhost:54310/user/hduser/dblp-small/records-000/part-00000:0+36687 
16/04/03 13:55:54 INFO mapred.Task: Task 'attempt_local1780986358_0001_m_000000_0' done. 
16/04/03 13:55:54 INFO mapred.LocalJobRunner: Finishing task: attempt_local1780986358_0001_m_000000_0 
16/04/03 13:55:54 INFO mapred.LocalJobRunner: map task executor complete. 
16/04/03 13:55:54 INFO mapred.LocalJobRunner: Waiting for reduce tasks 
16/04/03 13:55:54 INFO mapred.LocalJobRunner: Starting task: attempt_local1780986358_0001_r_000000_0 
16/04/03 13:55:54 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ] 
16/04/03 13:55:54 INFO mapreduce.Job: map 100% reduce 0% 
16/04/03 13:55:54 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: [email protected] 
16/04/03 13:55:54 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10 
16/04/03 13:55:54 INFO reduce.EventFetcher: attempt_local1780986358_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events 
16/04/03 13:55:56 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1780986358_0001_m_000000_0 decomp: 9062 len: 9066 to MEMORY 
16/04/03 13:55:56 INFO reduce.InMemoryMapOutput: Read 9062 bytes from map-output for attempt_local1780986358_0001_m_000000_0 
16/04/03 13:55:57 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 9062, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->9062 
16/04/03 13:55:57 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning 
16/04/03 13:55:57 INFO mapred.LocalJobRunner: 1/1 copied. 
16/04/03 13:55:57 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs 
16/04/03 13:55:57 INFO mapred.Merger: Merging 1 sorted segments 
16/04/03 13:55:57 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 9056 bytes 
16/04/03 13:55:57 INFO reduce.MergeManagerImpl: Merged 1 segments, 9062 bytes to disk to satisfy reduce memory limit 
16/04/03 13:55:57 INFO reduce.MergeManagerImpl: Merging 1 files, 9066 bytes from disk 
16/04/03 13:55:57 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce 
16/04/03 13:55:57 INFO mapred.Merger: Merging 1 sorted segments 
16/04/03 13:55:57 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 9056 bytes 
16/04/03 13:55:57 INFO mapred.LocalJobRunner: 1/1 copied. 
16/04/03 13:56:00 INFO mapred.LocalJobRunner: reduce > reduce 
16/04/03 13:56:00 INFO mapreduce.Job: map 100% reduce 100% 
16/04/03 13:56:01 INFO mapred.Task: Task:attempt_local1780986358_0001_r_000000_0 is done. And is in the process of committing 
16/04/03 13:56:01 INFO mapred.LocalJobRunner: reduce > reduce 
16/04/03 13:56:01 INFO mapred.Task: Task attempt_local1780986358_0001_r_000000_0 is allowed to commit now 
16/04/03 13:56:02 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1780986358_0001_r_000000_0' to hdfs://localhost:54310/user/hduser/dblp-small/tokens.phase1-000/_temporary/0/task_local1780986358_0001_r_000000 
16/04/03 13:56:02 INFO mapred.LocalJobRunner: reduce > reduce 
16/04/03 13:56:02 INFO mapred.Task: Task 'attempt_local1780986358_0001_r_000000_0' done. 
16/04/03 13:56:02 INFO mapred.LocalJobRunner: Finishing task: attempt_local1780986358_0001_r_000000_0 
16/04/03 13:56:02 INFO mapred.LocalJobRunner: reduce task executor complete. 
16/04/03 13:56:02 INFO mapreduce.Job: Job job_local1780986358_0001 completed successfully 
16/04/03 13:56:03 INFO mapreduce.Job: Counters: 38 
    File System Counters 
     FILE: Number of bytes read=1080562 
     FILE: Number of bytes written=1589660 
     FILE: Number of read operations=0 
     FILE: Number of large read operations=0 
     FILE: Number of write operations=0 
     HDFS: Number of bytes read=73374 
     HDFS: Number of bytes written=12847 
     HDFS: Number of read operations=15 
     HDFS: Number of large read operations=0 
     HDFS: Number of write operations=18 
    Map-Reduce Framework 
     Map input records=100 
     Map output records=1248 
     Map output bytes=15588 
     Map output materialized bytes=9066 
     Input split bytes=120 
     Combine input records=1248 
     Combine output records=597 
     Reduce input groups=597 
     Reduce shuffle bytes=9066 
     Reduce input records=597 
     Reduce output records=597 
     Spilled Records=1194 
     Shuffled Maps =1 
     Failed Shuffles=0 
     Merged Map outputs=1 
     GC time elapsed (ms)=176 
     CPU time spent (ms)=0 
     Physical memory (bytes) snapshot=0 
     Virtual memory (bytes) snapshot=0 
     Total committed heap usage (bytes)=241836032 
    Shuffle Errors 
     BAD_ID=0 
     CONNECTION=0 
     IO_ERROR=0 
     WRONG_LENGTH=0 
     WRONG_MAP=0 
     WRONG_REDUCE=0 
    File Input Format Counters 
     Bytes Read=36687 
    File Output Format Counters 
     Bytes Written=12847 
Job ended: Sun Apr 03 13:56:04 IST 2016 
The job took 21.44 seconds. 
FuzzyJoinDriver(TokensBasic.phase2) 
    Input Path: {hdfs://localhost:54310/user/hduser/dblp-small/tokens.phase1-000} 
    Output Path: hdfs://localhost:54310/user/hduser/dblp-small/tokens-000 
    Map Jobs: 2 
    Reduce Jobs: 1 
    Properties: {fuzzyjoin.similarity.name=Jaccard 
       fuzzyjoin.similarity.threshold=.5 
       fuzzyjoin.tokenizer=Word 
       fuzzyjoin.tokens.package=Scalar 
       fuzzyjoin.tokens.lengthstats=false 
       fuzzyjoin.ridpairs.group.class=TokenIdentity 
       fuzzyjoin.ridpairs.group.factor=1 
       fuzzyjoin.data.tokens= 
       fuzzyjoin.data.joinindex=} 
Job started: Sun Apr 03 13:56:04 IST 2016 
16/04/03 13:56:04 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 
16/04/03 13:56:04 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 
16/04/03 13:56:05 INFO mapred.FileInputFormat: Total input paths to process : 1 
16/04/03 13:56:05 INFO mapreduce.JobSubmitter: number of splits:1 
16/04/03 13:56:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local954589393_0002 
16/04/03 13:56:05 INFO mapreduce.Job: The url to track the job: http://localhost:8080/ 
16/04/03 13:56:05 INFO mapreduce.Job: Running job: job_local954589393_0002 
16/04/03 13:56:05 INFO mapred.LocalJobRunner: OutputCommitter set in config null 
16/04/03 13:56:05 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter 
16/04/03 13:56:05 INFO mapred.LocalJobRunner: Waiting for map tasks 
16/04/03 13:56:05 INFO mapred.LocalJobRunner: Starting task: attempt_local954589393_0002_m_000000_0 
16/04/03 13:56:05 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ] 
16/04/03 13:56:05 INFO mapred.MapTask: Processing split: hdfs://localhost:54310/user/hduser/dblp-small/tokens.phase1-000/part-00000:0+12847 
16/04/03 13:56:05 INFO mapred.MapTask: numReduceTasks: 1 
16/04/03 13:56:06 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584) 
16/04/03 13:56:06 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100 
16/04/03 13:56:06 INFO mapred.MapTask: soft limit at 83886080 
16/04/03 13:56:06 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600 
16/04/03 13:56:06 INFO mapred.MapTask: kvstart = 26214396; length = 6553600 
16/04/03 13:56:06 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer 
16/04/03 13:56:06 INFO mapred.LocalJobRunner: 
16/04/03 13:56:06 INFO mapred.MapTask: Starting flush of map output 
16/04/03 13:56:06 INFO mapred.MapTask: Spilling map output 
16/04/03 13:56:06 INFO mapred.MapTask: bufstart = 0; bufend = 7866; bufvoid = 104857600 
16/04/03 13:56:06 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26212012(104848048); length = 2385/6553600 
16/04/03 13:56:06 INFO mapred.MapTask: Finished spill 0 
16/04/03 13:56:06 INFO mapred.Task: Task:attempt_local954589393_0002_m_000000_0 is done. And is in the process of committing 
16/04/03 13:56:06 INFO mapred.LocalJobRunner: hdfs://localhost:54310/user/hduser/dblp-small/tokens.phase1-000/part-00000:0+12847 
16/04/03 13:56:06 INFO mapred.Task: Task 'attempt_local954589393_0002_m_000000_0' done. 
16/04/03 13:56:06 INFO mapred.LocalJobRunner: Finishing task: attempt_local954589393_0002_m_000000_0 
16/04/03 13:56:06 INFO mapred.LocalJobRunner: map task executor complete. 
16/04/03 13:56:06 INFO mapred.LocalJobRunner: Waiting for reduce tasks 
16/04/03 13:56:06 INFO mapred.LocalJobRunner: Starting task: attempt_local954589393_0002_r_000000_0 
16/04/03 13:56:06 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ] 
16/04/03 13:56:06 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: [email protected] 
16/04/03 13:56:06 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10 
16/04/03 13:56:06 INFO reduce.EventFetcher: attempt_local954589393_0002_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events 
16/04/03 13:56:06 INFO reduce.LocalFetcher: localfetcher#2 about to shuffle output of map attempt_local954589393_0002_m_000000_0 decomp: 9062 len: 9066 to MEMORY 
16/04/03 13:56:06 INFO reduce.InMemoryMapOutput: Read 9062 bytes from map-output for attempt_local954589393_0002_m_000000_0 
16/04/03 13:56:06 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 9062, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->9062 
16/04/03 13:56:06 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning 
16/04/03 13:56:06 INFO mapred.LocalJobRunner: 1/1 copied. 
16/04/03 13:56:06 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs 
16/04/03 13:56:06 INFO mapred.Merger: Merging 1 sorted segments 
16/04/03 13:56:06 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 9056 bytes 
16/04/03 13:56:06 INFO reduce.MergeManagerImpl: Merged 1 segments, 9062 bytes to disk to satisfy reduce memory limit 
16/04/03 13:56:06 INFO reduce.MergeManagerImpl: Merging 1 files, 9066 bytes from disk 
16/04/03 13:56:06 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce 
16/04/03 13:56:06 INFO mapred.Merger: Merging 1 sorted segments 
16/04/03 13:56:06 INFO mapreduce.Job: Job job_local954589393_0002 running in uber mode : false 
16/04/03 13:56:06 INFO mapreduce.Job: map 100% reduce 0% 
16/04/03 13:56:06 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 9056 bytes 
16/04/03 13:56:06 INFO mapred.LocalJobRunner: 1/1 copied. 
16/04/03 13:56:06 INFO mapred.Task: Task:attempt_local954589393_0002_r_000000_0 is done. And is in the process of committing 
16/04/03 13:56:06 INFO mapred.LocalJobRunner: 1/1 copied. 
16/04/03 13:56:06 INFO mapred.Task: Task attempt_local954589393_0002_r_000000_0 is allowed to commit now 
16/04/03 13:56:06 INFO output.FileOutputCommitter: Saved output of task 'attempt_local954589393_0002_r_000000_0' to hdfs://localhost:54310/user/hduser/dblp-small/tokens-000/_temporary/0/task_local954589393_0002_r_000000 
16/04/03 13:56:06 INFO mapred.LocalJobRunner: reduce > reduce 
16/04/03 13:56:06 INFO mapred.Task: Task 'attempt_local954589393_0002_r_000000_0' done. 
16/04/03 13:56:06 INFO mapred.LocalJobRunner: Finishing task: attempt_local954589393_0002_r_000000_0 
16/04/03 13:56:06 INFO mapred.LocalJobRunner: reduce task executor complete. 
16/04/03 13:56:07 INFO mapreduce.Job: map 100% reduce 100% 
16/04/03 13:56:07 INFO mapreduce.Job: Job job_local954589393_0002 completed successfully 
16/04/03 13:56:07 INFO mapreduce.Job: Counters: 38 
    File System Counters 
     FILE: Number of bytes read=2179300 
     FILE: Number of bytes written=3182466 
     FILE: Number of read operations=0 
     FILE: Number of large read operations=0 
     FILE: Number of write operations=0 
     HDFS: Number of bytes read=99068 
     HDFS: Number of bytes written=31172 
     HDFS: Number of read operations=45 
     HDFS: Number of large read operations=0 
     HDFS: Number of write operations=30 
    Map-Reduce Framework 
     Map input records=597 
     Map output records=597 
     Map output bytes=7866 
     Map output materialized bytes=9066 
     Input split bytes=126 
     Combine input records=0 
     Combine output records=0 
     Reduce input groups=18 
     Reduce shuffle bytes=9066 
     Reduce input records=597 
     Reduce output records=597 
     Spilled Records=1194 
     Shuffled Maps =1 
     Failed Shuffles=0 
     Merged Map outputs=1 
     GC time elapsed (ms)=488 
     CPU time spent (ms)=0 
     Physical memory (bytes) snapshot=0 
     Virtual memory (bytes) snapshot=0 
     Total committed heap usage (bytes)=336207872 
    Shuffle Errors 
     BAD_ID=0 
     CONNECTION=0 
     IO_ERROR=0 
     WRONG_LENGTH=0 
     WRONG_MAP=0 
     WRONG_REDUCE=0 
    File Input Format Counters 
     Bytes Read=12847 
    File Output Format Counters 
     Bytes Written=5478 
Job ended: Sun Apr 03 13:56:07 IST 2016 
The job took 3.563 seconds. 
Multi-Job ended: Sun Apr 03 13:56:07 IST 2016 
The multi-job took 25.128 seconds. 
FuzzyJoinDriver(RIDPairsImproved) 
    Input Path: {hdfs://localhost:54310/user/hduser/dblp-small/records-000} 
    Output Path: hdfs://localhost:54310/user/hduser/dblp-small/ridpairs-000 
    Map Jobs: 2 
    Reduce Jobs: 1 
    Properties: {fuzzyjoin.similarity.name=Jaccard 
       fuzzyjoin.similarity.threshold=.5 
       fuzzyjoin.tokenizer=Word 
       fuzzyjoin.tokens.package=Scalar 
       fuzzyjoin.tokens.lengthstats=false 
       fuzzyjoin.ridpairs.group.class=TokenIdentity 
       fuzzyjoin.ridpairs.group.factor=1 
       fuzzyjoin.data.tokens=dblp-small/tokens-000/part-00000 
       fuzzyjoin.data.joinindex=} 
Job started: Sun Apr 03 13:56:08 IST 2016 
16/04/03 13:56:08 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 
16/04/03 13:56:08 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 
16/04/03 13:56:09 INFO mapred.FileInputFormat: Total input paths to process : 1 
16/04/03 13:56:09 INFO mapreduce.JobSubmitter: number of splits:1 
16/04/03 13:56:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1951342027_0003 
16/04/03 13:56:16 INFO mapred.LocalDistributedCacheManager: Creating symlink: /tmp/mapred/local/1459671970648/part-00000 <- /home/midhu/fuzzyjoin/fuzzyjoin-hadoop/part-00000 
16/04/03 13:56:16 INFO mapred.LocalDistributedCacheManager: Localized hdfs://localhost:54310/user/hduser/dblp-small/tokens-000/part-00000 as file:/tmp/mapred/local/1459671970648/part-00000 
16/04/03 13:56:17 INFO mapreduce.Job: The url to track the job: http://localhost:8080/ 
16/04/03 13:56:17 INFO mapreduce.Job: Running job: job_local1951342027_0003 
16/04/03 13:56:17 INFO mapred.LocalJobRunner: OutputCommitter set in config null 
16/04/03 13:56:17 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter 
16/04/03 13:56:17 INFO mapred.LocalJobRunner: Waiting for map tasks 
16/04/03 13:56:17 INFO mapred.LocalJobRunner: Starting task: attempt_local1951342027_0003_m_000000_0 
16/04/03 13:56:17 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ] 
16/04/03 13:56:17 INFO mapred.MapTask: Processing split: hdfs://localhost:54310/user/hduser/dblp-small/records-000/part-00000:0+36687 
16/04/03 13:56:17 INFO mapred.MapTask: numReduceTasks: 1 
16/04/03 13:56:17 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584) 
16/04/03 13:56:17 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100 
16/04/03 13:56:17 INFO mapred.MapTask: soft limit at 83886080 
16/04/03 13:56:17 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600 
16/04/03 13:56:17 INFO mapred.MapTask: kvstart = 26214396; length = 6553600 
16/04/03 13:56:17 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer 
16/04/03 13:56:17 INFO mapred.LocalJobRunner: map task executor complete. 
16/04/03 13:56:17 WARN mapred.LocalJobRunner: job_local1951342027_0003 
java.lang.Exception: java.lang.RuntimeException: Error in configuring object 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) 
Caused by: java.lang.RuntimeException: Error in configuring object 
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) 
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) 
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) 
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:446) 
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) 
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) 
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
    at java.lang.Thread.run(Thread.java:745) 
Caused by: java.lang.reflect.InvocationTargetException 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) 
    ... 10 more 
Caused by: java.lang.RuntimeException: Error in configuring object 
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) 
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) 
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) 
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38) 
    ... 15 more 
Caused by: java.lang.reflect.InvocationTargetException 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) 
    ... 18 more 
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: file:/tmp/mapred/local/1459671970648/part-00000 (No such file or directory) 
    at edu.uci.ics.fuzzyjoin.tokenorder.TokenLoad.loadTokenRank(TokenLoad.java:60) 
    at edu.uci.ics.fuzzyjoin.tokenorder.TokenLoad.loadTokenRank(TokenLoad.java:40) 
    at edu.uci.ics.fuzzyjoin.hadoop.ridpairs.token.MapSelfJoin.configure(MapSelfJoin.java:98) 
    ... 23 more 
Caused by: java.io.FileNotFoundException: file:/tmp/mapred/local/1459671970648/part-00000 (No such file or directory) 
    at java.io.FileInputStream.open(Native Method) 
    at java.io.FileInputStream.<init>(FileInputStream.java:146) 
    at java.io.FileInputStream.<init>(FileInputStream.java:101) 
    at edu.uci.ics.fuzzyjoin.tokenorder.TokenLoad.loadTokenRank(TokenLoad.java:45) 
    ... 25 more 
16/04/03 13:56:18 INFO mapreduce.Job: Job job_local1951342027_0003 running in uber mode : false 
16/04/03 13:56:18 INFO mapreduce.Job: map 0% reduce 0% 
16/04/03 13:56:18 INFO mapreduce.Job: Job job_local1951342027_0003 failed with state FAILED due to: NA 
16/04/03 13:56:18 INFO mapreduce.Job: Counters: 0 
java.io.IOException: Job failed! 
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836) 
    at edu.uci.ics.fuzzyjoin.hadoop.FuzzyJoinDriver.run(FuzzyJoinDriver.java:179) 
    at edu.uci.ics.fuzzyjoin.hadoop.ridpairs.RIDPairsImproved.main(RIDPairsImproved.java:108) 
    at edu.uci.ics.fuzzyjoin.hadoop.FuzzyJoin.bib(FuzzyJoin.java:39) 
    at edu.uci.ics.fuzzyjoin.hadoop.FuzzyJoin.main(FuzzyJoin.java:86) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71) 
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144) 
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152) 
    at edu.uci.ics.fuzzyjoin.hadoop.FuzzyJoinDriver.main(FuzzyJoinDriver.java:121) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221) 
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136) 

I think this is a Hadoop configuration error on Ubuntu. I used the configuration from this tutorial: http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php


Does anyone have any ideas? –

Answer


In the end I got the code to run successfully and fixed the error. The problem was that the MapReduce job was running with the local job runner; I changed it to run on YARN, and the code now works for all kinds of data.
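For anyone hitting the same thing, the change boils down to pointing MapReduce at YARN instead of the LocalJobRunner. A minimal sketch of what I mean, assuming the stock Hadoop 2.x single-node layout from the tutorial linked above (adjust paths to your installation):

mapred-site.xml (submit MapReduce jobs to YARN rather than the local runner):
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

yarn-site.xml (enable the MapReduce shuffle service on the NodeManager):
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

Then start YARN and re-run the same jar command:
  start-yarn.sh
  hadoop jar target/fuzzyjoin-hadoop-0.0.2-SNAPSHOT.jar fuzzyjoin -conf src/main/resources/fuzzyjoin/dblp.quickstart.xml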