
I wrote my first MapReduce program. When I run it in Eclipse, it writes the output file and works as expected. However, when I run it from the command line with hadoop jar myjar.jar, the results are not written to the output file. The output files (_SUCCESS and part-r-00000) are created, but they are both empty. Is this some kind of persistence problem? Reduce input records = 12, but Reduce output records = 0; when I run the same job from Eclipse, Reduce output records is not 0. Any help is appreciated.

[cloudera@quickstart Desktop]$ sudo hadoop jar checkjar.jar hdfs://quickstart.cloudera:8020/user/cloudera/input.csv hdfs://quickstart.cloudera:8020/user/cloudera/output9 
15/04/28 22:09:06 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032 
15/04/28 22:09:07 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this. 
15/04/28 22:09:08 INFO input.FileInputFormat: Total input paths to process : 1 
15/04/28 22:09:09 INFO mapreduce.JobSubmitter: number of splits:1 
15/04/28 22:09:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1430279123629_0011 
15/04/28 22:09:10 INFO impl.YarnClientImpl: Submitted application application_1430279123629_0011 
15/04/28 22:09:10 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1430279123629_0011/ 
15/04/28 22:09:10 INFO mapreduce.Job: Running job: job_1430279123629_0011 
15/04/28 22:09:22 INFO mapreduce.Job: Job job_1430279123629_0011 running in uber mode : false 
15/04/28 22:09:22 INFO mapreduce.Job: map 0% reduce 0% 
15/04/28 22:09:32 INFO mapreduce.Job: map 100% reduce 0% 
15/04/28 22:09:46 INFO mapreduce.Job: map 100% reduce 100% 
15/04/28 22:09:46 INFO mapreduce.Job: Job job_1430279123629_0011 completed successfully 
15/04/28 22:09:46 INFO mapreduce.Job: Counters: 49 
    File System Counters 
     FILE: Number of bytes read=265 
     FILE: Number of bytes written=211403 
     FILE: Number of read operations=0 
     FILE: Number of large read operations=0 
     FILE: Number of write operations=0 
     HDFS: Number of bytes read=365 
     HDFS: Number of bytes written=0 
     HDFS: Number of read operations=6 
     HDFS: Number of large read operations=0 
     HDFS: Number of write operations=2 
    Job Counters 
     Launched map tasks=1 
     Launched reduce tasks=1 
     Data-local map tasks=1 
     Total time spent by all maps in occupied slots (ms)=8175 
     Total time spent by all reduces in occupied slots (ms)=10124 
     Total time spent by all map tasks (ms)=8175 
     Total time spent by all reduce tasks (ms)=10124 
     Total vcore-seconds taken by all map tasks=8175 
     Total vcore-seconds taken by all reduce tasks=10124 
     Total megabyte-seconds taken by all map tasks=8371200 
     Total megabyte-seconds taken by all reduce tasks=10366976 
    Map-Reduce Framework 
     Map input records=12 
     Map output records=12 
     Map output bytes=235 
     Map output materialized bytes=265 
     Input split bytes=120 
     Combine input records=0 
     Combine output records=0 
     Reduce input groups=2 
     Reduce shuffle bytes=265 
     Reduce input records=12 
     Reduce output records=0 
     Spilled Records=24 
     Shuffled Maps =1 
     Failed Shuffles=0 
     Merged Map outputs=1 
     GC time elapsed (ms)=172 
     CPU time spent (ms)=1150 
     Physical memory (bytes) snapshot=346574848 
     Virtual memory (bytes) snapshot=1705988096 
     Total committed heap usage (bytes)=196481024 
    Shuffle Errors 
     BAD_ID=0 
     CONNECTION=0 
     IO_ERROR=0 
     WRONG_LENGTH=0 
     WRONG_MAP=0 
     WRONG_REDUCE=0 
    File Input Format Counters 
     Bytes Read=245 
    File Output Format Counters 
     Bytes Written=0 
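
As an aside, the WARN line in the log ("Hadoop command-line option parsing not performed") means the job was submitted without ToolRunner. It is unrelated to the empty output files, but it can be fixed by having the driver implement the Tool interface. The driver class is not shown in the post, so the name JoinDriver and the job wiring below are assumptions; a minimal sketch:

package com.mapreduce.assgn4;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class JoinDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() carries any -D options that ToolRunner already parsed
        Job job = Job.getInstance(getConf(), "reduce-side join");
        job.setJarByClass(JoinDriver.class);
        job.setMapperClass(JoinMapper.class);
        job.setReducerClass(JoinReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips generic options (-D, -files, ...) before run() is called
        System.exit(ToolRunner.run(new Configuration(), new JoinDriver(), args));
    }
}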

Reducer.java

package com.mapreduce.assgn4;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class JoinReducer extends Reducer<Text, Text, Text, Text> {

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Split the incoming values by source table; JoinMapper tags each
        // value as "tableName#tuple".
        List<String> tableoneTuples = new ArrayList<String>();
        List<String> tabletwoTuples = new ArrayList<String>();

        for (Text value : values) {
            String[] splitValues = value.toString().split("#");
            String tableName = splitValues[0];
            if (tableName.equals(JoinMapper.tableone)) {
                tableoneTuples.add(splitValues[1]);
            } else {
                tabletwoTuples.add(splitValues[1]);
            }
        }
        System.out.println(tableoneTuples.size());
        System.out.println(tabletwoTuples.size());

        // Emit the cross product of the two sides, one joined row per pair.
        String FinaljoinString = null;
        for (String tableoneValue : tableoneTuples) {
            for (String tabletwoValue : tabletwoTuples) {
                FinaljoinString = tableoneValue + "," + tabletwoValue;
                FinaljoinString = key.toString() + "," + FinaljoinString;
                context.write(null, new Text(FinaljoinString));
            }
        }
    }
}

Answer


The bug is in your reducer's context.write. You need to use NullWritable to write a null key to the output:

context.write(NullWritable.get(), new Text(FinaljoinString)); 
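
Note that NullWritable is obtained through its singleton accessor NullWritable.get(); the reducer's declared output key type must also change from Text to NullWritable, and the driver must call job.setOutputKeyClass(NullWritable.class) to match. A sketch of the corrected reducer, assuming the rest of the job is unchanged:

package com.mapreduce.assgn4;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Output key type is NullWritable instead of Text.
public class JoinReducer extends Reducer<Text, Text, NullWritable, Text> {

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        List<String> tableoneTuples = new ArrayList<String>();
        List<String> tabletwoTuples = new ArrayList<String>();

        // Bucket each tagged value ("tableName#tuple") by its source table.
        for (Text value : values) {
            String[] splitValues = value.toString().split("#");
            if (splitValues[0].equals(JoinMapper.tableone)) {
                tableoneTuples.add(splitValues[1]);
            } else {
                tabletwoTuples.add(splitValues[1]);
            }
        }

        // Emit the cross product of the two sides as the joined rows.
        for (String tableoneValue : tableoneTuples) {
            for (String tabletwoValue : tabletwoTuples) {
                String FinaljoinString = key.toString() + "," + tableoneValue + "," + tabletwoValue;
                // NullWritable.get() returns the singleton; a bare null key
                // is what made the output files come out empty.
                context.write(NullWritable.get(), new Text(FinaljoinString));
            }
        }
    }
}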

Thanks, I will try that. But why doesn't the same problem happen when I run it from Eclipse? –