2014-11-03 45 views

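When should job.getJobID() be called?

I have a Hadoop program that runs successfully, and I need to extract the job ID from it. I use the following code to do that: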

    Configuration conf = new Configuration();
    conf.addResource(new Path("../conf/core-site.xml"));
    conf.addResource(new Path("../conf/mapred-site.xml"));
    conf.addResource(new Path("../conf/hadoop/hdfs-site.xml"));

    Job job = new Job(conf, "CloudViTra2.0_Transcoder - Job1");

    job.setJarByClass(VideoTranscoder.class);
    job.setMapperClass(First_Mapper.class);
    job.setReducerClass(First_Reducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    FileInputFormat.addInputPath(job, new Path("../thesis_uploads/input/" + getFileName[0] + ".txt"));

    Path output = new Path("../thesis_uploads/output_" + fileName + "/");
    FileOutputFormat.setOutputPath(job, output);

    job.waitForCompletion(true);
    currentJob = job.getJobID().toString();

The problem is that this program waits until the job has finished, but I need the job ID while the job is still running. How can I do that?

Answer


You may need to use the job client API for this; see its API reference.

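Use JobClient's JobStatus[] getAllJobs() and JobStatus[] jobsToComplete() methods to get the job IDs of the jobs that are currently running. getAllJobs() returns every job the JobTracker knows about, while jobsToComplete() returns only the jobs that have neither completed nor failed.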

Here is some pseudocode:

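    // Imports this snippet needs:
    import java.net.InetSocketAddress;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobID;
    import org.apache.hadoop.mapred.JobStatus;

    // Load the cluster configuration; hadoopConfPath is the directory holding the *-site.xml files.
    Configuration conf = new Configuration();
    conf.addResource(new Path(hadoopConfPath + "core-site.xml"));
    conf.addResource(new Path(hadoopConfPath + "hdfs-site.xml"));
    conf.addResource(new Path(hadoopConfPath + "mapred-site.xml"));

    // Connect to the JobTracker at the given host and port.
    InetSocketAddress jobtracker = new InetSocketAddress(jobTrackerHost, jobTrackerPort);
    JobClient jobClient = new JobClient(jobtracker, conf);
    jobClient.setConf(conf);

    // Walk over every job the JobTracker reports and read its ID.
    JobStatus[] jobs = jobClient.getAllJobs();
    for (JobStatus js : jobs) {
        JobID jobId = js.getJobID();
        System.out.println(jobId);
    }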
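
If the ID is only needed inside the program that launches the job, another option is to call job.submit() instead of job.waitForCompletion(true). submit() returns as soon as the job has been handed to the cluster, so getJobID() can be read while the job is still running. The following is a minimal sketch rather than a drop-in replacement for the code in the question: the class name SubmitAndGetJobId, the job name, and the use of args[0] and args[1] as input and output paths are placeholders, and it relies on Hadoop's default identity mapper and reducer instead of the question's own classes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver class, used only to illustrate submit() followed by getJobID().
public class SubmitAndGetJobId {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Same constructor as in the question; newer Hadoop releases prefer Job.getInstance(conf, name).
        Job job = new Job(conf, "job-id-demo");
        job.setJarByClass(SubmitAndGetJobId.class);

        // No mapper or reducer is set, so the identity Mapper and Reducer are used.
        // With the default TextInputFormat that means LongWritable keys and Text values.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // Placeholder paths taken from the command line (assumption, not from the question).
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // submit() does not block: it returns once the job has been submitted.
        job.submit();

        // The job ID is assigned during submission, so it is available here.
        String currentJob = job.getJobID().toString();
        System.out.println("Submitted job " + currentJob);

        // Optionally block afterwards; waitForCompletion() does not resubmit an already submitted job.
        boolean succeeded = job.waitForCompletion(true);
        System.exit(succeeded ? 0 : 1);
    }
}
```

Calling getJobID() before submit() would not help, because the ID is only assigned when the job is submitted.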

Hope this helps.


So jobTrackerHost would be the IP of my virtual machine, and the port is 50030, right? – 2014-11-04 19:03:59