2017-02-15 68 views
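import java.io.{File, FileInputStream, InputStream}

import org.apache.spark.SparkContext
import org.apache.spark.input.PortableDataStream
import org.apache.tika.metadata.Metadata
import org.apache.tika.parser.{AutoDetectParser, ParseContext}
import org.apache.tika.sax.WriteOutContentHandler

object newpdf {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "appName")
    val path = "hdfs://namenode2.aibl.net:8020/ABDF/akhilaajith/PF_knnmodel_1231480046927236/visualise1/model_points"
    // Each element is a pair: (full URI of the file, stream over its content)
    val data = sc.binaryFiles(path)
    val rdd = data.map(x => tikaFunc(x))
    rdd.foreach(println)
  }

  def tikaFunc(a: (String, PortableDataStream)) = {
    // a._1 is the file URI; drop(5) removes the leading "hdfs:" prefix
    val file: File = new File(a._1.drop(5))
    val myparser: AutoDetectParser = new AutoDetectParser()
    // Opens the URI string with java.io.FileInputStream
    val stream: InputStream = new FileInputStream(a._1)
    val handler: WriteOutContentHandler = new WriteOutContentHandler(-1) // -1 disables the write limit
    val metadata: Metadata = new Metadata()
    val context: ParseContext = new ParseContext()
    myparser.parse(stream, handler, metadata, context)
    stream.close()
    val delimiter = " "
    // Output line: file name followed by the extracted text
    Array(file.getName, handler.toString.trim).mkString(delimiter)
  }
}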

How do I provide an HDFS path to Apache Tika using Scala? The code above fails with an error on the InputStream:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 1, localhost): java.io.FileNotFoundException: hdfs:/namenode2.aibl.net:8020/ABDF/akhilaajith/PF_knnmodel_1231480046927236/visualise1/model_points/part-00000 (No such file or directory) 

How can I solve this problem?

It says it cannot find the file 'hdfs:/namenode2.aibl.net:8020/ABDF/akhilaajith/PF_knnmodel_1231480046927236/visualise1/model_points/part-00000'. Maybe the path is incorrect? – dk14

The path is correct. I checked the file in HDFS and it has content. – AkhilaV
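The single slash after `hdfs:` in the exception message is a hint about where the failure comes from. `FileInputStream(String)` wraps its argument in a `java.io.File`, which treats the string as a local path and, on Unix-like systems, collapses repeated slashes. The sketch below only illustrates that behaviour; the URI is a shortened, hypothetical stand-in for the one in the question, and the object name is made up for the example.

```scala
import java.io.File

object PathNormalizationDemo {
  // Hypothetical, shortened URI used only for illustration.
  val hdfsUri = "hdfs://namenode2.aibl.net:8020/ABDF/model_points/part-00000"

  // java.io.File interprets the string as a local path, not as a URI.
  val asLocalFile = new File(hdfsUri)

  def main(args: Array[String]): Unit = {
    // On Linux/macOS the "//" after "hdfs:" is collapsed to a single "/".
    println(asLocalFile.getPath)
    // The local filesystem is consulted here; HDFS is never contacted.
    println(asLocalFile.exists)
  }
}
```

So the file can exist in HDFS, as stated above, while the local filesystem lookup still reports "No such file or directory".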

Answers

It looks like there is a problem with your input file path. When you specify a file from HDFS, you need to give the file path in the format hdfs://<your_complete_file_path>

In your code, replace

val path = "hdfs://namenode2.aibl.net:8020/ABDF/akhilaajith/PF_knnmodel_1231480046927236/visualise1/model_points"

with

val path = "hdfs:///namenode2.aibl.net:8020/ABDF/akhilaajith/PF_knnmodel_1231480046927236/visualise1/model_points"

Replacing // with /// also produces an error: Exception in thread "main" java.io.IOException: Incomplete HDFS URI, no host: hdfs:///namenode2.aibl.net:8020/ABDF/akhilaajith/PF_knnmodel_1231480046927236/visualise1/model_points – AkhilaV

Can you check what 'hadoop fs -ls /ABDF/akhilaajith/PF_knnmodel_1231480046927236/visualise1/model_points' returns? –

'hadoop fs -ls /ABDF/akhilaajith/PF_knnmodel_1231480046927236/visualise1/model_points' is not the correct path. "hdfs://namenode2.aibl.net:8020/ABDF/akhilaajith/PF_knnmodel_1231480046927236/visualise1/model_points" is the correct path, and with it I get the error I mentioned above. – AkhilaV
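Since the original `hdfs://` path is accepted by `sc.binaryFiles` and the failure is a `java.io.FileNotFoundException`, one likely cause is that `java.io.FileInputStream` is being asked to open an HDFS URI, which it can only look up on the local filesystem. A minimal sketch of an alternative, assuming that is the cause: read the bytes through the `PortableDataStream` that Spark already provides (`open()` returns a `DataInputStream`) and pass that stream to Tika. The object name `NewPdfFromHdfs` and the helper `fileNameOf` are hypothetical names introduced for this example, and the code has not been run against the cluster in the question.

```scala
import java.io.InputStream

import org.apache.spark.SparkContext
import org.apache.spark.input.PortableDataStream
import org.apache.tika.metadata.Metadata
import org.apache.tika.parser.{AutoDetectParser, ParseContext}
import org.apache.tika.sax.WriteOutContentHandler

object NewPdfFromHdfs {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "appName")
    val path = "hdfs://namenode2.aibl.net:8020/ABDF/akhilaajith/PF_knnmodel_1231480046927236/visualise1/model_points"
    try {
      sc.binaryFiles(path).map(tikaFunc).foreach(println)
    } finally {
      sc.stop()
    }
  }

  // Hypothetical helper: last path segment of a URI string, e.g. "part-00000".
  def fileNameOf(uri: String): String =
    uri.substring(uri.lastIndexOf('/') + 1)

  def tikaFunc(entry: (String, PortableDataStream)): String = {
    val (uri, portableStream) = entry
    // Let Spark/Hadoop open the file; no java.io.File or FileInputStream involved.
    val stream: InputStream = portableStream.open()
    try {
      val parser = new AutoDetectParser()
      val handler = new WriteOutContentHandler(-1) // -1 disables the write limit
      parser.parse(stream, handler, new Metadata(), new ParseContext())
      Array(fileNameOf(uri), handler.toString.trim).mkString(" ")
    } finally {
      stream.close()
    }
  }
}
```

The `try`/`finally` blocks are a design choice rather than a requirement: they make sure the stream and the `SparkContext` are released even when Tika throws on a file it cannot parse.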
