How can I get the full contents of a file using Apache POI?

I am trying to read a .docx file with the Apache POI Java API. I use:

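import java.io.FileInputStream;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public static String view(String nameDoc) {
    String text = null;
    // try-with-resources closes the extractor and the document even if reading fails
    try (XWPFDocument docx = new XWPFDocument(new FileInputStream(nameDoc));
         XWPFWordExtractor we = new XWPFWordExtractor(docx)) {
     // getText() returns the document content as plain text only
     text = we.getText();
    } catch (Exception e) {
     e.printStackTrace();
    }
    return text;
}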

In this case I only get the text of the file, but my file contains text, tables, pictures... How can I get the full contents of the file?
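One possible way to go beyond plain text is to walk the document body yourself instead of using the extractor. The sketch below is only an illustration under stated assumptions: the class name DocxContentWalker, the DocxContent holder, and the tab-separated rendering of table rows are hypothetical choices, not part of Apache POI. It relies on XWPFDocument.getBodyElements() for paragraphs and tables in document order, and on XWPFDocument.getAllPictures() for the raw image bytes.

```java
import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: collects paragraphs, tables and pictures from a .docx.
final class DocxContentWalker {

    // Hypothetical holder: text blocks in document order plus raw picture bytes.
    static final class DocxContent {
        final List<String> blocks = new ArrayList<>();
        final Map<String, byte[]> pictures = new LinkedHashMap<>();
    }

    static DocxContent read(InputStream in) throws IOException {
        DocxContent content = new DocxContent();
        try (XWPFDocument docx = new XWPFDocument(in)) {
            // Body elements come back in the order they appear in the document.
            for (IBodyElement element : docx.getBodyElements()) {
                if (element instanceof XWPFParagraph) {
                    content.blocks.add(((XWPFParagraph) element).getText());
                } else if (element instanceof XWPFTable) {
                    content.blocks.add(tableToText((XWPFTable) element));
                }
            }
            // Pictures are binary parts, so they are kept as bytes, not text.
            for (XWPFPictureData picture : docx.getAllPictures()) {
                content.pictures.put(picture.getFileName(), picture.getData());
            }
        }
        return content;
    }

    // Assumed rendering: one line per row, cells separated by tabs.
    private static String tableToText(XWPFTable table) {
        StringBuilder sb = new StringBuilder();
        for (XWPFTableRow row : table.getRows()) {
            List<String> cells = new ArrayList<>();
            for (XWPFTableCell cell : row.getTableCells()) {
                cells.add(cell.getText());
            }
            sb.append(String.join("\t", cells)).append('\n');
        }
        return sb.toString();
    }
}
```

Calling DocxContentWalker.read(new FileInputStream(nameDoc)) would then give the text and table blocks in order, and each picture's bytes could be written to a file under its getFileName() value. Other parts such as headers, footers, and footnotes are not covered by this sketch.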

Source

2016-09-16 Oleg1n

See my answer; it should work and help you.

What do you mean by "the full contents of the file"? For example, I don't see how you would get a picture into a text string... – Gagravarr

This answer should help: http://stackoverflow.com/a/28304463/1997376

String contents = "";

    try {
     System.out.println("Starting the test");
     POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream("D:/Resume.doc"));
     HWPFDocument doc = new HWPFDocument(fs);
     WordExtractor we = new WordExtractor(doc);
     // WordExtractor already returns the text of the .doc file, so the
     // round trip through a PDF (PdfWriter, PDFTextStripper) is dropped:
     // PdfWriter.getInstance expects an iText Document, not an HWPFDocument.
     contents = we.getText();
     we.close();

    } catch (Exception e) {
     logger.error(e.getMessage());
    }

You will get the complete file contents in contents.

Source

2016-09-16 09:33:31

It is a .docx, not a PDF.

It doesn't provide the full content (images, tables, etc.), only the text content.

@NicolasFilotto, to extract images please refer to http://stackoverflow.com/questions/7063324/extract-image-from-pdf-using-java
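Since the file in the question is a .docx rather than a PDF, the images can also be pulled out without any PDF step. A .docx is a ZIP package, and Word conventionally stores embedded images under word/media/. The sketch below uses only java.util.zip, not Apache POI, and assumes that folder convention holds for the file at hand; the class name DocxMediaReader and the method readMedia are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: reads embedded media out of a .docx using only the JDK.
final class DocxMediaReader {

    // Returns the bytes of every entry under word/media/, keyed by entry name.
    // Assumption: the document keeps its images in that conventional folder.
    static Map<String, byte[]> readMedia(InputStream in) throws IOException {
        Map<String, byte[]> media = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory() || !entry.getName().startsWith("word/media/")) {
                    continue;
                }
                // Copy the current entry into memory chunk by chunk.
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                media.put(entry.getName(), buffer.toByteArray());
            }
        }
        return media;
    }
}
```

Each map entry can then be written to disk under its own name, for example word/media/image1.png. This does not say where in the text an image appears; for that, the document model has to be inspected.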
