2013-02-22 181 views
8

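PDFBox: convert PDF to image byte[]

Using PDFBox, is it possible to convert a PDF (or a PDF byte[]) into an image byte[]? I have looked through several online examples, and the only ones I can find describe either writing the converted file directly to the file system or converting it to a Java AWT object.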

I would rather not incur the IO of writing an image file to the file system, reading it back into a byte[], and then deleting it.

So I can do this:

String destinationImageFormat = "jpg"; 
boolean success = false; 
InputStream is = getClass().getClassLoader().getResourceAsStream("example.pdf"); 
PDDocument pdf = PDDocument.load(is, true); 

int resolution = 256; 
String password = ""; 
String outputPrefix = "myImageFile"; 

PDFImageWriter imageWriter = new PDFImageWriter();  

success = imageWriter.writeImage(pdf, 
        destinationImageFormat, 
        password, 
        1, 
        2, 
        outputPrefix, 
        BufferedImage.TYPE_INT_RGB, 
        resolution); 

As well as this:

InputStream is = getClass().getClassLoader().getResourceAsStream("example.pdf"); 

PDDocument pdf = PDDocument.load(is, true); 
List<PDPage> pages = pdf.getDocumentCatalog().getAllPages(); 

for (PDPage page : pages) 
{ 
    BufferedImage image = page.convertToImage(); 
} 

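What I am not clear on is how to transform the BufferedImage into a byte[]. I know it is converted to a file output stream inside imageWriter.writeImage(), but I am not clear on how the API works.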

Answers

11

You can use ImageIO.write to write to an OutputStream. To get a byte[], use a ByteArrayOutputStream and then call toByteArray() on it.
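A minimal sketch of that approach, using only JDK classes. The class name ImageBytes and the method name toBytes are made up for illustration and are not part of PDFBox or the JDK.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper class; the name ImageBytes is an assumption for this sketch.
class ImageBytes {

    /** Encodes an image in memory (no temporary file) and returns the encoded bytes. */
    static byte[] toBytes(BufferedImage image, String format) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // ImageIO.write returns false when no registered writer can handle the request.
        if (!ImageIO.write(image, format, out)) {
            throw new IOException("No ImageIO writer found for format: " + format);
        }
        return out.toByteArray();
    }
}
```

In the loop from the question, this would presumably be called as ImageBytes.toBytes(page.convertToImage(), "png"), so nothing is written to the file system.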

+1

Thanks. This works as expected. I would upvote you if I had enough reputation, but this is my first post on StackOverflow. – user2100746 2013-02-22 22:08:26

+0

You're welcome. You should be able to mark it as accepted. – aditsu 2013-02-22 22:09:19

+0

@user2100746 You should mark the answer as accepted :) – Genjuro 2013-05-21 08:45:42

0
try {   
       PDDocument document = PDDocument.load(PdfInfo.getPDFWAY()); 
       if (document.isEncrypted()) { 
        document.decrypt(PdfInfo.getPASSWORD()); 
       } 
       if ("bilevel".equalsIgnoreCase(PdfInfo.getCOLOR())) { 
        PdfInfo.setIMAGETYPE(BufferedImage.TYPE_BYTE_BINARY); 
       } else if ("indexed".equalsIgnoreCase(PdfInfo.getCOLOR())) { 
        PdfInfo.setIMAGETYPE(BufferedImage.TYPE_BYTE_INDEXED); 
       } else if ("gray".equalsIgnoreCase(PdfInfo.getCOLOR())) { 
        PdfInfo.setIMAGETYPE(BufferedImage.TYPE_BYTE_GRAY); 
       } else if ("rgb".equalsIgnoreCase(PdfInfo.getCOLOR())) { 
        PdfInfo.setIMAGETYPE(BufferedImage.TYPE_INT_RGB); 
       } else if ("rgba".equalsIgnoreCase(PdfInfo.getCOLOR())) { 
        PdfInfo.setIMAGETYPE(BufferedImage.TYPE_INT_ARGB); 
       } else { 
        System.exit(2); 
       } 
       PDFImageWriter imageWriter = new PDFImageWriter(); 
       boolean success = imageWriter.writeImage(document, PdfInfo.getIMAGE_FORMAT(),PdfInfo.getPASSWORD(), 
         PdfInfo.getSTART_PAGE(),PdfInfo.getEND_PAGE(),PdfInfo.getOUTPUT_PREFIX(),PdfInfo.getIMAGETYPE(),PdfInfo.getRESOLUTION()); 
       if (!success) { 
        System.exit(1); 
       } 
       document.close(); 

     } catch (IOException | CryptographyException | InvalidPasswordException ex) { 
      Logger.getLogger(PdfToImae.class.getName()).log(Level.SEVERE, null, ex); 
     } 
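
// Static configuration holder used by the snippet above. The static getters
// and setters it calls (getPDFWAY(), setIMAGETYPE(int), ...) are omitted here.
public class PdfInfo {
    private static String PDFWAY;
    private static String OUTPUT_PREFIX;
    private static String PASSWORD;
    private static int START_PAGE = 1;
    private static int END_PAGE = Integer.MAX_VALUE;
    private static String IMAGE_FORMAT = "jpg";
    private static String COLOR = "rgb";
    private static int RESOLUTION = 256;
    private static int IMAGETYPE = 24; // initial value; replaced according to COLOR before use
    private static String filename;
    private static String filePath = "";
}
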
0

Add the Maven dependency:

<!-- https://mvnrepository.com/artifact/org.apache.pdfbox/pdfbox --> 
    <dependency> 
     <groupId>org.apache.pdfbox</groupId> 
     <artifactId>pdfbox</artifactId> 
     <version>2.0.1</version> 
    </dependency> 

And convert a PDF to images:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;
import javax.imageio.ImageIO;
import com.google.common.collect.Lists; // Guava, used for Lists.newArrayList()
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

// Renders each page to "<pdf path>.<page number>.png" and returns the file names.
private List<String> savePDF(String filePath) throws IOException {
    List<String> result = Lists.newArrayList();

    File file = new File(filePath);

    PDDocument doc = PDDocument.load(file);
    PDFRenderer renderer = new PDFRenderer(doc);

    int pageSize = doc.getNumberOfPages();
    for (int i = 0; i < pageSize; i++) {
        String pngFileName = file.getPath() + "." + (i + 1) + ".png";

        FileOutputStream out = new FileOutputStream(pngFileName);
        ImageIO.write(renderer.renderImageWithDPI(i, 96), "png", out);
        out.close();

        result.add(pngFileName);
    }
    doc.close();
    return result;
}

Edit:

import org.apache.pdfbox.pdmodel.PDDocument; 
import org.apache.pdfbox.rendering.PDFRenderer; 
import javax.imageio.ImageIO; 

private List<String> savePDF(String filePath) throws IOException { 
    List<String> result = Lists.newArrayList(); 

    File file = new File(filePath); 

    PDDocument doc = PDDocument.load(file); 
    PDFRenderer renderer = new PDFRenderer(doc); 

    int pageSize = doc.getNumberOfPages(); 
    for (int i = 0; i < pageSize; i++) { 
     String pngFileName = file.getPath() + "." + (i + 1) + ".png"; 

     ByteArrayOutputStream out = new ByteArrayOutputStream(pngFileName); 
     ImageIO.write(renderer.renderImageWithDPI(i, 96), "png", out); 

     out.toByteArray(); // here you can get a byte array 

     out.close(); 

     result.add(pngFileName); 
    } 
    doc.close(); 
    return result; 
} 
+0

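The OP asked for pdfbox to render the pdf directly to a 'byte[]' rather than to a file. Your answer, on the other hand, only shows another way of rendering it to a file. – mkl 2016-12-27 07:11:59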

+0

Replace the FileOutputStream with a ByteArrayOutputStream – BeeNoisy 2016-12-27 09:12:27

+0

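'ByteArrayOutputStream out = new ByteArrayOutputStream(pngFileName)': 'ByteArrayOutputStream' has only two constructors, one without arguments and one with an int argument. So a call with a 'String' argument will not even compile, unless you mean a 'ByteArrayOutputStream' different from the one in 'java.io'. – mkl 2016-12-29 20:53:28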
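
Following up on the constructor point, here is a sketch of the per-page loop that uses the no-argument ByteArrayOutputStream constructor and collects one byte[] per page. To keep it free of the PDFBox dependency, page rendering is hidden behind a made-up PageRenderer interface; PagesToBytes and renderAll are likewise names invented for this illustration.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

// Hypothetical names throughout: PagesToBytes and PageRenderer are illustration only.
class PagesToBytes {

    /** Stand-in for whatever produces the image of page i (zero-based index). */
    interface PageRenderer {
        BufferedImage render(int pageIndex) throws IOException;
    }

    /** Renders every page and returns one encoded image per page, entirely in memory. */
    static List<byte[]> renderAll(int pageCount, PageRenderer renderer, String format)
            throws IOException {
        List<byte[]> result = new ArrayList<>();
        for (int i = 0; i < pageCount; i++) {
            // No-argument constructor: the buffer grows as needed and takes no file name.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            if (!ImageIO.write(renderer.render(i), format, out)) {
                throw new IOException("No ImageIO writer found for format: " + format);
            }
            result.add(out.toByteArray());
        }
        return result;
    }
}
```

With PDFBox 2.x, as in the answer above, the call would presumably look like PagesToBytes.renderAll(doc.getNumberOfPages(), i -> renderer.renderImageWithDPI(i, 96), "png"), with doc closed afterwards.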