2011-11-23 58 views

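Rotated image extracted from a PDF with PDFsharp

I can successfully extract an image from a PDF using PDFsharp. The image is encoded with CCITTFaxDecode. However, in the TIFF file I create from it, the image comes out rotated. Any idea what could be going wrong?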

This is the code I'm using:

byte[] data = xObject.Stream.Value; 
Tiff tiff = BitMiracle.LibTiff.Classic.Tiff.Open("D:\\clip_TIFF.tif", "w"); 
tiff.SetField(TiffTag.IMAGEWIDTH, (uint)(width)); 
tiff.SetField(TiffTag.IMAGELENGTH, (uint)(height)); 
tiff.SetField(TiffTag.COMPRESSION, (uint)BitMiracle.LibTiff.Classic.Compression.CCITTFAX4); 
tiff.SetField(TiffTag.BITSPERSAMPLE, (uint)(bpp)); 
tiff.WriteRawStrip(0,data,data.Length); 
tiff.Close(); 

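Without the PDF, the TIFF, or the extraction code, how are we supposed to know what went wrong? Maybe the image is drawn in the PDF with a rotation transform? Or the PDF page is rotated? Maybe nothing is wrong and everything works as designed.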

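Oh, you mean that if the image was drawn on the PDF page with a rotation transform, the extracted image will also come out rotated? Does the rotation have to be worked out from the coordinate systems of the PDF and the TIFF image? – Mugdha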
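To make the rotation-transform idea in the comments concrete: a PDF transformation matrix is written as six numbers `[a b c d e f]`, and when it contains no skew, the rotation can be read from `a` and `b`. The helper below is only a sketch of that arithmetic; `CtmRotation` and `RotationDegrees` are hypothetical names, and getting the matrix out of the PDF (for example from the `cm` operator that precedes the image, or from iTextSharp's `ImageRenderInfo.GetImageCTM()` if your version has it) is assumed to happen elsewhere.

```csharp
using System;

public static class CtmRotation
{
    // For a matrix [a b c d e f] made only of scaling and rotation,
    // a is proportional to cos(theta) and b to sin(theta), so the
    // angle is atan2(b, a). The result is normalized to [0, 360).
    // Assumption: the matrix contains no skew or mirroring.
    public static double RotationDegrees(double a, double b)
    {
        double degrees = Math.Atan2(b, a) * 180.0 / Math.PI;
        return (degrees % 360.0 + 360.0) % 360.0;
    }
}
```

An image placed with `[w 0 0 h x y]` gives an angle of 0, while one placed with `[0 w -h 0 x y]` gives roughly 90, which would mean the stored pixel data is sideways relative to what the page displays.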

Answers

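Since the question is still tagged with iTextSharp, I might as well add some code, even though it doesn't look like you're using that library here. PDF parsing support was added starting with iText[Sharp] 5.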

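I didn't have a test PDF with the image type you're using, but found one here (see the attachment). Here is a very simple working example, written as an ASP.NET HTTP handler (.ashx) and using that test PDF, to get you going: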

<%@ WebHandler Language="C#" Class="CCITTFaxDecodeExtract" %> 
using System; 
using System.Collections.Generic; 
using System.IO; 
using System.Web; 
using iTextSharp.text; 
using iTextSharp.text.pdf; 
using iTextSharp.text.pdf.parser; 
using Dotnet = System.Drawing.Image; 
using System.Drawing.Imaging; 

public class CCITTFaxDecodeExtract : IHttpHandler { 
    public void ProcessRequest (HttpContext context) { 
        HttpServerUtility Server = context.Server; 
        string file = Server.MapPath("~/app_data/CCITTFaxDecode.pdf"); 
        MyImageRenderListener listener = new MyImageRenderListener(); 
        PdfReader reader = new PdfReader(file); 
        try { 
            // Parse every page; the listener collects the CCITT images it sees. 
            PdfReaderContentParser parser = new PdfReaderContentParser(reader); 
            for (int i = 1; i <= reader.NumberOfPages; i++) { 
                parser.ProcessContent(i, listener); 
            } 
        } 
        finally { 
            // Release the file handle even if parsing fails. 
            reader.Close(); 
        } 
        // Write each extracted image into the same folder as the source PDF. 
        for (int i = 0; i < listener.Images.Count; ++i) { 
            string path = Server.MapPath("~/app_data/" + listener.ImageNames[i]); 
            File.WriteAllBytes(path, listener.Images[i]); 
        } 
    } 
    public bool IsReusable { get { return false; } } 
/* 
* see: TextRenderInfo & RenderListener classes here: 
* http://api.itextpdf.com/itext/ 
* 
* and Google "itextsharp extract images" 
*/ 
    public class MyImageRenderListener : IRenderListener { 
        // Text callbacks are required by the interface but not needed here. 
        public void RenderText(TextRenderInfo renderInfo) { } 
        public void BeginTextBlock() { } 
        public void EndTextBlock() { } 

        public List<byte[]> Images = new List<byte[]>(); 
        public List<string> ImageNames = new List<string>(); 
        public void RenderImage(ImageRenderInfo renderInfo) { 
            PdfImageObject image = renderInfo.GetImage(); 
            PdfName filter = image.Get(PdfName.FILTER) as PdfName; 
            if (filter == null) { 
                // /Filter may also be an array of names; use its last entry. 
                PdfArray pa = image.Get(PdfName.FILTER) as PdfArray; 
                if (pa != null && pa.Size > 0) { 
                    filter = pa[pa.Size - 1] as PdfName; 
                } 
            } 
            if (filter != null && PdfName.CCITTFAXDECODE.Equals(filter)) { 
                using (Dotnet dotnetImg = image.GetDrawingImage()) { 
                    if (dotnetImg != null) { 
                        // Name the file after the image's PDF object number. 
                        ImageNames.Add(string.Format( 
                            "{0}.tiff", renderInfo.GetRef().Number)); 
                        using (MemoryStream ms = new MemoryStream()) { 
                            dotnetImg.Save(ms, ImageFormat.Tiff); 
                            Images.Add(ms.ToArray()); 
                        } 
                    } 
                } 
            } 
        } 
    } 
} 

If the image(s) are still coming out rotated, see this thread on the iText mailing list; perhaps some of the pages in the PDF document have been rotated.
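If a rotated page does turn out to be the cause, one way to compensate is to apply the page's rotation to the extracted bitmap before saving it. The sketch below only maps a rotation in degrees onto `System.Drawing.RotateFlipType`; `RotationFix` and `ToRotateFlip` are hypothetical names. It assumes the angle is a multiple of 90, which holds for a page's `/Rotate` entry, and that the image's own transformation matrix adds no further rotation.

```csharp
using System;
using System.Drawing;

public static class RotationFix
{
    // A page's /Rotate value is a clockwise angle in multiples of 90,
    // which matches the clockwise RotateFlipType members.
    public static RotateFlipType ToRotateFlip(int degrees)
    {
        int normalized = ((degrees % 360) + 360) % 360;
        switch (normalized)
        {
            case 0: return RotateFlipType.RotateNoneFlipNone;
            case 90: return RotateFlipType.Rotate90FlipNone;
            case 180: return RotateFlipType.Rotate180FlipNone;
            case 270: return RotateFlipType.Rotate270FlipNone;
            default:
                throw new ArgumentException(
                    "Rotation must be a multiple of 90 degrees.", "degrees");
        }
    }
}
```

With iTextSharp the angle could come from `reader.GetPageRotation(pageNumber)`, and the result would be passed to `dotnetImg.RotateFlip(...)` just before `dotnetImg.Save(...)`. The listener above does not track page numbers, so that wiring is left out here.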

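By the way, this is the complete code that extracts the image from the PDF, but the result comes out rotated. Sorry for the length of the code.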

PdfDocument document = PdfReader.Open("D:\\Sample.pdf"); 
// Only the first page is inspected here; loop over document.Pages to cover the rest. 
PdfDictionary resources = document.Pages[0].Elements.GetDictionary("/Resources"); 
PdfDictionary xObjects = resources.Elements.GetDictionary("/XObject"); 
if (xObjects != null) 
{ 
    ICollection<PdfItem> items = xObjects.Elements.Values; 
    // Iterate references to external objects 
    foreach (PdfItem item in items) 
    { 
     PdfReference reference = item as PdfReference; 
     if (reference != null) 
     { 
      PdfDictionary xObject = reference.Value as PdfDictionary; 
      // Is external object an image? 

      if (xObject != null && xObject.Elements.GetString("/Subtype") == "/Image") 
      { 
       string filter = xObject.Elements.GetName("/Filter"); 

       if (filter.Equals("/CCITTFaxDecode")) 
       { 
        int width = xObject.Elements.GetInteger(PdfImage.Keys.Width); 
        int height = xObject.Elements.GetInteger(PdfImage.Keys.Height); 
        int bpp = xObject.Elements.GetInteger(PdfImage.Keys.BitsPerComponent); 

        byte[] data = xObject.Stream.Value; 
        Tiff tiff = BitMiracle.LibTiff.Classic.Tiff.Open("D:\\sample.tif", "w"); 
        tiff.SetField(TiffTag.IMAGEWIDTH, (uint)(width)); 
        tiff.SetField(TiffTag.IMAGELENGTH, (uint)(height)); 
        tiff.SetField(TiffTag.COMPRESSION, (uint)BitMiracle.LibTiff.Classic.Compression.CCITTFAX4); 
        tiff.SetField(TiffTag.BITSPERSAMPLE, (uint)(bpp)); 
        tiff.SetField(TiffTag.STRIPOFFSETS, 187); 

        tiff.WriteRawStrip(0,data,data.Length); 
        tiff.Close(); 
       } 
      } 
     } 
    } 
}
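
A side note on the code above that is unrelated to the rotation itself: it always writes the strip as Group 4 (`CCITTFAX4`), but a CCITTFaxDecode stream is Group 4 only when the `/K` entry of its `/DecodeParms` dictionary is negative. A `/K` of 0 (the default) means Group 3 one-dimensional coding and a positive `/K` means Group 3 mixed coding. The helper below is a minimal sketch of that mapping onto the TIFF `Compression` tag values (3 and 4); `CcittParams` and `TiffCompressionForK` are hypothetical names, and reading `/K` from the image dictionary is assumed to happen elsewhere.

```csharp
public static class CcittParams
{
    // TIFF Compression tag values for the two CCITT fax schemes.
    public const int TiffCompressionGroup3 = 3; // CCITT T.4
    public const int TiffCompressionGroup4 = 4; // CCITT T.6

    // k is the /K entry of /DecodeParms; treat a missing entry as 0.
    // Assumption: for k > 0 the writer must also flag two-dimensional
    // coding in the Group 3 options tag, which this sketch does not do.
    public static int TiffCompressionForK(int k)
    {
        return k < 0 ? TiffCompressionGroup4 : TiffCompressionGroup3;
    }
}
```

The returned value could replace the hard-coded `Compression.CCITTFAX4` in the `SetField(TiffTag.COMPRESSION, ...)` call, cast to the library's enum.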