2010-11-19 50 views
7

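Using itextsharp (or any C# PDF library), how do I open a PDF, replace some text, and save it again?

Using itextsharp (or any C# PDF library), I need to open a PDF, replace some placeholder text with actual values, and return it as a byte[].

Can someone suggest how to do this? I have looked through the iText documentation and can't figure out where to start. So far I am stuck on how to get the source PDF from a PdfReader into a Document object, and I suspect I am approaching this the wrong way.

Many thanks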


Found this so far: http://www.johnnycode.com/blog/2010/03/05/using-a-template-to-programmatically-create-pdfs-with-c-and-itextsharp/ – Chris 2010-11-19 02:24:19

Answers

5

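In the end, I used PDFescape to open my existing PDF file and place form fields where I needed my values to go, then saved it again as my template PDF.

http://www.pdfescape.com

Later I found this blog entry on how to replace form fields:

http://www.johnnycode.com/blog/2010/03/05/using-a-template-to-programmatically-create-pdfs-with-c-and-itextsharp/

It all works great! Here is the code: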

// Requires: using System.IO; using System.Web; using iTextSharp.text.pdf;
public static byte[] Generate() 
{ 
    var templatePath = HttpContext.Current.Server.MapPath("~/my_template.pdf"); 

    // Based on: 
    // http://www.johnnycode.com/blog/2010/03/05/using-a-template-to-programmatically-create-pdfs-with-c-and-itextsharp/ 
    var reader = new PdfReader(templatePath); 
    var outStream = new MemoryStream(); 
    var stamper = new PdfStamper(reader, outStream); 

    var form = stamper.AcroFields; 
    var fieldKeys = form.Fields.Keys; 

    // Match each field by its current (placeholder) value, not by its name
    foreach (string fieldKey in fieldKeys) 
    { 
        if (form.GetField(fieldKey) == "MyTemplatesOriginalTextFieldA") 
            form.SetField(fieldKey, "1234"); 
        if (form.GetField(fieldKey) == "MyTemplatesOriginalTextFieldB") 
            form.SetField(fieldKey, "5678"); 
    } 

    // "Flatten" the form so it won't be editable/usable anymore 
    stamper.FormFlattening = true; 

    stamper.Close(); 
    reader.Close(); 

    return outStream.ToArray(); 
} 

I don't think you need to use the field keys - you can just use: form.SetField("MyTemplatesOriginalTextFieldA", "1234"); etc. – Lachlan 2010-12-13 02:30:44


Ah yes, that is how I would do it now. – Chris 2010-12-13 23:51:06


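When I wrote the code it was done that way because I was replacing the actual values in (unnamed) fields, rather than naming the fields and addressing them by name, which, as you rightly suggest, is the better option. – Chris 2010-12-13 23:51:50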
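
For reference, here is a minimal sketch of the approach suggested in the comments above: setting fields directly by name instead of looping over the keys. It assumes iTextSharp 5.x and a template whose form fields really are named "FieldA" and "FieldB"; those names, the `TemplateFiller` class, and the `Fill` method are placeholders for illustration, not part of the original answer.

```csharp
using System.IO;
using iTextSharp.text.pdf;

public static class TemplateFiller // hypothetical helper name
{
    // Fills named AcroForm fields and returns the flattened PDF as bytes.
    public static byte[] Fill(string templatePath)
    {
        var reader = new PdfReader(templatePath);
        try
        {
            using (var outStream = new MemoryStream())
            {
                var stamper = new PdfStamper(reader, outStream);
                AcroFields form = stamper.AcroFields;

                // Assumed field names; SetField returns false if the
                // template has no field with that name.
                form.SetField("FieldA", "1234");
                form.SetField("FieldB", "5678");

                // Flatten so the values become regular page content
                stamper.FormFlattening = true;

                // The stamper must be closed before the bytes are read,
                // otherwise the output PDF is incomplete.
                stamper.Close();
                return outStream.ToArray();
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Checking the return value of `SetField` is a cheap way to notice a template whose field names do not match what the code expects.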

1

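Unfortunately, I was looking for something similar and could not figure it out. Below is as far as I got; maybe you can use it as a starting point. The problem is that a PDF does not actually store text as such, but uses lookup tables and some other arcane wizardry. This method reads the byte values of each page and tries to convert them to a string, but as far as I can tell it only handles English and misses some special characters, so I abandoned my project and moved on.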

// Requires: using System.IO; using iTextSharp.text; using iTextSharp.text.pdf;
string contents = string.Empty; 
Document doc = new Document(); 
PdfReader reader = new PdfReader("pathToPdf.pdf"); 
using (MemoryStream memoryStream = new MemoryStream()) 
{ 
    PdfWriter writer = PdfWriter.GetInstance(doc, memoryStream); 
    doc.Open(); 
    PdfContentByte cb = writer.DirectContent; 
    for (int p = 1; p <= reader.NumberOfPages; p++) 
    { 
        // add page from reader 
        doc.SetPageSize(reader.GetPageSize(p)); 
        doc.NewPage(); 

        // pick up here with something like this: 
        byte[] bt = reader.GetPageContent(p); 
        contents = ExtractTextFromPDFBytes(bt); 

        if (contents.IndexOf("something") != -1) 
        { 
            // make your own pdf page and add to cb (contentbyte) 
        } 
        else 
        { 
            // copy the original page unchanged, compensating for rotation 
            PdfImportedPage page = writer.GetImportedPage(reader, p); 
            int rot = reader.GetPageRotation(p); 
            if (rot == 90 || rot == 270) 
                cb.AddTemplate(page, 0, -1.0F, 1.0F, 0, 0, reader.GetPageSizeWithRotation(p).Height); 
            else 
                cb.AddTemplate(page, 1.0F, 0, 0, 1.0F, 0, 0); 
        } 
    } 
    // Close the document first so the imported pages are written out, then the reader 
    doc.Close(); 
    reader.Close(); 
    File.WriteAllBytes("pathToOutputOrSamePathToOverwrite.pdf", memoryStream.ToArray()); 
} 

The following is taken from this site:

private string ExtractTextFromPDFBytes(byte[] input) 
{ 
    if (input == null || input.Length == 0) return ""; 

    try 
    { 
     string resultString = ""; 

     // Flag showing if we are currently inside a text object 
     bool inTextObject = false; 

     // Flag showing if the next character is literal 
     // e.g. '\\' to get a '\' character or '\(' to get '(' 
     bool nextLiteral = false; 

     // () bracket nesting level. Text appears inside () 
     int bracketDepth = 0; 

     // Keep the previous chars so we can look back for operators, numbers etc.: 
     char[] previousCharacters = new char[_numberOfCharsToKeep]; 
     for (int j = 0; j < _numberOfCharsToKeep; j++) previousCharacters[j] = ' '; 


      for (int i = 0; i < input.Length; i++) 
      { 
       char c = (char)input[i]; 

       if (inTextObject) 
       { 
        // Position the text 
        if (bracketDepth == 0) 
        { 
         if (CheckToken(new string[] { "TD", "Td" }, previousCharacters)) 
         { 
          resultString += "\n\r"; 
         } 
         else 
         { 
          if (CheckToken(new string[] { "'", "T*", "\"" }, previousCharacters)) 
          { 
           resultString += "\n"; 
          } 
          else 
          { 
           if (CheckToken(new string[] { "Tj" }, previousCharacters)) 
           { 
            resultString += " "; 
           } 
          } 
         } 
        } 

        // End of a text object, also go to a new line. 
        if (bracketDepth == 0 && 
         CheckToken(new string[] { "ET" }, previousCharacters)) 
        { 

         inTextObject = false; 
         resultString += " "; 
        } 
        else 
        { 
         // Start outputting text 
         if ((c == '(') && (bracketDepth == 0) && (!nextLiteral)) 
         { 
          bracketDepth = 1; 
         } 
         else 
         { 
          // Stop outputting text 
          if ((c == ')') && (bracketDepth == 1) && (!nextLiteral)) 
          { 
           bracketDepth = 0; 
          } 
          else 
          { 
           // Just a normal text character: 
           if (bracketDepth == 1) 
           { 
            // Only print out next character no matter what. 
            // Do not interpret. 
            if (c == '\\' && !nextLiteral) 
            { 
             nextLiteral = true; 
            } 
            else 
            { 
             if (((c >= ' ') && (c <= '~')) || 
              ((c >= 128) && (c < 255))) 
             { 
              resultString += c.ToString(); 
             } 

             nextLiteral = false; 
            } 
           } 
          } 
         } 
        } 
       } 

       // Store the recent characters for 
       // when we have to go back for a checking 
       for (int j = 0; j < _numberOfCharsToKeep - 1; j++) 
       { 
        previousCharacters[j] = previousCharacters[j + 1]; 
       } 
       previousCharacters[_numberOfCharsToKeep - 1] = c; 

       // Start of a text object 
       if (!inTextObject && CheckToken(new string[] { "BT" }, previousCharacters)) 
       { 
        inTextObject = true; 
       } 
      } 
     return resultString; 
    } 
    catch 
    { 
     return ""; 
    } 
} 

private bool CheckToken(string[] tokens, char[] recent) 
{ 
    foreach (string token in tokens) 
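    { 
     // Single-character operators such as ' and " have no token[1]; 
     // skip them so the comparison below cannot throw IndexOutOfRangeException. 
     if (token.Length < 2) continue; 
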
     if ((recent[_numberOfCharsToKeep - 3] == token[0]) && 
      (recent[_numberOfCharsToKeep - 2] == token[1]) && 
      ((recent[_numberOfCharsToKeep - 1] == ' ') || 
      (recent[_numberOfCharsToKeep - 1] == 0x0d) || 
      (recent[_numberOfCharsToKeep - 1] == 0x0a)) && 
      ((recent[_numberOfCharsToKeep - 4] == ' ') || 
      (recent[_numberOfCharsToKeep - 4] == 0x0d) || 
      (recent[_numberOfCharsToKeep - 4] == 0x0a))) 
      { 
       return true; 
      } 
    } 
    return false; 
} 

What is _numberOfCharsToKeep? Its declaration is missing here, so please tell me how to define it. – 2013-09-13 04:57:18
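
In case it helps: `_numberOfCharsToKeep` is not declared anywhere in the snippet above. Judging only from how it is used, it is the size of the `previousCharacters` look-behind buffer, and `CheckToken` reads as far back as index `_numberOfCharsToKeep - 4`, so it has to be at least 4. Below is a minimal sketch of a declaration. The value 15 and the class name are assumptions for illustration, not something stated in the answer.

```csharp
public class PdfByteTextScanner // hypothetical class name
{
    // Size of the look-behind buffer of recently read characters.
    // Assumed value: CheckToken indexes back to [_numberOfCharsToKeep - 4],
    // so anything >= 4 avoids an out-of-range index.
    private const int _numberOfCharsToKeep = 15;

    // ExtractTextFromPDFBytes and CheckToken from the answer above go here.
}
```

A larger buffer costs a little more copying per input byte, since the loop shifts the whole array on every character, so there is no benefit in making it much bigger than the longest operator sequence being checked.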
