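PdfTextExtractor.GetTextFromPage does not return the correct text
Using iTextSharp, I have the following code, which successfully extracts the text from the vast majority of the PDFs I want to read: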
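using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

string text = string.Empty; // accumulates the text of every page
PdfReader reader = new PdfReader(fileName);
for (int i = 1; i <= reader.NumberOfPages; i++) // iTextSharp page numbers are 1-based
{
    text += PdfTextExtractor.GetTextFromPage(reader, i);
}
reader.Close();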
However, some of my PDFs contain XFA forms (already filled out), and for those the `text` variable ends up filled with the following garbage:
"Please wait... \n \nIf this message is not eventually replaced by the proper contents of the document, your PDF \nviewer may not be able to display this type of document. \n \nYou can upgrade to the latest version of Adobe Reader for Windows®, Mac, or Linux® by \nvisiting http://www.adobe.com/products/acrobat/readstep2.html. \n \nFor more assistance with Adobe Reader visit http://www.adobe.com/support/products/\nacrreader.html. \n \nWindows is either a registered trademark or a trademark of Microsoft Corporation in the United States and/or other countries. Mac is a trademark \nof Apple Inc., registered in the United States and other countries. Linux is the registered trademark of Linus Torvalds in the U.S. and other \ncountries."
How can I fix this? I tried flattening the PDF with iTextSharp's PdfStamper [1], but that did not work: the resulting stream contains the same garbage text.
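One possible workaround, offered only as a sketch: in a dynamic XFA form the page content stream often holds nothing but the "Please wait..." placeholder, while the filled-in values live in the XFA XML (the `datasets` packet). The code below assumes iTextSharp 5, where `reader.AcroFields.Xfa` returns an `XfaForm` exposing `XfaPresent` and `DatasetsNode`. The class and method names (`PdfTextHelper`, `ExtractText`) are hypothetical. It returns the raw datasets XML rather than rendered page text, so you would still need to pick out the fields you care about.

```csharp
using System.Text;
using System.Xml;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper: page text for ordinary PDFs, XFA form data (as XML) for XFA PDFs.
public static class PdfTextHelper
{
    public static string ExtractText(string fileName)
    {
        PdfReader reader = new PdfReader(fileName);
        try
        {
            // Assumes iTextSharp 5: AcroFields.Xfa wraps the XFA packets, if any
            XfaForm xfa = reader.AcroFields.Xfa;
            if (xfa != null && xfa.XfaPresent)
            {
                // The filled-in values are stored in the datasets packet, not in the page content
                XmlNode datasets = xfa.DatasetsNode;
                return datasets != null ? datasets.OuterXml : string.Empty;
            }

            // No XFA: fall back to ordinary page-by-page text extraction
            var sb = new StringBuilder();
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                sb.Append(PdfTextExtractor.GetTextFromPage(reader, i));
            }
            return sb.ToString();
        }
        finally
        {
            reader.Close();
        }
    }
}
```

This would also explain why flattening with PdfStamper changes nothing: stamping rewrites the same page content without rendering the XFA layout, so the placeholder text remains.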
[1] How to flatten already filled out PDF form using iTextSharp