2016-11-30

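Convert a scanned PDF to an image

I can use Tesseract to scan JPG images, and I can use iTextSharp to read regular PDFs and get the text out of them. But I cannot find a way either to get text from a scanned document that has a PDF extension, or to convert the PDF to an image and then scan it with Tesseract. Is there an option I have missed? Thanks!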

Answer

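First, I assume you have a scanned PDF document. Second, I assume you only have the text from that PDF document. You can generate an image of the text with the method below.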

// Requires: using System; using System.Drawing;
private Image DrawText(String text, Font font, Color textColor, Color backColor)
{
    // First, create a dummy 1x1 bitmap just to get a Graphics object for measuring
    SizeF textSize;
    using (Image dummy = new Bitmap(1, 1))
    using (Graphics measure = Graphics.FromImage(dummy))
    {
        // Measure the string to see how big the image needs to be
        textSize = measure.MeasureString(text, font);
    }

    // Create a new image of the right size. Round up so the text is not clipped,
    // and keep at least 1x1 because Bitmap rejects zero dimensions (empty text).
    int width = Math.Max(1, (int)Math.Ceiling(textSize.Width));
    int height = Math.Max(1, (int)Math.Ceiling(textSize.Height));
    Image img = new Bitmap(width, height);

    using (Graphics drawing = Graphics.FromImage(img))
    using (Brush textBrush = new SolidBrush(textColor))
    {
        // Paint the background, then draw the text at the top-left corner
        drawing.Clear(backColor);
        drawing.DrawString(text, font, textBrush, 0, 0);
    }

    // The caller owns the returned image and should dispose it when done
    return img;
}

Reference: How to generate an image from text on fly at runtime
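
As a rough usage sketch, the method above could be called and its result saved to disk so that an OCR tool such as Tesseract can read the file. Everything specific here is an assumption for illustration: the class name TextImageDemo, the sample text, the "Arial" font at 24 pt, and the output path text.png are all hypothetical, and the code assumes System.Drawing is available (.NET Framework, or the System.Drawing.Common package on Windows). DrawText is repeated in compact static form only so the sketch compiles on its own.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;

static class TextImageDemo
{
    // Compact static copy of the DrawText method from the answer,
    // included only so this sketch compiles on its own.
    static Image DrawText(string text, Font font, Color textColor, Color backColor)
    {
        SizeF size;
        using (var dummy = new Bitmap(1, 1))
        using (var g = Graphics.FromImage(dummy))
        {
            size = g.MeasureString(text, font);
        }

        var img = new Bitmap(Math.Max(1, (int)Math.Ceiling(size.Width)),
                             Math.Max(1, (int)Math.Ceiling(size.Height)));
        using (var g = Graphics.FromImage(img))
        using (var brush = new SolidBrush(textColor))
        {
            g.Clear(backColor);
            g.DrawString(text, font, brush, 0, 0);
        }
        return img;
    }

    static void Main()
    {
        // Hypothetical input: text already extracted from the PDF
        string text = "Sample text extracted from the PDF";

        using (var font = new Font("Arial", 24))   // assumed font and size
        using (Image img = DrawText(text, font, Color.Black, Color.White))
        {
            // Assumed output path; PNG is lossless, which suits OCR input
            img.Save("text.png", ImageFormat.Png);
            Console.WriteLine("Saved " + img.Width + "x" + img.Height + " image to text.png");
        }
    }
}
```

Black text on a white background is chosen here because high contrast generally helps OCR. The saved file can then be passed to Tesseract the same way as any other scanned image.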