2015-05-14

tess-two OCR not decoding correctly

I followed tutorials to get Tesseract, specifically tess-two and eyes-two, installed as part of my Android app.

It runs, but the OCR text returned from baseApi.getUTF8Text(); is complete gibberish.

    BitmapFactory.Options options = new BitmapFactory.Options();
    options.inSampleSize = 4; // decode at 1/4 of the original width and height
    Bitmap bmp = BitmapFactory.decodeFile(path, options);
    receipt.setImageBitmap(bmp);

    try {
        // Read the EXIF orientation tag to find out how the photo must be rotated.
        ExifInterface exif = new ExifInterface(path);
        int exifOrientation = exif.getAttributeInt(ExifInterface.TAG_ORIENTATION, ExifInterface.ORIENTATION_NORMAL);
        int rotate = 0;
        switch (exifOrientation) {
            case ExifInterface.ORIENTATION_ROTATE_90: rotate = 90; break;
            case ExifInterface.ORIENTATION_ROTATE_180: rotate = 180; break;
            case ExifInterface.ORIENTATION_ROTATE_270: rotate = 270; break;
        }
        if (rotate != 0) {
            int w = bmp.getWidth();
            int h = bmp.getHeight();
            Matrix matrix = new Matrix();
            matrix.preRotate(rotate);
            bmp = Bitmap.createBitmap(bmp, 0, 0, w, h, matrix, false);
        }

        // tess-two expects an ARGB_8888 bitmap.
        bmp = bmp.copy(Bitmap.Config.ARGB_8888, true);

        TessBaseAPI baseApi = new TessBaseAPI();
        baseApi.init(DATA_PATH, "eng");
        baseApi.setImage(bmp);
        String OCRText = baseApi.getUTF8Text();
        baseApi.end();

        Log.i("OCR Text", "rotate " + rotate);
        Log.i("OCR Text", "OCR ");
        Log.i("OCR Text", OCRText);
        Log.i("OCR Text", "=======================================================================================");
    } catch (IOException e) {
        Log.e("OCR Text", "Could not read EXIF data from " + path, e);
    }

The photo I took returned the following OCR characters:

05-14 11:01:59.131: I/OCR Text(18199): rotate 90 
05-14 11:01:59.131: I/OCR Text(18199): OCR 
05-14 11:01:59.131: I/OCR Text(18199): 4— ‘ ‘ 
05-14 11:01:59.131: I/OCR Text(18199): \Dxfi ‘ 
05-14 11:01:59.131: I/OCR Text(18199): I W man"! no Accounv 
05-14 11:01:59.131: I/OCR Text(18199): 1’ 
05-14 11:01:59.131: I/OCR Text(18199): my... «unblm m. mm. 
05-14 11:01:59.131: I/OCR Text(18199): :~A 
05-14 11:01:59.131: I/OCR Text(18199): «Ln. 
05-14 11:01:59.131: I/OCR Text(18199): ‘ “w “IN. N I “H‘M‘ 
05-14 11:01:59.131: I/OCR Text(18199): mmnwnmw- .; k. ' 
05-14 11:01:59.131: I/OCR Text(18199): Wilt-run”. uni” nl 
05-14 11:01:59.131: I/OCR Text(18199): mam. I 
05-14 11:01:59.131: I/OCR Text(18199): ======================================================================================= 

Any suggestions on how to clean up and correct the OCR recognition of the check? The device used is a Samsung Galaxy 7"

The Samsung Galaxy Tab 2 7" has no autofocus on its main (rear) camera, so you will probably get better results with a different device. – rmtheis

Answer

You can use something like:

OCRText = OCRText.replaceAll("[^a-zA-Z0-9]+", " "); 
OCRText = OCRText.trim(); 

It is based on a Tesseract implementation I found here: SimpleAndroidOCRActivity.java
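To illustrate what that regex does, here is a minimal, self-contained sketch in plain Java (no Android or tess-two dependency). The class name `OcrCleanup` and the method `clean` are hypothetical names chosen for this example; the sample input is one of the lines from the log in the question. Note that the pattern only strips characters other than ASCII letters and digits, so misrecognized letters are left unchanged.

```java
public class OcrCleanup {

    // Replace every run of characters other than ASCII letters and digits
    // with a single space, then trim leading and trailing whitespace.
    static String clean(String ocrText) {
        return ocrText.replaceAll("[^a-zA-Z0-9]+", " ").trim();
    }

    public static void main(String[] args) {
        // One line of raw OCR output taken from the log in the question.
        String raw = "I W man\"! no Accounv";
        System.out.println(clean(raw)); // prints "I W man no Accounv"
    }
}
```

The punctuation is removed, but "Accounv" is still wrong, so this kind of post-processing tidies the output rather than improving recognition.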

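Thanks. But I believe this may be related to focus. If I scan with the front camera (which has autofocus), the accuracy reaches 90% and the text makes much more sense. When I scan with the rear camera (which has no autofocus), I get the gibberish above. It is supposed to be a name and address. – NewDev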