2016-01-23 124 views

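Removing lines from an image

I am an OpenCV beginner, and I need to remove the horizontal and vertical lines from an image so that only the text remains (the lines cause trouble when extracting text with OCR). I am trying to extract text from a nutrition facts table. Can anyone help me?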

Nutrient Fact Table

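Rather than seeing the lines as an "obstacle", have you tried treating them as contours, or using an edge detector, and passing on the contents of the rectangles that the lines form? For example, "Nutrition Information..." would be one box and the macronutrient breakdown would be another.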

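@TrésDuBiel Yes, I tried that, but some nutrition facts tables have a vertical line between a nutrient and its value, e.g. Fat | 2.7g, and those in-between vertical lines still get in the way.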

For line detection, you can use OpenCV's [Hough lines](http://docs.opencv.org/2.4/doc/tutorials/imgproc/imgtrans/hough_lines/hough_lines.html). – seleciii44
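
As a rough illustration of what a Hough line detector does internally, here is a dependency-free sketch of the voting step: each foreground pixel votes for every line rho = x*cos(theta) + y*sin(theta) that passes through it, and the most-voted (rho, theta) pair is reported. The `HoughLine` struct, the `strongestLine` name, the 1-degree angle step, and the 0/1 grid representation are all assumptions made for this example. It is not OpenCV's implementation; in real code you would call `cv::HoughLines` or `cv::HoughLinesP` instead.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical result type for this sketch: one line in normal form.
struct HoughLine { int rho; int thetaDeg; int votes; };

// Returns the strongest straight line in a rectangular binary image
// (non-zero = foreground), using rho = x*cos(theta) + y*sin(theta).
HoughLine strongestLine(const std::vector<std::vector<int>>& img)
{
    assert(!img.empty());
    const double kPi = 3.14159265358979323846;
    const int rows = static_cast<int>(img.size());
    const int cols = static_cast<int>(img[0].size());
    const int maxRho = rows + cols;  // safe upper bound on |rho|

    // acc[theta][rho + maxRho] holds the vote count for that line.
    std::vector<std::vector<int>> acc(180, std::vector<int>(2 * maxRho + 1, 0));

    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < cols; ++x)
        {
            if (img[y][x] == 0)
                continue;
            for (int t = 0; t < 180; ++t)
            {
                const double th = t * kPi / 180.0;
                const int rho = static_cast<int>(
                    std::lround(x * std::cos(th) + y * std::sin(th)));
                ++acc[t][rho + maxRho];
            }
        }

    // Pick the accumulator cell with the most votes (first one wins on ties).
    HoughLine best{0, 0, 0};
    for (int t = 0; t < 180; ++t)
        for (int r = 0; r <= 2 * maxRho; ++r)
            if (acc[t][r] > best.votes)
                best = HoughLine{r - maxRho, t, acc[t][r]};
    return best;
}
```

Because ties between neighbouring angles go to the first cell found, a perfectly horizontal line may be reported at 89 rather than 90 degrees. Unlike the fixed-length morphological kernels, this kind of voting tolerates small gaps and slight tilt.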

Answer

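This was an interesting problem, so I gave it a shot. Below I show how to extract and remove the horizontal and vertical lines; you can extrapolate from there. Also, to save time, I did not preprocess the image to crop out the background as one should, which is one avenue for improvement.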

The result:

result

Code (edit: added vertical lines):

#include <iostream> 
#include <opencv2/opencv.hpp> 
using namespace std; 
using namespace cv; 
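int main(int argc, char** argv) 
{ 
    // Expect the image path as the first argument 
    if (argc < 2) 
    { 
     cerr << "Usage: " << argv[0] << " <image>" << endl; 
     return 1; 
    } 
    // Load the image 
    Mat src = imread(argv[1]); 
    // Stop if the image could not be loaded 
    if (src.empty()) 
    { 
     cerr << "Problem loading image!!!" << endl; 
     return 1; 
    } 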
    Mat gray; 
    if (src.channels() == 3) 
    { 
     cvtColor(src, gray, COLOR_BGR2GRAY); 
    } 
    else 
    { 
     gray = src; 
    } 

    //inverse binary img 
    Mat bw; 
    //this will hold the result, image to be passed to OCR 
    Mat fin; 
    //I find OTSU binarization best for text. 
    //Would perform better if background had been cropped out 
    threshold(gray, bw, 0, 255, THRESH_BINARY_INV | THRESH_OTSU); 
    threshold(gray, fin, 0, 255, THRESH_BINARY | THRESH_OTSU); 
    imshow("binary", bw); 
    Mat dst; 
    Canny(fin, dst, 50, 200, 3); 
    Mat str = getStructuringElement(MORPH_RECT, Size(3,3)); 
    dilate(dst, dst, str, Point(-1, -1), 3); 
    imshow("dilated_canny", dst); 
    //bitwise_and w/ canny image helps w/ background noise 
    bitwise_and(bw, dst, dst); 
    imshow("and", dst); 
    Mat horizontal = dst.clone(); 
    Mat vertical = dst.clone(); 
    fin = ~dst; 

    //Extract the horizontal lines from the copy of dst made above 
    //Selected this value arbitrarily 
    int horizontalsize = horizontal.cols/30; 
    Mat horizontalStructure = getStructuringElement(MORPH_RECT, Size(horizontalsize,1)); 
    erode(horizontal, horizontal, horizontalStructure, Point(-1, -1)); 
    dilate(horizontal, horizontal, horizontalStructure, Point(-1, -1), 1); 
    imshow("horizontal_lines", horizontal); 

    //Need to find horizontal contours, so as to not damage letters 
    vector<Vec4i> hierarchy; 
    vector<vector<Point> >contours; 
    findContours(horizontal, contours, hierarchy, RETR_TREE, CHAIN_APPROX_NONE); 
    for (const auto& c : contours) 
    { 
     Rect r = boundingRect(c); 

     float percentage_height = (float)r.height/(float)src.rows; 
     float percentage_width = (float)r.width/(float)src.cols; 

     //These exclude contours that probably are not dividing lines 
     if (percentage_height > 0.05) 
      continue; 

     if (percentage_width < 0.50) 
      continue; 
     //fills in line with white rectangle 
     rectangle(fin, r, Scalar(255,255,255), FILLED); 
    } 

    int verticalsize = vertical.rows/30; 
    Mat verticalStructure = getStructuringElement(MORPH_RECT, Size(1,verticalsize)); 
    erode(vertical, vertical, verticalStructure, Point(-1, -1)); 
    dilate(vertical, vertical, verticalStructure, Point(-1, -1), 1); 
    imshow("vertical_lines", vertical); 

    findContours(vertical, contours, hierarchy, RETR_TREE, CHAIN_APPROX_NONE); 
    for (const auto& c : contours) 
    { 
     Rect r = boundingRect(c); 

     float percentage_height = (float)r.height/(float)src.rows; 
     float percentage_width = (float)r.width/(float)src.cols; 

     //These exclude contours that probably are not dividing lines 
     if (percentage_width > 0.05) 
      continue; 

     if (percentage_height < 0.50) 
      continue; 
     //fills in line with white rectangle 
     rectangle(fin, r, Scalar(255,255,255), FILLED); 
    } 

    imshow("Result", fin); 
    waitKey(0); 
    return 0; 
} 
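
To make the erode-then-dilate step less magical: with a flat 1 x k element, that opening keeps a foreground pixel only if it lies in a horizontal run of at least k foreground pixels, which is why long rules survive and letter strokes disappear. The sketch below reproduces that effect on a plain 0/1 grid without OpenCV. The names `Row`, `Grid`, and `horizontalLineMask` are made up for this illustration, and the run-length shortcut ignores OpenCV's border handling, so results can differ at the image edges.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Row = std::vector<int>;   // 1 = foreground (ink), 0 = background
using Grid = std::vector<Row>;

// Keeps only the foreground pixels that belong to a horizontal run of at
// least minRun pixels. Away from the borders this matches a morphological
// opening (erode, then dilate) with a flat 1 x minRun structuring element.
Grid horizontalLineMask(const Grid& img, int minRun)
{
    assert(minRun >= 1);
    Grid mask(img.size());
    for (std::size_t y = 0; y < img.size(); ++y)
    {
        const Row& row = img[y];
        mask[y].assign(row.size(), 0);
        std::size_t x = 0;
        while (x < row.size())
        {
            if (row[x] == 0)
            {
                ++x;
                continue;
            }
            // Walk to the end of the current run; x ends one past it.
            const std::size_t start = x;
            while (x < row.size() && row[x] != 0)
                ++x;
            // Copy the run into the mask only if it is long enough.
            if (x - start >= static_cast<std::size_t>(minRun))
                for (std::size_t i = start; i < x; ++i)
                    mask[y][i] = 1;
        }
    }
    return mask;
}
```

Running the same function on the transposed grid gives the vertical-line mask, which mirrors the 1 x `verticalsize` kernel in the listing.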

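The limitation of this approach is that the lines need to be straight. Because the bottom line is curved, the removal cuts slightly into the "E" in "Energy". Perhaps with Hough line detection, as suggested (I have never used it), a similar but more robust method could be devised. Also, filling in the lines with rectangles is probably not the best approach.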
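
To see why filling the bounding rectangle can eat into nearby text when a line is not perfectly straight, here is a small dependency-free comparison on a 0/1 grid. `clearBoundingBox` wipes the whole bounding box of the detected line pixels, while `clearMaskedPixels` wipes only the pixels flagged in the line mask. Both function names, the `Grid` alias, and the toy data layout are assumptions for illustration, not part of the code above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<int>>;  // 1 = ink, 0 = background

// Rectangle-style removal: clears every pixel inside the bounding box
// of the mask, including text that happens to fall inside that box.
Grid clearBoundingBox(const Grid& img, const Grid& lineMask)
{
    assert(img.size() == lineMask.size());
    const int rows = static_cast<int>(img.size());
    const int cols = rows > 0 ? static_cast<int>(img[0].size()) : 0;
    int top = rows, bottom = -1, left = cols, right = -1;
    for (int y = 0; y < rows; ++y)
        for (int x = 0; x < cols; ++x)
            if (lineMask[y][x] != 0)
            {
                top = std::min(top, y);
                bottom = std::max(bottom, y);
                left = std::min(left, x);
                right = std::max(right, x);
            }
    Grid out = img;
    for (int y = top; y <= bottom; ++y)      // no-op if the mask is empty
        for (int x = left; x <= right; ++x)
            out[y][x] = 0;
    return out;
}

// Mask-style removal: clears only the pixels flagged as line pixels.
Grid clearMaskedPixels(const Grid& img, const Grid& lineMask)
{
    assert(img.size() == lineMask.size());
    Grid out = img;
    for (std::size_t y = 0; y < img.size(); ++y)
        for (std::size_t x = 0; x < img[y].size(); ++x)
            if (lineMask[y][x] != 0)
                out[y][x] = 0;
    return out;
}
```

With OpenCV itself, one way to get the per-pixel behaviour would be to use the line image as a mask, e.g. `fin.setTo(Scalar(255), lineMask)` with a hypothetical 8-bit single-channel `lineMask`; that is untested against the image in the question.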