Tesseract OCR recognition of small numbers



  • Tesseract OCR doesn't recognize small digits, specifically 6 and 9; the other digits are recognized correctly.

    Example:

    [image]

    import cv2
    import pytesseract

    # Path to the Tesseract executable (Windows install)
    pytesseract.pytesseract.tesseract_cmd = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"

    img = cv2.imread('src_path...')

    # Convert to grayscale and enlarge to 400% of the original size
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    scale_percent = 400  # percent of original size
    width = int(img_gray.shape[1] * scale_percent / 100)
    height = int(img_gray.shape[0] * scale_percent / 100)
    dim = (width, height)
    resized_img = cv2.resize(img_gray, dim, interpolation=cv2.INTER_AREA)

    # Light smoothing before thresholding
    blur_img = cv2.GaussianBlur(resized_img, (3, 3), 0)
    blur_img = cv2.medianBlur(blur_img, 3)

    # Otsu picks the binarization threshold automatically
    thresh, new_img = cv2.threshold(blur_img, 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY)

    # psm 12 = sparse text with OSD, oem 3 = default engine, digits only
    custom_config = '--psm 12 --oem 3 -c tessedit_char_whitelist=0123456789'
    digits = pytesseract.image_to_string(new_img, lang='eng', config=custom_config)
    print(digits)

    After all these transformations, the image looks like this:

    [image]

    But Tesseract still doesn't recognize it. What can I do?

    upd 1: Temporary workaround: enlarge the image 10 times and apply a strong blur (medianBlur with a kernel size between 13 and 21). With that, Tesseract recognizes the digit. But is there a better solution to my problem?



  • I found the original image.

    Invert the colors, since Tesseract works better with black text on a white background, and set psm 8.

    Available psm (page segmentation mode) values:

    • 0 = Orientation and script detection (OSD) only.
    • 1 = Automatic page segmentation with OSD.
    • 2 = Automatic page segmentation, but no OSD, or OCR. (not implemented)
    • 3 = Fully automatic page segmentation, but no OSD. (Default)
    • 4 = Assume a single column of text of variable sizes.
    • 5 = Assume a single uniform block of vertically aligned text.
    • 6 = Assume a single uniform block of text.
    • 7 = Treat the image as a single text line.
    • 8 = Treat the image as a single word.
    • 9 = Treat the image as a single word in a circle.
    • 10 = Treat the image as a single character.
    • 11 = Sparse text. Find as much text as possible in no particular order.
    • 12 = Sparse text with OSD.
    • 13 = Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.

    Available oem (OCR engine mode) values:

    • 0 = Original Tesseract only.
    • 1 = Neural nets LSTM only.
    • 2 = Tesseract + LSTM.
    • 3 = Default, based on what is available.

    The mode descriptions are taken from https://github.com/tesseract-ocr/tesseract/blob/main/doc/tesseract.1.asc
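
    Since the psm and oem numbers are easy to mix up, one option is to give the relevant modes names and assemble the config string from them. This is only a sketch: the PSM enum and build_config are hypothetical helpers, not part of pytesseract; only the --psm, --oem and -c tessedit_char_whitelist options themselves come from the posts above.

```python
from enum import IntEnum


class PSM(IntEnum):
    """A few page segmentation modes from the list above (hypothetical enum)."""
    SINGLE_BLOCK = 6
    SINGLE_LINE = 7
    SINGLE_WORD = 8
    SINGLE_CHAR = 10
    SPARSE_TEXT = 11
    SPARSE_TEXT_OSD = 12


def build_config(psm, oem=None, whitelist=None):
    """Build a config string for pytesseract.image_to_string (illustrative)."""
    parts = [f"--psm {int(psm)}"]
    if oem is not None:
        parts.append(f"--oem {int(oem)}")
    if whitelist:
        # Restrict recognition to the given characters
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_config(PSM.SINGLE_WORD, whitelist="0123456789"))
```

    The returned string can be passed as the config argument of pytesseract.image_to_string.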

    import pytesseract
    import cv2

    img = cv2.imread('ZZt0xKV.png')

    # Same preprocessing as in the question: grayscale, 400% resize, light blur
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    scale_percent = 400  # percent of original size
    width = int(img_gray.shape[1] * scale_percent / 100)
    height = int(img_gray.shape[0] * scale_percent / 100)
    dim = (width, height)
    resized_img = cv2.resize(img_gray, dim, interpolation=cv2.INTER_AREA)

    blur_img = cv2.GaussianBlur(resized_img, (3, 3), 0)
    blur_img = cv2.medianBlur(blur_img, 3)

    thresh, new_img = cv2.threshold(blur_img, 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY)

    # Raw string so that "\t" in "\tesseract.exe" is not read as a tab
    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

    # Invert so the digit is black on a white background
    new_img = cv2.bitwise_not(new_img)

    # psm 8 = treat the image as a single word, digits only
    custom_config = '--psm 8 -c tessedit_char_whitelist=0123456789'
    digits = pytesseract.image_to_string(new_img, lang='eng', config=custom_config)
    print(digits.strip())  # 9
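
    The inversion above is unconditional, which fits this particular image. If the polarity of the input is not known in advance, a rough check like the one below could decide whether to invert. It is a sketch under the assumption that the background covers more pixels than the text; ensure_dark_on_light is a hypothetical helper name.

```python
import numpy as np


def ensure_dark_on_light(binary):
    """Return a 0/255 uint8 image with dark text on a light background.

    Assumption: the background occupies more pixels than the text.
    """
    white = np.count_nonzero(binary == 255)
    black = binary.size - white
    if white < black:
        # Mostly black means light text on a dark background, so invert
        return 255 - binary
    return binary.copy()


if __name__ == "__main__":
    # Synthetic example: one white "text" pixel on a black background
    sample = np.zeros((4, 4), dtype=np.uint8)
    sample[1, 1] = 255
    print(ensure_dark_on_light(sample))
```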

    Useful: https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html has tips for improving recognition quality.

    upd: I took the preprocessing code from the question and changed the psm.


