Recognizing special characters with OCR, e.g. five-pointed stars and hearts (in images)

Error: could not find a version that satisfies the requirement pytesseract

pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.

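Contents

- Recognizing special characters with OCR, e.g. five-pointed stars and hearts (in images)
  - Preface
  - 1. Solution: pytesseract
  - Solution: Tesseract
    - Cause of the error
  - Code demo
    - Extracting text from an image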

Preface

1. Solution: pytesseract

ModuleNotFoundError: No module named 'pytesseract'

pip install pytesseract failed over and over again.
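Before falling back to a manual download, it can help to confirm which Python interpreter is actually running and whether it can see pytesseract at all: a ModuleNotFoundError can also mean pip installed the package into a different Python than the one running the script. A minimal standard-library sketch (module_status is an illustrative helper, not part of any library):

```python
import importlib.util
import sys


def module_status(name: str) -> str:
    """Report whether the top-level module `name` is importable by this interpreter."""
    spec = importlib.util.find_spec(name)  # None when the module cannot be found
    if spec is None:
        return f"{name}: NOT importable by {sys.executable}"
    return f"{name}: found at {spec.origin}"


if __name__ == "__main__":
    print(module_status("pytesseract"))
    # If it is missing, install into *this* interpreter explicitly:
    #   <the sys.executable path printed above> -m pip install pytesseract
```

Running pip as python -m pip ties the install to that exact interpreter, which rules out the "wrong Python" case before the download workaround below.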

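Open the package's file list in a browser: Links for pytesseract (pypi.org).

If the file will not download from there, press F12 to open the developer tools and download it from the link shown in the Elements panel.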
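The "Links for pytesseract" page is PyPI's "simple" index (PEP 503): a plain HTML list of a elements, one per release file, which is what the Elements panel shows. As a sketch, the same links can be read with the standard library instead of F12. SIMPLE_INDEX, parse_links and fetch_links are illustrative names, and fetch_links needs network access to pypi.org.

```python
from html.parser import HTMLParser
from typing import Dict, Optional
from urllib.parse import urljoin
from urllib.request import urlopen

# PEP 503 "simple" page for the project (its title is "Links for pytesseract").
SIMPLE_INDEX = "https://pypi.org/simple/pytesseract/"


class _LinkCollector(HTMLParser):
    """Collect {link text: href} for every <a> element, i.e. file name -> URL."""

    def __init__(self) -> None:
        super().__init__()
        self.links: Dict[str, str] = {}
        self._href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        # Only text inside an <a> element is recorded.
        if self._href is not None and data.strip():
            self.links[data.strip()] = self._href

    def handle_endtag(self, tag):
        if tag == "a":
            self._href = None


def parse_links(html: str, base_url: str = SIMPLE_INDEX) -> Dict[str, str]:
    """Map each file name on a simple-index page to its absolute download URL."""
    collector = _LinkCollector()
    collector.feed(html)
    return {name: urljoin(base_url, href) for name, href in collector.links.items()}


def fetch_links(url: str = SIMPLE_INDEX) -> Dict[str, str]:
    """Download the simple-index page and parse it (requires network access)."""
    with urlopen(url) as resp:
        return parse_links(resp.read().decode("utf-8"), base_url=url)
```

With network access, fetch_links()["pytesseract-0.3.7.tar.gz"] gives the download URL, and urllib.request.urlretrieve(url, "pytesseract-0.3.7.tar.gz") saves it locally.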


Download pytesseract-0.3.7.tar.gz.
Put it in "<Python install path>\Lib\site-packages\" and extract it there.
Go into the pytesseract folder; it contains setup.py.
Open cmd in that folder and run:
python setup.py install
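The archive does not have to be extracted inside site-packages: pip can install a downloaded sdist directly from wherever it was saved. The sketch below (helper names are illustrative) prints where this interpreter's site-packages directory is and builds the equivalent python -m pip install <archive> command; the actual install call is left commented out.

```python
from __future__ import annotations

import subprocess
import sys
import sysconfig
from pathlib import Path


def site_packages_dir() -> Path:
    """Site-packages directory of the running interpreter
    (on Windows typically <python install path>\\Lib\\site-packages)."""
    return Path(sysconfig.get_paths()["purelib"])


def pip_install_command(archive: str) -> list[str]:
    """Command that installs a local sdist such as pytesseract-0.3.7.tar.gz
    into the interpreter running this script (python -m pip, not a stray pip.exe)."""
    return [sys.executable, "-m", "pip", "install", archive]


def install(archive: str) -> None:
    """Run the install; raises CalledProcessError if pip fails."""
    subprocess.run(pip_install_command(archive), check=True)


if __name__ == "__main__":
    print("site-packages:", site_packages_dir())
    print("command:", " ".join(pip_install_command("pytesseract-0.3.7.tar.gz")))
    # install("pytesseract-0.3.7.tar.gz")  # uncomment to actually install
```

Installing this way leaves no build/ folder inside site-packages, so the package is then imported simply as import pytesseract.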

Solution: Tesseract

Tesseract is an open-source OCR (Optical Character Recognition) engine.

pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.

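Cause of the error:

Only the pytesseract library was installed with pip. pytesseract is a wrapper that calls the separate Tesseract OCR program, and that program was never installed. Tesseract itself has to be downloaded and installed, and its folder added to the environment (PATH).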
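Once Tesseract is installed, pytesseract must be able to start it: by default it runs the command tesseract, which the operating system looks up on PATH. A minimal sketch that checks PATH first and then a list of candidate locations. The Windows path in DEFAULT_CANDIDATES is an assumption (a common default install directory); replace it with your own install location. find_tesseract is an illustrative helper, not a pytesseract function.

```python
import os
import shutil
from typing import Iterable, Optional

# Assumed default install location on Windows; adjust to where you installed Tesseract.
DEFAULT_CANDIDATES = (r"C:\Program Files\Tesseract-OCR\tesseract.exe",)


def find_tesseract(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return a path to a tesseract executable, or None if none is found.

    PATH is checked first (roughly the lookup that happens when pytesseract
    runs its default command "tesseract"), then each explicit candidate path.
    """
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


if __name__ == "__main__":
    exe = find_tesseract()
    print(exe or "tesseract not found: install it or add its folder to PATH")
    # With pytesseract installed, an explicit path can then be passed along:
    #   import pytesseract
    #   pytesseract.pytesseract.tesseract_cmd = exe
```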
Check the version
Open CMD and run tesseract --version:

C:\Users\Administrator>tesseract --version
tesseract v5.0.1.20220107
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX512BW
 Found AVX512F
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found libarchive 3.5.0 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5 libzstd/1.4.5
 Found libcurl/7.77.0-DEV Schannel zlib/1.2.11 zstd/1.4.5 libidn2/2.0.4 nghttp2/1.31.0
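The same check can be scripted. A sketch (helper names are illustrative) that runs tesseract --version and extracts the version from the first line, e.g. 5.0.1.20220107 from the output above; it returns None when the executable cannot be started, which is the TesseractNotFoundError situation. pytesseract also provides pytesseract.get_tesseract_version() for the same purpose.

```python
import re
import subprocess
from typing import Optional


def parse_tesseract_version(output: str) -> Optional[str]:
    """Extract the version from `tesseract --version` output,
    e.g. 'tesseract v5.0.1.20220107' -> '5.0.1.20220107'."""
    match = re.search(r"^tesseract\s+v?(\S+)", output, re.MULTILINE)
    return match.group(1) if match else None


def tesseract_version(cmd: str = "tesseract") -> Optional[str]:
    """Run `<cmd> --version` and parse it; None if the executable cannot be started."""
    try:
        proc = subprocess.run([cmd, "--version"], capture_output=True, text=True)
    except OSError:  # e.g. FileNotFoundError: not installed or not on PATH
        return None
    # Some builds print the version to stderr instead of stdout, so check both.
    return parse_tesseract_version(proc.stdout + proc.stderr)


if __name__ == "__main__":
    print(tesseract_version())
```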

Problem solved.

Code demo

Extracting text from an image

from paddleocr import PaddleOCR

# Load the models.
# PaddleOCR supports, among others, Chinese/English, English, French, German, Korean and Japanese;
# switch between them with the lang parameter: `ch`, `en`, `french`, `german`, `korean`, `japan`.
ocr = PaddleOCR(lang="ch",
                use_gpu=False,
                det_model_dir="../../paddleORC_model/ch_ppocr_server_v2.0_det_infer/",
                cls_model_dir="../ch_ppocr_mobile_v2.0_cls_infer/",
                rec_model_dir="../ch_ppocr_server_v2.0_rec_infer/")

# Run OCR on the input image.
img_path = 'image2.png'
result = ocr.ocr(img_path)
# Depending on the PaddleOCR version, each item is either one detected line
# ([box, (text, confidence)]) or the list of such lines for one image.
for line in result:
    print(line)
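To work with the recognized strings rather than the raw printout, the result can be flattened into (text, confidence) pairs. This sketch rests on an assumption about the output shape: in PaddleOCR 2.x each detection is [box, (text, confidence)], and newer releases wrap the detections of each image in an extra list; the helper accepts both. extract_texts is a hypothetical helper, not part of PaddleOCR.

```python
from typing import Any, List, Tuple


def _is_line(item: Any) -> bool:
    """True for a single detection shaped like [box, (text, score)]."""
    return (
        isinstance(item, (list, tuple))
        and len(item) == 2
        and isinstance(item[1], (list, tuple))
        and len(item[1]) == 2
        and isinstance(item[1][0], str)
    )


def extract_texts(result: Any, min_score: float = 0.0) -> List[Tuple[str, float]]:
    """Flatten a PaddleOCR-style result into (text, score) pairs with score >= min_score."""
    pairs: List[Tuple[str, float]] = []
    for item in result or []:
        if item is None:  # an image with no detections
            continue
        lines = [item] if _is_line(item) else item  # one detection, or one image's list
        for _box, (text, score) in lines:
            if score >= min_score:
                pairs.append((text, float(score)))
    return pairs
```

Usage with the result above: for text, score in extract_texts(result): print(text, score). Listing every detection with its score makes it easy to see whether the heart symbols were detected at all.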

Problem: the hearts are not extracted.
Solution: use pytesseract (Tesseract) instead:

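import pytesseract
from PIL import Image

# Read the image
img = Image.open('image2.png')
# Tell pytesseract where tesseract.exe is (needed if its folder is not on PATH).
# The raw string keeps the Windows backslashes literal.
pytesseract.pytesseract.tesseract_cmd = r'D:\Nlp_Room\Tesseract-OCR\tesseract.exe'
# Recognize the text in the image with pytesseract
text = pytesseract.image_to_string(img)

# Print the recognition result
print(text)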

The hearts can then be counted from the end of the output.

【pytesseract recognition】
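If the symbols come out as a repeated character at the end of each line, counting them is plain string work. A sketch, assuming one fixed character per symbol: the ♥ below is only a placeholder, since the character Tesseract actually emits depends on the image and the language data, so substitute whatever appears in your output. count_trailing is a hypothetical helper.

```python
def count_trailing(text: str, symbol: str) -> int:
    """Count occurrences of `symbol` at the end of each line, summed over all lines.

    Whitespace between trailing symbols is ignored; occurrences elsewhere
    in a line are not counted.
    """
    total = 0
    for line in text.splitlines():
        line = line.rstrip()
        while symbol and line.endswith(symbol):
            total += 1
            line = line[: -len(symbol)].rstrip()
    return total


if __name__ == "__main__":
    # "♥" is a placeholder; use the character your OCR output actually contains.
    sample = "Rating ♥♥♥\nScore ♥ ♥\nno symbols here"
    print(count_trailing(sample, "♥"))  # 5
```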