Recognizing Captchas with Baidu OCR


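I'm 天真胡萝卜, a blogger on 靠谱客. This post shows how to recognize captchas with Baidu OCR; I'm sharing it here in the hope that it serves as a useful reference.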

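Preface: tesserocr, a Python wrapper around the Tesseract engine, is one of the older OCR tools and feels somewhat dated by now, so this post uses Baidu's OCR API instead.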

Code:

from aip import AipOcr
from PIL import Image


# Read an image file and return its contents as raw bytes.
# client.general only accepts bytes; passing a PIL Image object directly raises an error.
def ocr(path):
    with open(path, 'rb') as f:
        return f.read()


def main():
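    print("Request received, processing, please wait...")
    # Baidu OCR credentials (app ID, API key, secret key)
    app_id = '******'
    api_key = '*******************'
    secret_key = '******************************'
    client = AipOcr(app_id, api_key, secret_key)
    # Load the captcha image
    image = Image.open('0.jpg')
    # Convert the image to grayscale
    image = image.convert('L')
    # Binarization threshold: gray levels below it map to 0 (black), the rest to 1 (white).
    # Tuning this value can give a cleaner captcha image.
    threshold = 128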
    table = []
    for i in range(256):
        if i < threshold:
            table.append(0)
        else:
            table.append(1)
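    # Apply the lookup table to every pixel, producing a 1-bit black-and-white image
    image = image.point(table, '1')
    # Optionally preview the processed image in the local viewer
    image.show()
    # Save the processed image as PNG
    image.save("code.png", 'png')
    # Read the saved image back as raw bytes
    image = ocr('code.png')
    # Send the bytes returned by ocr() to Baidu's general OCR method
    dict1 = client.general(image)
    # Print the result, which is returned as a dictionary
    print(dict1)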

if __name__ == '__main__':
    main()
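
The script above writes the binarized image to code.png only to read it back as bytes. As a rough sketch of an alternative, the same preprocessing can be kept in memory. The helper names build_threshold_table, binarize and to_png_bytes are made up for this illustration and are not part of the original script or of any library; the functions only rely on the PIL methods already used above (convert, point, save).

```python
import io


def build_threshold_table(threshold=128):
    """Return a 256-entry lookup table: 0 below the threshold, 1 otherwise."""
    return [0 if level < threshold else 1 for level in range(256)]


def binarize(image, threshold=128):
    """Convert a PIL image to grayscale, then to a 1-bit black-and-white image."""
    return image.convert('L').point(build_threshold_table(threshold), '1')


def to_png_bytes(image):
    """Encode a PIL image as PNG in memory and return the raw bytes."""
    buffer = io.BytesIO()
    image.save(buffer, format='PNG')
    return buffer.getvalue()


# Hypothetical usage with the names from the script above:
#   from PIL import Image
#   png_bytes = to_png_bytes(binarize(Image.open('0.jpg'), threshold=128))
#   dict1 = client.general(png_bytes)
```

Keeping the threshold as a parameter makes it easy to try a few values when the default of 128 leaves too much noise in the captcha.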

Result:
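
The dictionary printed by main() is expected to carry the recognized strings in a words_result list, with an error_code field present when the call fails. Below is a minimal sketch for pulling the captcha text out of such a dictionary, assuming that response shape. The extract_text helper is hypothetical, and the sample response is invented for illustration; it is not the author's actual output.

```python
def extract_text(response):
    """Join the recognized strings from a Baidu OCR response dictionary."""
    # Assumption: failed calls carry 'error_code' and 'error_msg' fields.
    if 'error_code' in response:
        raise RuntimeError(
            f"OCR failed: {response.get('error_code')} {response.get('error_msg')}"
        )
    # Assumption: each entry of 'words_result' holds one recognized line under 'words'.
    lines = [item['words'] for item in response.get('words_result', [])]
    # Captcha characters may come back separated by spaces; drop them.
    return ''.join(line.replace(' ', '') for line in lines)


# Made-up sample response, for illustration only.
sample = {
    'log_id': 0,
    'words_result_num': 1,
    'words_result': [{'words': 'A B12'}],
}
print(extract_text(sample))
```

In the script above, this would be called as extract_text(dict1) after the client.general call.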