Java Application Development with OCR (Text Recognition) and ASR (Speech Recognition) on Baidu AI Cloud


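I'm 超帅镜子, a blogger on 靠谱客. This article walks through building Java applications with OCR (text recognition) and ASR (speech recognition) on Baidu AI Cloud. I'm sharing it here in the hope that it serves as a useful reference.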

Baidu AI Cloud official site: https://cloud.baidu.com/

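I. OCR (Text Recognition)

First, register and sign in to a Baidu Cloud account on the Baidu AI Cloud site, open the management console, and then click Text Recognition.

Click Create Application and fill in the form as prompted. In the interface selection, tick the interfaces you need, then click Create Now.

Once the application is created, its AppID, API Key, and Secret Key are shown in the application list.

The project uses these three values to connect to the application.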
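Because these three values identify the application, it can help to keep them out of the source file. The following is a minimal sketch, assuming the values are supplied through environment variables. The class name `BaiduCredentials` and the variable names `BAIDU_APP_ID`, `BAIDU_API_KEY`, and `BAIDU_SECRET_KEY` are hypothetical and are not part of the Baidu SDK.

```java
import java.util.Map;

// Hypothetical holder for the AppID, API Key, and Secret Key of one application.
final class BaiduCredentials {
    final String appId;
    final String apiKey;
    final String secretKey;

    private BaiduCredentials(String appId, String apiKey, String secretKey) {
        this.appId = appId;
        this.apiKey = apiKey;
        this.secretKey = secretKey;
    }

    // Reads the three values from a map such as System.getenv().
    // The variable names are assumptions chosen for this sketch.
    static BaiduCredentials fromEnv(Map<String, String> env) {
        return new BaiduCredentials(
                require(env, "BAIDU_APP_ID"),
                require(env, "BAIDU_API_KEY"),
                require(env, "BAIDU_SECRET_KEY"));
    }

    // Fails early with a clear message when a value is missing or blank.
    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value.trim();
    }
}
```

With this in place, a client could be created as `BaiduCredentials c = BaiduCredentials.fromEnv(System.getenv());` followed by `new AipOcr(c.appId, c.apiKey, c.secretKey)`.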

Java implementation:

import com.baidu.aip.ocr.AipOcr;
import org.json.JSONObject;

import javax.swing.JFileChooser;
import javax.swing.filechooser.FileSystemView;
import java.awt.Desktop;
import java.io.File;
import java.io.IOException;
import java.util.HashMap;

public class GeneralRecognition {
    // Set APP_ID / API_KEY / SECRET_KEY from the application list
    public static final String APP_ID = "";
    public static final String API_KEY = "";
    public static final String SECRET_KEY = "";
    private static AipOcr client = null;

    public static void main(String[] args) throws IOException {
        String path = chooseFile();
        if (path == null) {
            System.out.println("No file selected.");
            return;
        }
        File file = new File(path);
        // Open the chosen image in the default viewer, then recognize it
        Desktop.getDesktop().open(file);
        dis(file.getPath());
    }

    // Let the user pick the file to upload; returns null if the dialog is cancelled
    public static String chooseFile() {
        FileSystemView fsv = FileSystemView.getFileSystemView();

        JFileChooser fileChooser = new JFileChooser();
        fileChooser.setCurrentDirectory(fsv.getHomeDirectory());
        fileChooser.setDialogTitle("Select a file to upload...");
        fileChooser.setApproveButtonText("OK");
        fileChooser.setFileSelectionMode(JFileChooser.FILES_ONLY);

        int result = fileChooser.showOpenDialog(null);
        if (JFileChooser.APPROVE_OPTION == result) {
            return fileChooser.getSelectedFile().getPath();
        }
        return null;
    }

    public static void init() {
        // Create the AipOcr client once and reuse it
        if (client == null) {
            client = new AipOcr(APP_ID, API_KEY, SECRET_KEY);
            // Optional: network connection settings
            client.setConnectionTimeoutInMillis(2000);
            client.setSocketTimeoutInMillis(60000);
        }
    }

    // General text recognition
    public static void dis(String path) {
        init();
        // Optional parameters passed to the interface
        HashMap<String, String> options = new HashMap<>();
        options.put("language_type", "CHN_ENG");
        options.put("detect_direction", "true");
        options.put("detect_language", "true");
        options.put("probability", "true");

        // The first argument is the path of a local image
        JSONObject res = client.basicGeneral(path, options);
        System.out.println(res.toString(2));
    }
}
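The file chooser above hands back whatever the user picked, so a quick local check before uploading can catch obvious mistakes. The sketch below is one possible pre-check, not part of the Baidu SDK: the class name `ImagePreCheck` is hypothetical, and both the extension list (jpg, jpeg, png, bmp) and the size limit are assumptions that should be confirmed against the current OCR documentation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical local pre-check for a file chosen for OCR.
final class ImagePreCheck {
    // Assumed list of accepted extensions; verify against the OCR documentation.
    private static final List<String> EXTENSIONS = Arrays.asList("jpg", "jpeg", "png", "bmp");

    // Returns null when the file looks usable, otherwise a short reason.
    // maxBytes is supplied by the caller because the real limit is an assumption here.
    static String problemWith(String path, long maxBytes) {
        if (path == null || path.isEmpty()) {
            return "no file selected";
        }
        Path p = Paths.get(path);
        if (!Files.isRegularFile(p)) {
            return "not a regular file: " + path;
        }
        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
        int dot = name.lastIndexOf('.');
        String ext = dot < 0 ? "" : name.substring(dot + 1);
        if (!EXTENSIONS.contains(ext)) {
            return "unsupported extension: " + ext;
        }
        try {
            if (Files.size(p) > maxBytes) {
                return "file larger than " + maxBytes + " bytes";
            }
        } catch (IOException e) {
            return "cannot read file size: " + e.getMessage();
        }
        return null;
    }
}
```

A caller could run `ImagePreCheck.problemWith(path, limit)` before `dis(path)` and print the returned reason instead of sending the request.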

While calling one of the interfaces I ran into a problem:

[main] INFO com.baidu.aip.client.BaseClient - get access_token success. current state: STATE_AIP_AUTH_OK
{
  "error_msg": "No permission to access data",
  "error_code": 6
}

Process finished with exit code 0

The cause is that the application has no permission to use that method (API).
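Note that the process still exits with code 0 here, so the failure is only visible in the response body. A minimal sketch of checking for it, assuming the response text has the shape shown above: the class `OcrErrorHints` is hypothetical, it uses a regular expression only to stay free of dependencies, and it knows just the one code seen in this article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that looks for an error_code field in a response string.
final class OcrErrorHints {
    private static final Pattern ERROR_CODE =
            Pattern.compile("\"error_code\"\\s*:\\s*(\\d+)");

    // Returns the error_code found in the response text, or -1 if there is none.
    static int errorCode(String responseJson) {
        Matcher m = ERROR_CODE.matcher(responseJson);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    // Maps a code to a short hint; only code 6 comes from this article.
    static String hint(int code) {
        switch (code) {
            case -1:
                return "no error_code field in the response";
            case 6:
                return "the application lacks permission for this interface; enable it in the console";
            default:
                return "unrecognized error_code " + code + "; check the platform's error code table";
        }
    }
}
```

When the response is already an `org.json.JSONObject`, as in the code above, `res.optInt("error_code", -1)` gives the same number without a regular expression.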

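Error messages like this can also be viewed in the application's error information.

Steps to fix it:

1. Go to the application list.

2. Click Manage, then Edit. In addition to the interfaces ticked by default for this application, tick any other interfaces you need. You can also click to claim free usage quota for an interface.

Note: some interfaces require verification. For example, the public-security verification interface and the ID-card-and-name matching interface require enterprise verification. After submitting it and passing, you still need to apply under Console > Face Recognition > Offline Collection SDK Management, following the process there. Once that application is approved, access to the interface is enabled automatically; approval is usually automatic within about 2 hours.

3. Click Save Changes and call the interface again; the problem is resolved.

After claiming the free quota, or after applying for other permissions or paid access and passing review, you can use the corresponding APIs. You can also view usage statistics for each API.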

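II. ASR (Speech Recognition)

The steps are much the same as for text recognition above: find the text recognition or speech recognition module in the console, create an application in that module, and configure the interface permissions during or after creation so that later API calls succeed. Every application has three important parameters: AppID, API Key, and Secret Key. Configure these three in the project. The ASR speech recognition project code follows: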

import com.baidu.aip.speech.AipSpeech;

import javax.swing.JFileChooser;
import javax.swing.filechooser.FileSystemView;
import java.io.File;
import java.io.IOException;
import java.util.HashMap;

public class MandarinRecognition {
    // Set APP_ID / API_KEY / SECRET_KEY from the application list
    public static final String APP_ID = "";
    public static final String API_KEY = "";
    public static final String SECRET_KEY = "";
    private static AipSpeech client = null;

    public static void main(String[] args) throws IOException {
        String path = chooseFile();
        if (path == null) {
            System.out.println("No file selected.");
            return;
        }
        File file = new File(path);
        System.out.println("正在准备输出。。"); // "Preparing output.."
        String outPutPath = "template/asrOutput.txt";
        dis(file.getPath(), outPutPath);
    }

    // Let the user pick the file to upload; returns null if the dialog is cancelled
    public static String chooseFile() {
        FileSystemView fsv = FileSystemView.getFileSystemView();

        JFileChooser fileChooser = new JFileChooser();
        fileChooser.setCurrentDirectory(fsv.getHomeDirectory());
        fileChooser.setDialogTitle("Select a file to upload...");
        fileChooser.setApproveButtonText("OK");
        fileChooser.setFileSelectionMode(JFileChooser.FILES_ONLY);

        int result = fileChooser.showOpenDialog(null);
        if (JFileChooser.APPROVE_OPTION == result) {
            return fileChooser.getSelectedFile().getPath();
        }
        return null;
    }

    public static void init() {
        // Create the AipSpeech client once and reuse it
        if (client == null) {
            client = new AipSpeech(APP_ID, API_KEY, SECRET_KEY);
        }
    }

    // Speech recognition. outPutPath is accepted but not used yet;
    // the result is only printed to standard output.
    public static void dis(String audioPath, String outPutPath) throws IOException {
        init();
        // Optional parameters passed to the interface
        HashMap<String, Object> options = new HashMap<>();
        options.put("dev_pid", 1537);

        // The first argument of asr() is the path of a local audio file
        System.out.println(audioPath);
        /*
         * Raw PCM audio must be 16k sample rate, 16-bit depth, mono.
         * Supported formats: pcm (uncompressed), wav (uncompressed, PCM-encoded), amr (compressed).
         * Recordings may be at most 60 s long. There is no limit on file size, only on duration.
         */
        System.out.println(client.asr(audioPath, "pcm", 16000, options));
    }
}
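The comment in the code above lists the audio constraints (16k sample rate, 16-bit, mono, pcm/wav/amr, at most 60 s), while the call itself hard-codes "pcm". As a rough sketch of checking those constraints locally, the hypothetical class `AudioPreCheck` below picks the format string from the file extension and estimates the duration of a raw PCM file from its size. The duration estimate applies only to headerless PCM, not to wav or amr files.

```java
import java.util.Locale;

// Hypothetical local checks based on the constraints quoted in the code comment.
final class AudioPreCheck {
    static final int SAMPLE_RATE = 16_000;   // samples per second
    static final int BYTES_PER_SAMPLE = 2;   // 16-bit depth
    static final int CHANNELS = 1;           // mono
    static final int MAX_SECONDS = 60;       // longest supported recording

    // Maps a file name to one of the three format names listed in the comment.
    static String formatFor(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String format : new String[] {"pcm", "wav", "amr"}) {
            if (lower.endsWith("." + format)) {
                return format;
            }
        }
        throw new IllegalArgumentException("Unsupported audio format: " + fileName);
    }

    // Duration in seconds of a raw (headerless) PCM file of the given size.
    static double pcmSeconds(long fileBytes) {
        return fileBytes / (double) (SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS);
    }

    // True when a raw PCM file of this size fits within the 60 s limit.
    static boolean pcmWithinLimit(long fileBytes) {
        return pcmSeconds(fileBytes) <= MAX_SECONDS;
    }
}
```

At 16,000 samples per second and 2 bytes per sample, one second of mono PCM is 32,000 bytes, so the 60 s limit corresponds to 1,920,000 bytes. The call could then read `client.asr(path, AudioPreCheck.formatFor(path), AudioPreCheck.SAMPLE_RATE, options)`.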

Output:

正在准备输出。。
E:16k.pcm
[main] INFO com.baidu.aip.client.BaseClient - get access_token success. current state: STATE_AIP_AUTH_OK
{"result":["北京科技馆。"],"err_msg":"success.","sn":"238256483091644572246","corpus_no":"7063384013687529084","err_no":0}

Process finished with exit code 0