Speech Recognition Programming Techniques


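I am 贪玩火龙果, a blogger on 靠谱客 (Kaopuke). This article introduces speech recognition programming techniques, shared here in the hope that it serves as a useful reference.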

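Speech recognition programming involves several recognition engine interfaces, including ISpRecognizer, ISpRecoContext and ISpRecoGrammar. We first design a class, CSpeechRecognition, that wraps the speech recognition operations, and then build an example program on top of it.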

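11.3.1 Building the CSpeechRecognition Class

The CSpeechRecognition class wraps the interfaces that speech recognition needs to call, which makes speech recognition programming with it convenient and concise.

The CSpeechRecognition class is defined as follows: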

//////////////////////////////////////////////////
// active speech engine
#include <atlbase.h>
extern CComModule _Module;
#include <atlcom.h>
#include <sapi.h>
#include <sphelper.h>
#include <spuihelp.h>

//////////////////////////////////////////////////
// speech message (parenthesized so it expands as a single value)
#define WM_SREVENT (WM_USER + 102)

class CSpeechRecognition
{
public:
    CSpeechRecognition();
    virtual ~CSpeechRecognition();

    // initialize
    BOOL Initialize(HWND hWnd = NULL, BOOL bIsShared = TRUE);
    void Destroy();

    // start and stop
    BOOL Start();
    BOOL Stop();
    BOOL IsDictationOn()
    {
        return m_bOnDictation;
    }

    // event handler; nlCount == (ULONG)-1 means "the whole phrase"
    void GetText(WCHAR **ppszCoMemText, ULONG ulStart = 0, ULONG nlCount = (ULONG)-1);

    // voice training
    HRESULT VoiceTraining(HWND hWndParent);

    // microphone setup
    HRESULT MicrophoneSetup(HWND hWndParent);

    // token list
    HRESULT InitTokenList(HWND hWnd, BOOL bIsComboBox = FALSE);

    // error string
    CString GetErrorString()
    {
        return m_sError;
    }

    // interface
    CComPtr<ISpRecognizer> m_cpRecoEngine;       // SR engine
    CComPtr<ISpRecoContext> m_cpRecoCtxt;        // Recognition context for dictation
    CComPtr<ISpRecoGrammar> m_cpDictationGrammar; // Dictation grammar

private:
    CString m_sError;
    BOOL m_bOnDictation;
};

The WM_SREVENT message signals speech recognition events; the recognition context posts it to the window handle passed to Initialize.
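A small, portable sketch of why a private message ID such as WM_SREVENT, defined as an offset from WM_USER, is best wrapped in parentheses. It redefines WM_USER locally as 0x0400 (its value in the Windows headers) so it compiles without them; the WM_SREVENT_UNSAFE and WM_SREVENT_SAFE names are hypothetical and exist only for this comparison.

```cpp
#include <cassert>

// Stand-in for the value defined in the Windows headers; redefined here only
// so the sketch compiles without them.
#define WM_USER 0x0400

// Hypothetical macros contrasting the two spellings.
#define WM_SREVENT_UNSAFE WM_USER+102    // expands textually, no grouping
#define WM_SREVENT_SAFE   (WM_USER+102)  // expands as a single value

// Doubling the ID: the unsafe form expands to 0x0400 + 102 * 2.
constexpr int doubledUnsafe = WM_SREVENT_UNSAFE * 2;
constexpr int doubledSafe   = WM_SREVENT_SAFE * 2;
```

Both macros have the same value on their own; they only diverge once the macro appears inside a larger expression, which is exactly the kind of bug that is hard to spot later.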

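The class holds three interface pointers, m_cpRecoEngine, m_cpRecoCtxt and m_cpDictationGrammar, which refer to the three key recognition engine interfaces ISpRecognizer, ISpRecoContext and ISpRecoGrammar respectively.

Initialize sets up the basic working environment of the recognition engine, initializing in turn the engine, the recognition context, event notification, the audio input and the dictation grammar: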

BOOL CSpeechRecognition::Initialize(HWND hWnd, BOOL bIsShared)
{
    // COM library
    if (FAILED(CoInitialize(NULL)))
    {
        m_sError = _T("Error initializing COM");
        return FALSE;
    }

    // SR engine
    HRESULT hr = S_OK;
    if (bIsShared)
    {
        // Shared reco engine.
        // For a shared reco engine, the audio gets set up automatically
        hr = m_cpRecoEngine.CoCreateInstance(CLSID_SpSharedRecognizer);
    }
    else
    {
        hr = m_cpRecoEngine.CoCreateInstance(CLSID_SpInprocRecognizer);
    }

    // RecoContext
    if (SUCCEEDED(hr))
        hr = m_cpRecoEngine->CreateRecoContext(&m_cpRecoCtxt);

    // Set recognition notification for dictation
    if (SUCCEEDED(hr))
        hr = m_cpRecoCtxt->SetNotifyWindowMessage(hWnd, WM_SREVENT, 0, 0);

    if (SUCCEEDED(hr))
    {
        // when the engine has recognized something
        const ULONGLONG ullInterest = SPFEI(SPEI_RECOGNITION);
        hr = m_cpRecoCtxt->SetInterest(ullInterest, ullInterest);
    }

    // Only an in-process engine needs an explicit audio input;
    // the shared engine does not support SetInput
    if (SUCCEEDED(hr) && !bIsShared)
    {
        // create default audio object
        CComPtr<ISpAudio> cpAudio;
        hr = SpCreateDefaultObjectFromCategoryId(SPCAT_AUDIOIN, &cpAudio);

        // set the input for the engine
        if (SUCCEEDED(hr))
            hr = m_cpRecoEngine->SetInput(cpAudio, TRUE);
    }

    if (SUCCEEDED(hr))
        hr = m_cpRecoEngine->SetRecoState(SPRST_ACTIVE);

    // grammar
    if (SUCCEEDED(hr))
    {
        // Create the grammar object (m_cpDictationGrammar)
        hr = m_cpRecoCtxt->CreateGrammar(0, &m_cpDictationGrammar);
    }

    // Specify that the grammar we want is a dictation grammar
    if (SUCCEEDED(hr))
        hr = m_cpDictationGrammar->LoadDictation(NULL, SPLO_STATIC);

    if (SUCCEEDED(hr))
        hr = m_cpDictationGrammar->SetDictationState(SPRS_ACTIVE);

    if (FAILED(hr))
    {
        m_cpDictationGrammar.Release();
        m_sError = _T("Error initializing the speech recognition engine");
        return FALSE;
    }

    m_bOnDictation = TRUE;   // the engine is now active
    return TRUE;
}
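Each step in Initialize depends on the one before it, so a failing HRESULT should end the sequence and be reported, not overwritten by the next call. The portable sketch below models that fail-fast chain. HResult, kOk, kFail, InitStep and RunInitSteps are hypothetical names; kFail borrows the bit pattern of E_FAIL (0x80004005), and a negative value plays the role of FAILED(hr).

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for the HRESULT conventions: negative means failure.
using HResult = std::int32_t;
constexpr HResult kOk = 0;
constexpr HResult kFail = static_cast<HResult>(0x80004005u); // E_FAIL's bits

struct InitStep {
    std::string name;              // label used in the error message
    std::function<HResult()> run;  // the COM call this step stands for
};

// Runs the steps in order and stops at the first failure, recording which
// step failed and how many steps actually ran.
HResult RunInitSteps(const std::vector<InitStep>& steps, std::string& error,
                     int& executed) {
    executed = 0;
    HResult hr = kOk;
    for (const auto& step : steps) {
        hr = step.run();
        ++executed;
        if (hr < 0) {                 // equivalent of FAILED(hr)
            error = step.name + " failed";
            return hr;
        }
    }
    return hr;
}
```

For example, with the steps CreateRecoContext (succeeds), SetInput (fails) and SetRecoState, only the first two run and the error names SetInput.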

Destroy, which the class destructor calls, releases every interface the class references and then uninitializes COM:

void CSpeechRecognition::Destroy()
{
    if (m_cpDictationGrammar)
        m_cpDictationGrammar.Release();
    if (m_cpRecoCtxt)
        m_cpRecoCtxt.Release();
    if (m_cpRecoEngine)
        m_cpRecoEngine.Release();

    CoUninitialize();
}
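Destroy releases the grammar, then the context, then the engine. A CComPtr member would also release its interface in its own destructor, and C++ destroys members in reverse declaration order, which for this class is the same order. The explicit Release calls still matter: CoUninitialize must run after the last interface is released, while member destructors only run after the body of ~CSpeechRecognition (and therefore Destroy) has finished. The sketch below demonstrates the ordering rule with a hypothetical logging Holder type standing in for CComPtr.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical RAII holder standing in for CComPtr: it "releases" in its
// destructor and logs its name so the order can be inspected.
struct Holder {
    std::string name;
    std::vector<std::string>* log;
    Holder(std::string n, std::vector<std::string>* l)
        : name(std::move(n)), log(l) {}
    ~Holder() { log->push_back(name); }
};

// Members declared in the same order as in CSpeechRecognition.
struct Recognizer {
    Holder engine;
    Holder context;
    Holder grammar;
    explicit Recognizer(std::vector<std::string>* log)
        : engine("engine", log), context("context", log),
          grammar("grammar", log) {}
};
```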

Start and Stop begin and end accepting and recognizing speech. They do so by calling the engine interface's SetRecoState method:

BOOL CSpeechRecognition::Start()
{
    if (m_bOnDictation)
        return TRUE;
    if (!m_cpRecoEngine)          // Initialize failed or was never called
        return FALSE;

    HRESULT hr = m_cpRecoEngine->SetRecoState(SPRST_ACTIVE);
    if (FAILED(hr))
        return FALSE;

    m_bOnDictation = TRUE;
    return TRUE;
}

BOOL CSpeechRecognition::Stop()
{
    if (!m_bOnDictation)
        return TRUE;
    if (!m_cpRecoEngine)
        return FALSE;

    HRESULT hr = m_cpRecoEngine->SetRecoState(SPRST_INACTIVE);
    if (FAILED(hr))
        return FALSE;

    m_bOnDictation = FALSE;
    return TRUE;
}
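Start and Stop cache the engine state in m_bOnDictation, skip redundant SetRecoState calls, and only flip the flag once the engine has accepted the change. The portable model below mirrors that logic; RecoState, FakeEngine and Dictation are hypothetical stand-ins for SPRST_ACTIVE/SPRST_INACTIVE, the engine and the class. The listings do not show the constructor, so the sketch assumes the flag starts out false; the real class likewise needs m_bOnDictation given a defined initial value in its constructor.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for SPRST_ACTIVE / SPRST_INACTIVE.
enum class RecoState { Inactive, Active };

// Fake engine that records every state change it accepts and can be told
// to fail the next request.
struct FakeEngine {
    std::vector<RecoState> calls;
    bool failNext = false;
    bool SetRecoState(RecoState s) {
        if (failNext) { failNext = false; return false; }
        calls.push_back(s);
        return true;
    }
};

// Mirrors Start/Stop: redundant requests succeed without touching the
// engine, and the cached flag changes only after the engine agrees.
class Dictation {
public:
    explicit Dictation(FakeEngine& e) : engine_(e) {}
    bool Start() { return Set(true); }
    bool Stop()  { return Set(false); }
    bool IsOn() const { return on_; }
private:
    bool Set(bool on) {
        if (on_ == on) return true;  // already in the requested state
        if (!engine_.SetRecoState(on ? RecoState::Active : RecoState::Inactive))
            return false;            // keep the old flag on failure
        on_ = on;
        return true;
    }
    FakeEngine& engine_;
    bool on_ = false;
};
```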

GetText is the key to retrieving the text recognized from speech. Call it from the handler for the recognition event (message). Its code is as follows.

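void CSpeechRecognition::GetText(WCHAR **ppszCoMemText, ULONG ulStart, ULONG nlCount)
{
    // The caller owns the returned string and must free it with
    // CoTaskMemFree; it stays NULL if no recognition event is pending.
    *ppszCoMemText = NULL;

    CSpEvent event;
    // Process all of the queued events
    while (event.GetFrom(m_cpRecoCtxt) == S_OK)
    {
        if (event.eEventId != SPEI_RECOGNITION)
            continue;

        // There may be multiple recognition results, so collect all of them
        WCHAR *pszPhrase = NULL;
        HRESULT hr;
        if (nlCount == (ULONG)-1)
            hr = event.RecoResult()->GetText(SP_GETWHOLEPHRASE, SP_GETWHOLEPHRASE,
                                             TRUE, &pszPhrase, NULL);
        else
        {
            ASSERT(nlCount > 0);
            hr = event.RecoResult()->GetText(ulStart, nlCount, FALSE,
                                             &pszPhrase, NULL);
        }
        if (FAILED(hr) || pszPhrase == NULL)
            continue;

        if (*ppszCoMemText == NULL)
        {
            *ppszCoMemText = pszPhrase;   // first phrase: take ownership
        }
        else
        {
            // Append this phrase to the text collected so far
            size_t cch = wcslen(*ppszCoMemText) + wcslen(pszPhrase) + 1;
            WCHAR *pszAll = (WCHAR *)CoTaskMemAlloc(cch * sizeof(WCHAR));
            if (pszAll != NULL)
            {
                wcscpy(pszAll, *ppszCoMemText);
                wcscat(pszAll, pszPhrase);
                CoTaskMemFree(*ppszCoMemText);
                *ppszCoMemText = pszAll;
            }
            CoTaskMemFree(pszPhrase);
        }
    }
}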
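The event loop in GetText keeps calling CSpEvent::GetFrom until the context's queue is empty, since several events may already be queued by the time a WM_SREVENT message is handled; the ulStart/nlCount pair selects a range of phrase elements, with a whole-phrase sentinel selecting all of them. The portable model below is a simplification: RecoEvent, PhraseText, DrainRecognitions and kWholePhrase are hypothetical, kWholePhrase stands in for SP_GETWHOLEPHRASE (assumed here to be the all-ones ULONG value, matching the -1 default in the class), and elements are joined with single spaces, whereas a real engine applies its own display attributes.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical model of one queued event; recognition events carry the
// recognized phrase split into elements (words).
struct RecoEvent {
    bool isRecognition;
    std::vector<std::string> elements;
};

// Stand-in for the whole-phrase sentinel.
constexpr std::size_t kWholePhrase = static_cast<std::size_t>(-1);

// Joins elements [start, start + count) with spaces; kWholePhrase selects
// every element. Written as i - start < count to avoid overflow.
std::string PhraseText(const std::vector<std::string>& elems,
                       std::size_t start, std::size_t count) {
    if (count == kWholePhrase) { start = 0; count = elems.size(); }
    std::string out;
    for (std::size_t i = start; i < elems.size() && i - start < count; ++i) {
        if (!out.empty()) out += ' ';
        out += elems[i];
    }
    return out;
}

// Drains every pending event, skipping non-recognition events, and appends
// the text of each recognition instead of keeping only the last one.
std::string DrainRecognitions(std::deque<RecoEvent>& queue,
                              std::size_t start = 0,
                              std::size_t count = kWholePhrase) {
    std::string text;
    while (!queue.empty()) {
        RecoEvent ev = queue.front();
        queue.pop_front();
        if (!ev.isRecognition) continue;
        text += PhraseText(ev.elements, start, count);
    }
    return text;
}
```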

InitTokenList calls SpInitTokenComboBox or SpInitTokenListBox to show the installed speech recognizers in a combo box or list box, where the user can view and select them:

HRESULT CSpeechRecognition::InitTokenList(HWND hWnd, BOOL bIsComboBox)
{
    if (bIsComboBox)
        return SpInitTokenComboBox(hWnd, SPCAT_RECOGNIZERS);
    else
        return SpInitTokenListBox(hWnd, SPCAT_RECOGNIZERS);
}

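Speech recognition needs speech input, usually from a microphone. Before recognizing, check that the microphone is positioned and configured properly, so that the engine receives usable input. MicrophoneSetup calls the engine interface's DisplayUI method to show a microphone setup wizard, as shown in Figure 11-4. The code is as follows: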

HRESULT CSpeechRecognition::MicrophoneSetup(HWND hWndParent)
{
    return m_cpRecoEngine->DisplayUI(hWndParent, NULL, SPDUI_MicTraining, NULL, 0);
}

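Voice training is an important foundation of speech recognition: to get the recognition quality you expect, you need to train the engine so that it becomes familiar with the speaker's accent. VoiceTraining calls the engine interface's DisplayUI method to show a voice training wizard, as shown in Figure 11-5. The code is as follows: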

HRESULT CSpeechRecognition::VoiceTraining(HWND hWndParent)
{
    return m_cpRecoEngine->DisplayUI(hWndParent, NULL, SPDUI_UserTraining, NULL, 0);
}
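Not every engine necessarily implements every wizard. ISpRecognizer also provides IsUISupported, which reports whether a given UI type is available, so MicrophoneSetup and VoiceTraining could check it before calling DisplayUI. The portable model below shows only the shape of that check; FakeUiEngine and ShowWizardIfSupported are hypothetical, and the strings are assumed to match the values of SPDUI_MicTraining and SPDUI_UserTraining.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of an engine's UI support.
struct FakeUiEngine {
    std::set<std::string> supported;  // UI types this engine can show
    std::string lastShown;            // last wizard "displayed"
    bool IsUISupported(const std::string& type) const {
        return supported.count(type) != 0;
    }
    void DisplayUI(const std::string& type) { lastShown = type; }
};

// Shows the wizard only when the engine reports it can, and tells the
// caller whether anything was displayed.
bool ShowWizardIfSupported(FakeUiEngine& engine, const std::string& type) {
    if (!engine.IsUISupported(type)) return false;
    engine.DisplayUI(type);
    return true;
}
```

In the real class, a FALSE result could be turned into a message box instead of a silent no-op.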

Like CText2Speech, CSpeechRecognition provides an error handling mechanism: GetErrorString returns the error message.
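Recording which call failed, together with its HRESULT, makes the text returned by GetErrorString far more useful than a generic message. The portable helper below is a hypothetical example of such formatting (FormatHrError is not part of the original class); it prints the HRESULT as 8-digit hex, the form in which such codes are usually looked up.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: builds a message naming the failing call and its
// HRESULT, suitable for storing in m_sError.
std::string FormatHrError(const char* call, std::int32_t hr) {
    char buf[128];
    std::snprintf(buf, sizeof buf, "%s failed (hr=0x%08X)", call,
                  static_cast<unsigned>(static_cast<std::uint32_t>(hr)));
    return std::string(buf);
}
```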

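11.3.2 Example: A Dictation Program Built with CSpeechRecognition

Writing a speech recognition program with CSpeechRecognition is simple. Let us implement a dictation program, Stenotypist, whose interface is shown in Figure 11-6.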

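The steps and key points for building Stenotypist with Visual C++ are as follows:

1) Use AppWizard to generate a dialog-based project named Stenotypist;

2) Add SpeechRecognition.H and SpeechRecognition.CPP to the Stenotypist project;

3) Lay out the corresponding controls in the resource editor;

4) Use ClassWizard to generate member variables for the controls in the CStenotypistDlg class;

5) Modify StenotypistDlg.h to add the required variables and functions to CStenotypistDlg;

6) Use ClassWizard to add the control and message handlers to CStenotypistDlg. The code of StenotypistDlg.h is as follows.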

#include "SpeechRecognition.h"

// CStenotypistDlg dialog
class CStenotypistDlg : public CDialog
{
// Construction
public:
    CStenotypistDlg(CWnd* pParent = NULL);  // standard constructor

// Dialog Data
    //{{AFX_DATA(CStenotypistDlg)
    enum { IDD = IDD_STENOTYPIST_DIALOG };
    CButton m_btDictation;
    CString m_strText;
    //}}AFX_DATA

    // ClassWizard generated virtual function overrides
    //{{AFX_VIRTUAL(CStenotypistDlg)
protected:
    virtual void DoDataExchange(CDataExchange* pDX);  // DDX/DDV support
    //}}AFX_VIRTUAL

    CSpeechRecognition m_SpeechRecognition;

// Implementation
protected:
    HICON m_hIcon;

    // Generated message map functions
    //{{AFX_MSG(CStenotypistDlg)
    virtual BOOL OnInitDialog();
    afx_msg void OnSysCommand(UINT nID, LPARAM lParam);
    afx_msg void OnPaint();
    afx_msg HCURSOR OnQueryDragIcon();
    afx_msg void OnButtonVt();
    afx_msg void OnButtonMs();
    afx_msg void OnButtonDictate();
    //}}AFX_MSG
    afx_msg LRESULT OnSREvent(WPARAM, LPARAM);
    DECLARE_MESSAGE_MAP()
};

Note that CStenotypistDlg contains a CSpeechRecognition object as a member.

OnInitDialog calls CSpeechRecognition's Initialize function and fills the recognizer list:

BOOL CStenotypistDlg::OnInitDialog()
{
    CDialog::OnInitDialog();

    // Add "About..." menu item to system menu.
    // IDM_ABOUTBOX must be in the system command range.
    ASSERT((IDM_ABOUTBOX & 0xFFF0) == IDM_ABOUTBOX);
    ASSERT(IDM_ABOUTBOX < 0xF000);

    CMenu* pSysMenu = GetSystemMenu(FALSE);
    if (pSysMenu != NULL)
    {
        CString strAboutMenu;
        strAboutMenu.LoadString(IDS_ABOUTBOX);
        if (!strAboutMenu.IsEmpty())
        {
            pSysMenu->AppendMenu(MF_SEPARATOR);
            pSysMenu->AppendMenu(MF_STRING, IDM_ABOUTBOX, strAboutMenu);
        }
    }

    // Set the icon for this dialog. The framework does this automatically
    // when the application's main window is not a dialog
    SetIcon(m_hIcon, TRUE);   // Set big icon
    SetIcon(m_hIcon, FALSE);  // Set small icon

    // TODO: Add extra initialization here
    if (!m_SpeechRecognition.Initialize(m_hWnd))
        AfxMessageBox(m_SpeechRecognition.GetErrorString());
    m_SpeechRecognition.InitTokenList(GetDlgItem(IDC_LIST1)->m_hWnd);

    // Start with dictation off; the Dictate button turns it on
    m_SpeechRecognition.Stop();

    return TRUE;  // return TRUE unless you set the focus to a control
}

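Starting and stopping dictation is simple: it only takes calls to the corresponding CSpeechRecognition member functions, as the code below shows. Note that the same button toggles between starting and stopping.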

void CStenotypistDlg::OnButtonDictate()
{
    if (m_SpeechRecognition.IsDictationOn())
    {
        if (m_SpeechRecognition.Stop())
        {
            m_btDictation.SetWindowText(_T("&Dictate"));
            SetWindowText(_T("Stenotypist - Press <Dictate> to start dictation!"));
        }
    }
    else if (m_SpeechRecognition.Start())
    {
        m_btDictation.SetWindowText(_T("&Stop"));
        SetWindowText(_T("Stenotypist - Recording, please dictate..."));
    }
}

Microphone setup and voice training are likewise implemented by calling CSpeechRecognition member functions directly:

void CStenotypistDlg::OnButtonVt()
{
    m_SpeechRecognition.VoiceTraining(m_hWnd);
}

void CStenotypistDlg::OnButtonMs()
{
    m_SpeechRecognition.MicrophoneSetup(m_hWnd);
}

To handle the WM_SREVENT message, add the corresponding message map entry and handler:

BEGIN_MESSAGE_MAP(CStenotypistDlg, CDialog)
    //{{AFX_MSG_MAP(CStenotypistDlg)
    ON_WM_SYSCOMMAND()
    ON_WM_PAINT()
    ON_WM_QUERYDRAGICON()
    ON_BN_CLICKED(IDC_BUTTON_VT, OnButtonVt)
    ON_BN_CLICKED(IDC_BUTTON_MS, OnButtonMs)
    ON_BN_CLICKED(IDC_BUTTON_DICTATE, OnButtonDictate)
    //}}AFX_MSG_MAP
    ON_MESSAGE(WM_SREVENT, OnSREvent)
END_MESSAGE_MAP()

LRESULT CStenotypistDlg::OnSREvent(WPARAM, LPARAM)
{
    WCHAR *pwzText = NULL;
    m_SpeechRecognition.GetText(&pwzText);
    if (pwzText != NULL)
    {
        m_strText += CString(pwzText);
        CoTaskMemFree(pwzText);   // the text was allocated with the COM task allocator
        UpdateData(FALSE);
    }
    return 0L;
}
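The text returned through GetText's ppszCoMemText parameter is allocated with the COM task allocator (ISpRecoResult::GetText documents that its output string must be freed with CoTaskMemFree), so the handler owns it and must not touch it when no text arrived. The portable sketch below models that ownership with std::unique_ptr and a custom deleter; malloc/free stand in for CoTaskMemAlloc/CoTaskMemFree, char stands in for WCHAR, and CoMemDeleter, ProduceText and AppendRecognized are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

// Stand-in for CoTaskMemFree so the sketch runs anywhere; in the real
// program the deleter would call CoTaskMemFree.
struct CoMemDeleter {
    void operator()(char* p) const { std::free(p); }
};
using CoMemText = std::unique_ptr<char, CoMemDeleter>;

// Hypothetical producer playing the role of GetText: returns a heap copy
// that the caller owns, or nullptr when there is nothing to report.
char* ProduceText(const char* s) {
    if (s == nullptr) return nullptr;
    char* p = static_cast<char*>(std::malloc(std::strlen(s) + 1));
    if (p) std::strcpy(p, s);
    return p;
}

// Mirrors OnSREvent: append only when text arrived; the unique_ptr frees
// the buffer on every path.
void AppendRecognized(std::string& all, const char* incoming) {
    CoMemText text(ProduceText(incoming));
    if (text) all += text.get();
}
```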

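7) To use the Speech engine, configure the include and lib settings in the Microsoft Visual C++ environment:

① Set the include path

● Open the Project Settings dialog via the Project → Settings menu item;

● Click the C/C++ tab;

● Select Preprocessor from the Category drop-down list;

● In the “Additional include directories” box, enter the path of the Speech SDK include directory; the default is C:\Program Files\Microsoft Speech SDK 5.1\Include.

② Set the lib information

● Open the Project Settings dialog via the Project → Settings menu item;

● Select the Link tab;

● Select Input from the Category drop-down list;

● In the “Additional library path” box, enter the path of the Speech SDK lib directory; the default is C:\Program Files\Microsoft Speech SDK 5.1\Lib\i386;

● Enter “sapi.lib” in the “Object/library modules” box.

8) Build and link the project, and Stenotypist is ready to take dictation.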

All the source code of the Stenotypist project is on the companion disc, in the /Source/Stenotypist directory.

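Final Words

That is all the material on speech recognition programming techniques that 贪玩火龙果 has recently collected and organized. For more on this topic, search other articles on Kaopuke.

This content was provided by users for learning and reference, or collected from the web; copyright belongs to the original author.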

Category: Audio development
Published: 2023-09-22 15:05:51
Link: https://www.kaopuke.com/article/k-p-k_14_uzo_2_f5_13_zcy.html
