I'm 懦弱唇彩, a blogger on 靠谱客. This post shares my first research notes on speech signal processing, in the hope that they can serve as a reference.

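This was my first contact with the field. My background is in computer software and I knew nothing about signal processing, so joining this lab meant taking in a lot of new knowledge. Honestly, it was hard, at least for someone as slow as me. I spent a long time on the basics and had to start almost everything from scratch. I began with Oppenheim's "Signals and Systems", then Song Zhiyong's "MATLAB in Speech Signal Analysis and Synthesis", "Numerical Methods", "A Course in Signal Processing", "Probability and Mathematical Statistics", "Introduction to Algorithms", Zhou Zhihua's "Machine Learning", Li Hang's "Statistical Learning Methods", and so on. Slowly I came to understand the tip of the iceberg of signal processing.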


Today I'll write a short summary of my first project:

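The speech signals we collect usually contain a lot of noise, so we often have to filter and denoise them, and at the same time cut out the segments of interest.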
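As a rough illustration of the filtering step, here is a minimal sketch of a Butterworth band-pass filter with the same 5 kHz to 15 kHz band and 44.1 kHz sampling rate that appear in the commented-out code further down. The two test tones are made up for the illustration, and butter() requires the Signal Processing Toolbox.

```matlab
% Band-pass sketch. The tones are synthetic stand-ins, not real recordings.
fs = 44100;                          % sampling rate in Hz
t  = (0:fs-1).' / fs;                % one second of samples
lowTone = sin(2*pi*1000*t);          % 1 kHz, outside the pass band
midTone = sin(2*pi*10000*t);         % 10 kHz, inside the pass band

% butter() expects cutoffs normalized by the Nyquist frequency: f/(fs/2).
Wn = [5000 15000] / (fs/2);
[b, a] = butter(3, Wn, 'bandpass');

lowOut = filter(b, a, lowTone);
midOut = filter(b, a, midTone);

fprintf('1 kHz tone:  std before %.3f, after %.3f\n', std(lowTone), std(lowOut));
fprintf('10 kHz tone: std before %.3f, after %.3f\n', std(midTone), std(midOut));
```

The out-of-band tone should come out much weaker than the in-band one, which is a quick way to check that the cutoffs were normalized correctly.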

Below is how I cut the segments out of the signal, although the result was not very good.

The signal is cut around the detected peaks.

function extract_middle_click()
% Cut a fixed-length window around every detected peak of each recording.
% NOTE: the path separators seem to have been lost when this post was
% published; the paths are kept exactly as they appeared.
readFilePath = 'D:datatoothSu*.wav';
readPathStr  = 'D:datatoothSu';
savePathStr  = 'D:datatoothddSu';

% Create the output folder once. (The original ran "mkdir" through system()
% for every file, which only gives a warning when the folder already exists.)
if ~exist(savePathStr, 'dir')
    mkdir(savePathStr);
end

fileList = dir(readFilePath);
fileNum  = length(fileList);
for j = 1:fileNum
    name = fileList(j).name;               % file name (Zhao-zhang, Syam, LWF, Su)
    fileName = strcat(readPathStr, name);  % full path of the file
    data = audioread(fileName);
    data = data(:, 1);                     % use the first channel if the file is stereo
    % Optional band-pass pre-filter, cutoffs normalized as (f/fs)*2 at 44.1 kHz:
    % [b,a] = butter(3, [5000/44100*2, 15000/44100*2], 'bandpass');
    % inputsignal = filter(b, a, data);

    event_index = identify_middle_click_index(data);
    disp(['Peak indices: ' num2str(event_index)]);

    for i = 1:length(event_index)
        dataIndex = (event_index(i)-2000):(event_index(i)+2000);
        % Skip windows that would run past either end of the recording.
        if dataIndex(1) < 1 || dataIndex(end) > length(data)
            continue;
        end
        % datarange = inputsignal(dataIndex);
        datarange = data(dataIndex);
        % Other processing that was tried:
        % datarange = datarange/max(abs(datarange));
        % [b,a] = butter(6, [0.8526, 0.8707], 'bandpass');  % 18800 Hz to 19200 Hz
        % filterData = filter(b, a, datarange);
        % Fir = fir1(5000, [18985/44100*2, 19015/44100*2], 'stop');
        % outdata = filter(Fir, 1, filterData);

        % The original used only j in the file name, so every event of one
        % recording overwrote the previous one; the event number is added here.
        newStr = [savePathStr, int2str(j), '_', int2str(i), '.txt'];
        dlmwrite(newStr, datarange);
        figure
        plot(datarange);
    end
end


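function [event_index] = identify_middle_click_index(inputsignal)
% Return the final sample indices of the peaks.
nf = 0.04;              % noise floor: check the time-domain plot to see what level the peaks usually exceed
span = 20;              % span of the moving average used for smoothing
peakdistance  = 4000;   % threshold: minimum distance (in samples) between peaks
peakdistance2 = 20000;  % threshold: maximum distance (in samples) between peaks
event_index = [];
[lined_data, peaks, locs] = findpeak(inputsignal, nf, span);  % find peaks
% disp(['number of peaks is ' num2str(length(locs))]);
% locs holds the peak positions, peaks holds the peak values
if isempty(locs)
    return;             % nothing rose above the noise floor
end
j = 2;
event_index(1) = locs(1);
for i = 2:length(locs)
    gap = locs(i) - locs(i-1);
    if gap > peakdistance && gap < peakdistance2
        event_index(j) = locs(i);
        j = j + 1;
    end
end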
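To make the gap test in identify_middle_click_index easier to follow, here is a small sketch that applies the same rule in vectorized form. The peak locations are invented for the illustration; the two thresholds are the ones used in the function.

```matlab
% Hypothetical peak locations in samples, made up for this illustration.
locs = [100 150 5000 5100 12000 40000];
locs = locs(:).';                  % force a row vector (findpeaks may return a column)
peakdistance  = 4000;
peakdistance2 = 20000;

% Keep the first peak, then every peak whose distance to the previous
% detected peak lies strictly between the two thresholds.
gaps = diff(locs);
keep = [true, gaps > peakdistance & gaps < peakdistance2];
event_index = locs(keep);
disp(event_index);
```

With these numbers the peaks at 100, 5000 and 12000 are kept. Note that each gap is measured against the previous detected peak, not against the previous accepted event, so a burst of close peaks followed by one more peak slightly over 4000 samples later still yields a new event.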
Finding the peaks of each speech signal:
function [lined_data,peaks,locs] = findpeak(x,nf,span)
% Function used to get the peaks (local maxima) from the given data
%   [lined_data,peaks,locs] = findpeak(x,nf,span)
%   lined_data => peaks in the locations
%   peaks      => just the peak values
%   locs       => locations at which peaks are occurring
%   x          => data for which peaks have to be obtained
%   nf         => noise floor
%   span       => span of the moving average required
% Uses smooth() (Curve Fitting Toolbox) and findpeaks() (Signal Processing Toolbox).
for j = 1:length(x(:,1))
    if x(j) < nf      % taking the values above the noise floor
        x(j) = nf;    % assigning the minimum value as noise floor magnitude
    end
end
x_smoothed = smooth(x-min(x), span, 'moving');  % smoothing the shifted current snapshot
% The span (20 in the caller) is decided based on the type of data that is
% taken. It is like a cutoff frequency for a LPF. This moving average
% helps the findpeaks() function from the MATLAB library to decide the peaks
% more reliably, especially for experimental results with randomness in the data.
[peaks,locs] = findpeaks(x_smoothed);  % get the peaks from the data
lined_data = zeros(1,length(x));       % lined_data will have peaks at their locations
lined_data(locs) = peaks;
lined_data = lined_data + min(x);      % shifting it back to original values
peaks = peaks + min(x);                % shifting it back to original values
end
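The three steps inside findpeak (clamp to the noise floor, smooth with a moving average, take local maxima) can be tried without any toolbox. The following is only a sketch under assumptions: the signal is invented, conv() stands in for smooth(), and a simple neighbour comparison stands in for findpeaks(), so edge handling and plateau handling differ from the toolbox functions.

```matlab
% Toy signal with two bumps, made up for this illustration.
x    = [0 0.01 0.2 0.5 0.2 0.01 0 0.02 0.3 0.6 0.3 0.02 0];
nf   = 0.04;                          % noise floor, as in the post
span = 3;                             % short span because the toy signal is short

% 1) Clamp everything below the noise floor to the noise floor.
x_clamped = max(x, nf);

% 2) Moving average of the shifted signal (zero padding at the edges).
kernel     = ones(1, span) / span;
x_smoothed = conv(x_clamped - min(x_clamped), kernel, 'same');

% 3) Interior local maxima: larger than the left neighbour and at least
%    as large as the right neighbour.
mid     = x_smoothed(2:end-1);
is_peak = mid > x_smoothed(1:end-2) & mid >= x_smoothed(3:end);
locs    = find(is_peak) + 1;
peaks   = x_smoothed(locs) + min(x_clamped);   % shift back, as findpeak does
disp(locs);
```

For this toy signal the two bumps are found at samples 4 and 10. The small values 0.01 and 0.02 no longer create spurious maxima because they were flattened to the noise floor first.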



Next comes the feature extraction part:

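1. Compute the MFCC features of each person.

2. Look at the plot of each person's MFCCs.

3. Run a correlation analysis on each person's MFCC features:

        A=corr(MFCC);

        A=corr(MFCC');

       corr works column by column, so the first call correlates the 13 coefficients with one another and the second call correlates the samples with one another. Plot the result and analyze it.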

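4. Each person's MFCC features ended up in a separate mat file (mainly because I did not write the batch code well),

      so the MFCC features have to be put together by hand:

      ① Double-click one person's mat file to load it; the variable inside is named MFCCs.

         Define mfcc=MFCCs;

     ② Open another person's mat file, whose variable is also named MFCCs, and append it:

              mfcc=[mfcc;MFCCs]

          (stacked vertically, because each row of MFCCs is one sample with 13 columns)

          ....

         Repeat until all the individual mat files are merged into one variable.

         Finally save it with save  4.mat mfcc;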

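5. Look at the correlation over all the MFCCs as a simple analysis:

       A=corr(MFCC')

6. Open the combined features, that is 4.mat, and add a label (1, 2, 3, ...) to each row. The SVM code reads the label from column 14.


7. Feed the data into the SVM to train the model.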
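Steps 4 to 6 can also be done in a short script instead of by hand. The sketch below is only an illustration under assumptions: random matrices stand in for the MFCCs variables that would be loaded from each person's mat file, the label is assumed to be the index of the person, and corrcoef() from base MATLAB is used in place of corr().

```matlab
% Stand-ins for the MFCCs matrices of three people (random numbers, purely
% for illustration; each row is one sample with 13 coefficients).
rng(1);
perPerson = {rand(5, 13), rand(4, 13), rand(6, 13)};

mfcc = [];
for p = 1:numel(perPerson)
    feats  = perPerson{p};
    labels = p * ones(size(feats, 1), 1);   % assumed labelling: one number per person
    mfcc   = [mfcc; feats, labels];         % stack the rows, label goes in column 14
end

% Correlation between the 13 coefficients (columns) over all samples.
A = corrcoef(mfcc(:, 1:13));
disp(size(mfcc));
disp(size(A));
```

With real data, each cell of perPerson would be replaced by the MFCCs variable loaded from one file, and the merged matrix could then be saved with save.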


Code for extracting the MFCC features:

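function MFCCs = extract_mfcc()
% Compute one averaged 13-coefficient MFCC vector per segment file.
% NOTE: the path separators seem to have been lost when this post was
% published; the paths are kept exactly as they appeared.
filePath = 'D:datatoothddZhao-zhang*.txt';
pathStr  = 'D:datatoothddZhao-zhang';
fileList = dir(filePath);
fileNum  = length(fileList);
MFCCs = [];
hamming = @(N)(0.54-0.46*cos(2*pi*[0:N-1].'/(N-1)));
for i = 1:fileNum
    name = fileList(i).name;
    fileName = strcat(pathStr, name);
    data = dlmread(fileName);
    [CC, FBE, frames] = mfcc(data,44100,25,10,0.97,hamming,[5000,15000],20,13,22);
    % Average over the frames. mean(CC,2) replaces the original mean(CC')',
    % which returns a single number if a segment yields only one frame.
    MFCCs = [MFCCs, mean(CC, 2)];
end
MFCCs = MFCCs';               % one row per file, 13 columns
save('MFCCs.mat', 'MFCCs');   % save only the features, not the whole workspace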
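The numbers passed to mfcc() are easier to read once converted to samples. The sketch below assumes that 25 and 10 are the frame length and frame shift in milliseconds and that mfcc() returns CC with one column per frame; both are assumptions about the third-party mfcc function, which is not shown in this post. A random matrix stands in for CC.

```matlab
fs = 44100;                  % sampling rate used in the call
Tw = 25;                     % assumed: frame length in ms
Ts = 10;                     % assumed: frame shift in ms
Nw = round(1e-3 * Tw * fs);  % frame length in samples
Ns = round(1e-3 * Ts * fs);  % frame shift in samples

% The same Hamming window as in extract_mfcc, evaluated for one frame.
hamming = @(N)(0.54 - 0.46*cos(2*pi*(0:N-1).'/(N-1)));
w = hamming(Nw);

% Stand-in for the coefficient matrix: 13 coefficients by 7 frames.
CC = rand(13, 7);
featureRow = mean(CC, 2).';  % average over frames, giving a 1-by-13 row
fprintf('frame = %d samples, shift = %d samples\n', Nw, Ns);
disp(size(featureRow));
```

Averaging over the frames reduces every segment to a single 13-dimensional vector, which is what makes each row of MFCCs one sample for the SVM.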



The SVM code is as follows:

function ac = ovoSVM()
% One-against-one SVM on the MFCC features, using LIBSVM (svmtrain/svmpredict).
% Data format: n*m matrix, n is the number of observations, m-1 is the
% dimension of the features, and the last column holds the labels
% corresponding to the observations.
%mfcc=load ('mfcc.mat');
load mfcc.mat
%[meas,species] = formatdata_svm();
labels = mfcc(:,14);
[~,~,labels] = unique(labels);   % labels: 1/2/3
observations = mfcc(:,1:13);
data = zscore(observations);     % scale features
%data = meas;
numInst = size(data,1);          % number of rows of the matrix
%numLabels = max(labels);

% split training/testing
idx = randperm(numInst);         % random permutation of the row indices (1 to 16)
numTrain = 8;
%numTest = numInst - numTrain;
trainData  = data(idx(1:numTrain),:);
testData   = data(idx(numTrain+1:end),:);
trainLabel = labels(idx(1:numTrain));
testLabel  = labels(idx(numTrain+1:end));

% model=svmtrain(trainLabel,trainData,'-c 24 -g 4.1');
% [prediction_decision_label,prediction_accuracy,dec_value]=svmpredict(testLabel,testData,model);
% [training_decision_label,training_accuracy,dec_value]=svmpredict(trainLabel,trainData,model);

% grid search over C and gamma with 5-fold cross-validation
bestcv = 0;
for log2c = -4:12
    for log2g = -8:4
        % for log2c = -1:3, for log2g = -4:1,   (smaller grid that was tried)
        cmd = ['-v 5 -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];
        cv = svmtrain(trainLabel, trainData, cmd);
        if (cv >= bestcv)
            bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;
        end
        % fprintf('%g %g %g (best c=%g, g=%g, rate=%g)\n', log2c, log2g, cv, bestc, bestg, bestcv);
    end
end

% train one-against-one model
cmd2 = ['-c ', num2str(bestc), ' -g ', num2str(bestg), ' -b 1 '];
model = svmtrain(double(trainLabel), trainData, cmd2);

% get probability estimates of test instances using the model
[pred,acc,preb] = svmpredict(double(testLabel), testData, model, '-b 1');
disp(pred);
ac = acc(1);
disp(['the accuracy is: ' num2str(ac)]);

% confusionmat puts the true classes in rows and the predictions in columns,
% and imagesc draws rows along the y axis, so the axis names of the original
% (which were swapped) are corrected here.
CM = confusionmat(testLabel,pred);
imagesc(CM);
colormap(flipud(gray));
axis xy;
xlabel('Prediction');    % x axis name
ylabel('Groundtruth');   % y axis name
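The scaling and splitting step at the top of ovoSVM can be tried on its own without LIBSVM or the Statistics Toolbox. The sketch below is a variant, not the code above: it computes the mean and standard deviation from the training rows only and applies them to both parts, whereas ovoSVM calls zscore on all rows before splitting. Random numbers stand in for the 16-by-13 feature matrix.

```matlab
rng(0);                                % fixed seed so the split is repeatable
observations = rand(16, 13);           % stand-in for mfcc(:,1:13)
numInst  = size(observations, 1);
numTrain = 8;

% Random split into training and test rows, as in ovoSVM.
idx      = randperm(numInst);
trainIdx = idx(1:numTrain);
testIdx  = idx(numTrain+1:end);

% Scaling statistics taken from the training rows only (the variant).
mu    = mean(observations(trainIdx, :), 1);
sigma = std(observations(trainIdx, :), 0, 1);

% Apply the same shift and scale to both parts.
trainData = bsxfun(@rdivide, bsxfun(@minus, observations(trainIdx, :), mu), sigma);
testData  = bsxfun(@rdivide, bsxfun(@minus, observations(testIdx,  :), mu), sigma);
disp(size(trainData));
disp(size(testData));
```

Scaling with training statistics keeps information about the test rows out of the model, at the cost of test columns that are no longer exactly zero-mean. With only 16 samples, a purely random split can also leave a class out of the training set, so it is worth checking trainLabel before training.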


