My first research notes on speech signal processing


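I am 懦弱唇彩, a blogger on 靠谱客 (Kaopuke). This post presents my first research notes on speech signal processing, shared in the hope that they are a useful reference.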

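This was my first contact with this field. My background is in computer software and I knew nothing about signal processing, so joining this lab meant taking in a lot of new knowledge. Honestly, it was hard, at least for someone as slow as me. Laying the foundations took a long time, and I had to start almost everything from scratch. I began with Oppenheim's "Signals and Systems" and Song Zhiyong's "Applications of MATLAB in Speech Signal Analysis and Synthesis", followed by "Numerical Methods", "A Course in Signal Processing", "Probability Theory and Mathematical Statistics", "Introduction to Algorithms", Zhou Zhihua's "Machine Learning", Li Hang's "Statistical Learning Methods" and others. Gradually I gained an understanding of the tip of the iceberg of signal processing.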

Today I give a brief summary of my first project:

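The speech signals we collect usually contain a lot of noise, so they normally have to be filtered and denoised, and the relevant segments also have to be cut out of the recording.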
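Both filters that appear (commented out) in the clipping code, butter and fir1 from the Signal Processing Toolbox, take their cutoffs as fractions of the Nyquist frequency fs/2, which the code writes as f/44100*2. A small sketch, assuming fs = 44.1 kHz as in the recordings, computes the normalized values for the two bands mentioned in the code:

```matlab
% Normalized cutoff for MATLAB filter design: Wn = f / (fs/2), fs/2 = Nyquist frequency
fs  = 44100;                             % sampling rate (Hz)
nyq = fs / 2;                            % Nyquist frequency: 22050 Hz
wideBand   = [5000 15000] / nyq;         % 5-15 kHz band-pass
narrowBand = [18800 19200] / nyq;        % 18.8-19.2 kHz band around 19 kHz
fprintf('5-15 kHz      -> [%.4f %.4f]\n', wideBand);
fprintf('18.8-19.2 kHz -> [%.4f %.4f]\n', narrowBand);
```

The 18.8-19.2 kHz band gives exactly the [0.8526, 0.8707] pair used with butter in the clipping code.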

Below is my signal-clipping code; the results were not very good.

The signal is clipped around its peaks:


function extract_middle_click()
% For every .wav recording of one subject, detect the clicks and save a
% window of +/-2000 samples around each click as a text file.
readDir = 'D:\data\tooth\Su';       % input recordings (subjects: Zhao-zhang, Syam, LWF, Su)
saveDir = 'D:\data\tooth\dd\Su';    % output folder for the clipped segments
halfWin = 2000;                     % samples kept on each side of a click

if ~exist(saveDir, 'dir')           % create the output folder only if it does not exist yet
    mkdir(saveDir);
end

fileList = dir(fullfile(readDir, '*.wav'));
for j = 1:numel(fileList)
    name = fileList(j).name;                    % full file name, e.g. 'xxx.wav'
    [~, varStr] = fileparts(name);              % file name without the extension
    data = audioread(fullfile(readDir, name));  % read using the full path of the file
    data = data(:, 1);                          % keep the first channel

    % Optional band-pass filter, 5-15 kHz (normalized cutoff = f/(fs/2), fs = 44.1 kHz):
    % [b, a] = butter(3, [5000 15000]/(44100/2), 'bandpass');
    % data = filter(b, a, data);

    event_index = identify_middle_click_index(data);
    fprintf('%s: %d click(s) detected\n', name, numel(event_index));

    for i = 1:numel(event_index)
        first = event_index(i) - halfWin;
        last  = event_index(i) + halfWin;
        if first < 1 || last > numel(data)      % skip clicks whose window leaves the recording
            continue;
        end
        datarange = data(first:last);

        % Optional narrow-band processing around 19 kHz:
        % [b, a] = butter(6, [18800 19200]/(44100/2), 'bandpass');   % = [0.8526, 0.8707]
        % filterData = filter(b, a, datarange);
        % Fir = fir1(5000, [18985 19015]/(44100/2), 'stop');         % 18.985-19.015 kHz band-stop
        % datarange = filter(Fir, 1, filterData);

        newStr = fullfile(saveDir, sprintf('%s_%d.txt', varStr, i));   % one file per click
        dlmwrite(newStr, datarange);
        figure;
        plot(datarange);
    end
end
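Each clip is event_index ± 2000 samples, i.e. 4001 samples or about 90.7 ms at 44.1 kHz, and a click closer than 2000 samples to either end of a recording would index outside the signal. The sketch below runs that windowing on a synthetic 10 s signal; the signal, the click positions and the policy of skipping edge clicks are assumptions made up for illustration:

```matlab
% Hypothetical demo: cut a fixed window of +/- halfWin samples around each event
fs = 44100;
x = 0.01 * randn(10 * fs, 1);             % 10 s of low-level noise (synthetic)
events = [1500, 50000, 200000, 440000];   % made-up click positions (samples)
x(events) = 1;                            % a unit spike at each click
halfWin = 2000;                           % 4001-sample clips, as in extract_middle_click

clips = zeros(2*halfWin + 1, 0);          % one column per kept clip
for k = 1:numel(events)
    first = events(k) - halfWin;
    last  = events(k) + halfWin;
    if first < 1 || last > numel(x)       % window would leave the signal: skip this event
        continue;
    end
    clips(:, end+1) = x(first:last);      %#ok<AGROW>
end
fprintf('%d of %d events clipped, %d samples (%.1f ms) each\n', ...
        size(clips, 2), numel(events), size(clips, 1), 1000 * size(clips, 1) / fs);
```

Skipping keeps every clip the same length; zero-padding the missing part would be the alternative if clicks near the edges must be kept.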

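function event_index = identify_middle_click_index(inputsignal)   % returns the indices of the detected peaks
nf = 0.04;               % noise floor: look at the time-domain plot to see what level the peaks usually exceed; acts as a threshold
span = 20;               % moving-average span used before peak picking
peakdistance  = 4000;    % minimum spacing (samples) between consecutive peaks
peakdistance2 = 20000;   % maximum spacing (samples) between consecutive peaks

[~, ~, locs] = findpeak(inputsignal, nf, span);   % locs: peak indices (second output: peak values)
event_index = [];
if isempty(locs)
    return;
end

% Always keep the first peak; keep a later peak only if its distance to the
% previous peak lies strictly between peakdistance and peakdistance2.
event_index = locs(1);
for i = 2:numel(locs)
    gap = locs(i) - locs(i-1);
    if gap > peakdistance && gap < peakdistance2
        event_index(end+1) = locs(i); %#ok<AGROW>
    end
end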
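The loop above is equivalent to a vectorized test on diff(locs). A small sketch with made-up peak positions (not real data) shows which peaks survive; note that each gap is measured to the previous detected peak, whether or not that peak was itself kept:

```matlab
% Hypothetical peak locations (in samples), e.g. as returned by findpeak
locs = [1000; 1200; 9000; 30000; 36000; 36500; 45000];
peakdistance  = 4000;     % minimum gap to the previous peak (samples)
peakdistance2 = 20000;    % maximum gap to the previous peak (samples)

gaps = diff(locs);        % gap between each peak and the peak just before it
keep = [true; gaps > peakdistance & gaps < peakdistance2];   % the first peak is always kept
event_index = locs(keep);
disp(event_index.');
```

Here 1200 and 36500 are dropped as too close to their predecessors, 30000 is dropped because its gap (21000) is too large, and the result is 1000, 9000, 36000, 45000.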

Finding the peaks of each speech signal:


function [lined_data, peaks, locs] = findpeak(x, nf, span)

% Function used to get the peaks (local maxima) from the given data
% [lined_data, peaks, locs] = findpeak(x, nf, span)
% lined_data => vector as long as x, holding the peak values at the peak locations
% peaks      => just the peak values
% locs       => indices at which the peaks occur
% x          => data for which peaks have to be obtained
% nf         => noise floor
% span       => span of the moving average

x = max(x(:), nf);      % keep values above the noise floor; everything below becomes nf
offset = min(x);        % shift so that the data starts at zero before smoothing

x_smoothed = smooth(x - offset, span, 'moving');   % moving average (Curve Fitting Toolbox)
% The span (20 here) is chosen based on the type of data; it acts like the
% cutoff frequency of a low-pass filter. The moving average helps findpeaks()
% (Signal Processing Toolbox) locate the peaks more reliably, especially in
% experimental recordings that contain random noise.

[peaks, locs] = findpeaks(x_smoothed);    % get the peaks from the smoothed data

lined_data = zeros(1, numel(x));          % lined_data holds the peaks at their locations
lined_data(locs) = peaks;
lined_data = lined_data + offset;         % shift back to the original values

peaks = peaks + offset;                   % shift back to the original values

end
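smooth and findpeaks require the Curve Fitting and Signal Processing Toolboxes. As a rough, toolbox-free sketch of the same three steps (clamp to the noise floor, moving average, local maxima), the block below uses conv for the moving average (its ends are zero-padded, whereas smooth shrinks the span at the ends) and a simplified local-maximum test that does not reproduce every detail of findpeaks, such as its plateau handling. The synthetic signal and span = 5 are made up:

```matlab
% Toolbox-free approximation of findpeak (synthetic data, hypothetical parameters)
nf = 0.04;                                   % noise floor, as in identify_middle_click_index
span = 5;                                    % moving-average span (made up for this short signal)
t = (0:999).';
ripple = 0.02 * sin(2*pi*t/50);              % low-level ripple, always below nf
bumps  = 0.5 * exp(-(t - 300).^2 / 200) + 0.8 * exp(-(t - 700).^2 / 200);   % bumps at t = 300 and 700
x = ripple + bumps;

xc = max(x, nf);                             % clamp everything below the noise floor to nf
offset = min(xc);
xs = conv(xc - offset, ones(span, 1) / span, 'same');   % centred moving average, zero-padded ends
inner = 2:numel(xs) - 1;
isPeak = xs(inner) > xs(inner - 1) & xs(inner) >= xs(inner + 1);   % strict rise, then no rise
locs = inner(isPeak).';                      % peak indices (1-based, so t = locs - 1)
peaks = xs(locs) + offset;                   % shift back to the original level
disp([locs peaks]);
```

Because the ripple stays below nf, clamping flattens it completely and only the two bumps are reported; without the clamp, every ripple crest would be a local maximum.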


Next is the feature-extraction part:

1. Compute the MFCC features for each person.

2. Look at the plots of each person's MFCCs.

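3. Run a correlation analysis on each person's MFCC features (MFCC is the clips-by-13 feature matrix):

A=corr(MFCC);    % 13 x 13: correlation between the MFCC coefficients (columns)

A=corr(MFCC');   % n x n: correlation between the individual clips (rows)

Plot the resulting matrices and analyze them.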

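4. Each person's MFCC features ended up in a separate .mat file (mainly because my batch-processing code was not written well), so all the MFCC features have to be merged:

① Double-click one person's .mat file to load it (the variable inside is named MFCCs) and define mfcc = MFCCs;

② Open the next person's MFCC .mat file (its variable is also named MFCCs) and append its rows below the existing ones:

mfcc = [mfcc; MFCCs]

... (repeat ② for each remaining person)

In the end all the individual .mat files are merged into one matrix, which is saved with save 4.mat mfcc;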

5. Check the correlation across all the MFCC features for a simple overall analysis:

A=corr(mfcc')

6. Open the merged feature file (4.mat) and add the labels 1, 2, 3, ... (one value per person) as an extra column after the 13 MFCC columns.

7. Feed the labeled features into an SVM to train the model.
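Steps 4 to 6 can also be scripted instead of double-clicking each file. A minimal sketch, assuming each person's file holds a variable MFCCs with one row per clip and 13 columns (as saved by extract_mfcc) and that the label is simply the file's position in the list; the temporary folder, the file names person1.mat, person2.mat, ... and the random features are made up for the demo. Because rows are observations, the matrices are stacked vertically, and the label becomes column 14, which is where ovoSVM reads it. corrcoef (base MATLAB) stands in for corr (Statistics and Machine Learning Toolbox); for data without NaNs both return the same Pearson correlation matrix, treating columns as variables:

```matlab
% Hypothetical demo data: three "people", each saved as a .mat file holding MFCCs (clips x 13)
tmpDir = tempname;
mkdir(tmpDir);
counts = [5 4 6];                                    % made-up number of clips per person
for p = 1:numel(counts)
    MFCCs = randn(counts(p), 13);                    % random stand-in for real MFCC features
    save(fullfile(tmpDir, sprintf('person%d.mat', p)), 'MFCCs');
end

% Merge all files: stack the rows and append the label (file position) as column 14
files = dir(fullfile(tmpDir, 'person*.mat'));
mfcc = [];
for p = 1:numel(files)
    S = load(fullfile(tmpDir, files(p).name), 'MFCCs');   % load into a struct, not the workspace
    mfcc = [mfcc; S.MFCCs, p * ones(size(S.MFCCs, 1), 1)]; %#ok<AGROW>
end
delete(fullfile(tmpDir, 'person*.mat'));             % remove the temporary demo files
rmdir(tmpDir);

% Correlation analysis on the 13 feature columns
A_feat = corrcoef(mfcc(:, 1:13));     % 13 x 13: between the MFCC coefficients
A_clip = corrcoef(mfcc(:, 1:13).');   % 15 x 15: between the individual clips
```

With real data, save 4.mat mfcc would then store the labeled matrix in one file.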

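Code for extracting the MFCC features (mfcc() is not a built-in MATLAB function; its argument list matches the mfcc.m from Kamil Wojcicki's "HTK MFCC MATLAB" package on MATLAB File Exchange):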

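function MFCCs = extract_mfcc()
% Compute one 13-dimensional MFCC vector per clipped segment of one subject.
readDir = 'D:\data\tooth\dd\Zhao-zhang';
fileList = dir(fullfile(readDir, '*.txt'));
hamming = @(N)(0.54 - 0.46*cos(2*pi*(0:N-1).'/(N-1)));   % Hamming window
MFCCs = [];
for i = 1:numel(fileList)
    data = dlmread(fullfile(readDir, fileList(i).name));
    % Arguments: fs = 44100 Hz, 25 ms frames, 10 ms shift, pre-emphasis 0.97,
    % window, analysis band 5-15 kHz, 20 filterbank channels, 13 cepstra, liftering 22
    [CC, FBE, frames] = mfcc(data, 44100, 25, 10, 0.97, hamming, [5000, 15000], 20, 13, 22);
    MFCCs = [MFCCs; mean(CC, 2).'];   % average over frames: one row per segment
end
save('MFCCs.mat', 'MFCCs');           % save only the feature matrix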
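Each clip is short, so it is worth checking how many frames the MFCC averaging actually runs over. A worked sketch, assuming the framing uses round(Tw*fs/1000)-sample frames with a round(Ts*fs/1000)-sample shift and keeps only frames that fit completely inside the clip (check this against the mfcc.m you use); the random CC matrix only stands in for the real cepstra:

```matlab
% Worked example: frames per clip and the frame-averaged MFCC vector
fs = 44100;                              % sampling rate (Hz)
clipLen = 2 * 2000 + 1;                  % samples per clip written by extract_middle_click
Tw = 25;  Ts = 10;                       % frame length and shift in ms (mfcc arguments)
Nw = round(Tw * fs / 1000);              % 1102.5 -> 1103 samples per frame
Ns = round(Ts * fs / 1000);              % 441 samples shift
nFrames = floor((clipLen - Nw) / Ns) + 1;    % frames that fit entirely inside the clip
fprintf('clip %.1f ms: frame %d, shift %d, %d frames\n', 1000 * clipLen / fs, Nw, Ns, nFrames);

CC = randn(13, nFrames);                 % stand-in for the 13 x nFrames cepstra from mfcc
feat = mean(CC, 2).';                    % 1 x 13 row, same values as mean(CC')' in the original code
```

Under these assumptions each 13-dimensional feature vector is the mean of only 7 frames.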

The SVM code is as follows (it uses the LIBSVM MATLAB interface, svmtrain/svmpredict):


function ac = ovoSVM()
% Multi-class SVM on the merged MFCC features with LIBSVM (one-against-one).
load mfcc.mat                      % variable mfcc: n-by-14, n = number of observations,
                                   % columns 1-13 = MFCC features, column 14 = label
labels = mfcc(:, 14);
[~, ~, labels] = unique(labels);   % map the labels to 1/2/3/...
observations = mfcc(:, 1:13);

data = zscore(observations);       % scale each feature to zero mean and unit variance
numInst = size(data, 1);           % number of rows (observations)

% Split into training and testing sets
idx = randperm(numInst);           % random permutation of the row indices (1..16 here)
numTrain = 8;
trainData  = data(idx(1:numTrain), :);  testData  = data(idx(numTrain+1:end), :);
trainLabel = labels(idx(1:numTrain));   testLabel = labels(idx(numTrain+1:end));

% Fixed-parameter alternative:
% model = svmtrain(trainLabel, trainData, '-c 24 -g 4.1');

% Grid search over C and gamma with 5-fold cross-validation on the training set
bestcv = 0;
for log2c = -4:12
    for log2g = -8:4
        cmd = ['-v 5 -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];
        cv = svmtrain(trainLabel, trainData, cmd);   % with -v, returns the CV accuracy (%)
        if cv >= bestcv
            bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;
        end
        % fprintf('%g %g %g (best c=%g, g=%g, rate=%g)\n', log2c, log2g, cv, bestc, bestg, bestcv);
    end
end

% Train the one-against-one model with the best parameters (-b 1: probability estimates)
cmd2 = ['-c ', num2str(bestc), ' -g ', num2str(bestg), ' -b 1'];
model = svmtrain(double(trainLabel), trainData, cmd2);
% Predict the test instances and get their probability estimates
[pred, acc, prob] = svmpredict(double(testLabel), testData, model, '-b 1');
disp(pred);
ac = acc(1);                       % classification accuracy in percent
disp(['the accuracy is: ' num2str(ac) '%']);

% Confusion matrix: rows = ground truth, columns = prediction
CM = confusionmat(testLabel, pred);
imagesc(CM);
colormap(flipud(gray));
axis xy;
xlabel('Prediction');              % columns of CM
ylabel('Groundtruth');             % rows of CM
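confusionmat(group, grouphat) puts the ground truth on the rows and the predictions on the columns, and imagesc draws rows along the y-axis and columns along the x-axis, so the x-axis of the plot shows the predictions. A base-MATLAB sketch with made-up labels builds the same layout with accumarray and reads the accuracy off the diagonal:

```matlab
truth = [1 1 2 2 2 3 3 3].';               % hypothetical ground-truth labels
pred  = [1 2 2 2 3 3 3 1].';               % hypothetical predictions
K = max([truth; pred]);                    % number of classes
CM = accumarray([truth pred], 1, [K K]);   % CM(i,j): true class i predicted as class j
accuracy = 100 * sum(diag(CM)) / sum(CM(:));
disp(CM);
fprintf('accuracy = %.1f%%\n', accuracy);
```

For these labels CM is [1 1 0; 0 2 1; 1 0 2] and the accuracy is 62.5%, the same figure svmpredict would report as acc(1) for these predictions.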
