MATLAB code for speech recognition

Poornima L N · Jan 31, 2012

I have taken up a project on speech recognition. I need code for it using MFCC and vector quantization. Please mail it to me at poornimaln41@gmail.com

snikfreak · Jan 31, 2012

hey there,
why don't you just start on the code and show us where you get stuck? Then we will absolutely help you.

cheers

iVenky · Feb 5, 2012

Hello. You are in the right place. I have code for speaker recognition using MFCC, but I used neural networks for pattern recognition. I believe neural networks are better than vector quantization. If you need that code, send me a private message.

kunal juneja · Apr 1, 2012

iVenky said:
Hello. You are in the right place. I have code for speaker recognition using MFCC, but I used neural networks for pattern recognition. I believe neural networks are better than vector quantization. If you need that code, send me a private message.

Hello sir,

I am making a speaker-independent voice recognition system that handles only the digits 1-9: when I say 1, it should recognize 1 and write 1 on the computer. Please, sir, send me the code.
My mail ID is kunaljuneja5@gmail.com
I will be really grateful for your support.

iVenky · Apr 1, 2012

kunal juneja said:
Hello sir,

I am making a speaker-independent voice recognition system that handles only the digits 1-9: when I say 1, it should recognize 1 and write 1 on the computer. Please, sir, send me the code.
My mail ID is kunaljuneja5@gmail.com
I will be really grateful for your support.

See my blog. It's there in my blog. Here's the link

https://www.edaboard.com/blog/1466/

And please do rate my blog and leave some comments.

Thanks

kunal juneja · Apr 2, 2012

Thanks a lot, sir, but can I use this code?

% This program records the voice
function [norm_voice,h] = Voice_Rec(sample_freq)
option = 'n';
option_rec = 'n';
record_len = 1; %Record time length in seconds
%sample_freq = 8192; %Sampling frequency in Hertz
sample_time = sample_freq * record_len;

disp('Get ready to record your voice');
name = input('Enter the file name you want to save the file with: ','s');
file_name = sprintf('%s.wav',name);
option_rec = input('Press y to record: ','s');
if option_rec=='y'
while option=='n',
input('Press enter when ready to record--> ');
record = wavrecord(sample_time, sample_freq); %Records the input through the sound card to the variable with specified sampling frequency
input('Press enter to listen the recorded voice--> ');
sound(record, sample_freq);
option = input('Press y to save or n to record again: ','s');
end
wavwrite(record, sample_freq, file_name); %Save the recorded data to a file with the specified file name in .wav format
end
[voice_read,FS,NBITS]=wavread(file_name);
norm_voice = normalize(voice_read);
norm_voice = downsmpl(norm_voice, sample_freq);
le=32;
h=daubcqf(le,'min');

function vec = normalize(vec)

temp_vec = vec-mean(vec);
sum_temp_vec = sum(temp_vec.*temp_vec);
sqrt_temp_vec = sqrt(sum_temp_vec);
vec = (1/sqrt_temp_vec)*temp_vec;

function sampled = downsmpl(voice, freq)

x=freq;
y = freq/2;
z=1;
a=1;
sampled=0;
while z<freq,
sampled(a) = sqrt(abs(voice(z)*voice(z+1)));
a=a+1;
z = z+2;
end
sampled = sampled';

function [h_0,h_1] = daubcqf(N,TYPE)
% [h_0,h_1] = daubcqf(N,TYPE);
%
% Function computes the Daubechies' scaling and wavelet filters
% (normalized to sqrt(2)).
%
% Input:
% N : Length of filter (must be even)
% TYPE : Optional parameter that distinguishes the minimum phase,
% maximum phase and mid-phase solutions ('min', 'max', or
% 'mid'). If no argument is specified, the minimum phase
% solution is used.
%
% Output:
% h_0 : Minimal phase Daubechies' scaling filter
% h_1 : Minimal phase Daubechies' wavelet filter
%
% Example:
% N = 4;
% TYPE = 'min';
% [h_0,h_1] = daubcqf(N,TYPE)
% h_0 = 0.4830 0.8365 0.2241 -0.1294
% h_1 = 0.1294 0.2241 -0.8365 0.4830
%
if(nargin < 2),
TYPE = 'min';
end;
if(rem(N,2) ~= 0),
error('No Daubechies filter exists for ODD length');
end;
K = N/2;
a = 1;
p = 1;
q = 1;
h_0 = [1 1];
for j = 1:K-1,
a = -a * 0.25 * (j + K - 1)/j;
h_0 = [0 h_0] + [h_0 0];
p = [0 -p] + [p 0];
p = [0 -p] + [p 0];
q = [0 q 0] + a*p;
end;
q = sort(roots(q));
qt = q(1:K-1);
if TYPE=='mid',
if rem(K,2)==1,
qt = q([1:4:N-2 2:4:N-2]);
else
qt = q([1 4:4:K-1 5:4:K-1 N-3:-4:K N-4:-4:K]);
end;
end;
h_0 = conv(h_0,real(poly(qt)));
h_0 = sqrt(2)*h_0/sum(h_0); %Normalize to sqrt(2);
if(TYPE=='max'),
h_0 = fliplr(h_0);
end;
if(abs(sum(h_0 .^ 2))-1 > 1e-4)
error('Numerically unstable for this value of "N".');
end;
h_1 = rot90(h_0,2);
h_1(1:2:N)=-h_1(1:2:N);
Please tell me, sir, as I am new to this.
Regards,
Kunal
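
A quick way to check what the normalize helper in the code above does: it subtracts the mean and then scales the vector so that the sum of squares is 1. The snippet below is only a sketch of that same arithmetic on a made-up test vector (v is an assumption for illustration, not recorded speech).

```matlab
% Sketch: zero-mean, unit-energy normalization, the same arithmetic as the
% normalize() helper posted above. The test vector v is made up.
v = [1; 2; 3; 4; 10];
centered = v - mean(v);                        % remove the DC offset
normed = centered / sqrt(sum(centered.^2));    % scale to unit Euclidean norm

% After normalization the mean should be ~0 and the energy ~1.
fprintf('mean = %.3g, energy = %.3g\n', mean(normed), sum(normed.^2));
```

If the printed energy is not close to 1, the input was probably constant (all samples equal), in which case the division is by zero.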

ROCKET SCIENTIST · Apr 3, 2012

Thanks a lot for the code, Kunal. I was also looking for this.

saad47 · May 5, 2012

Hi Kunal,
do I put this code in one editor file, or do I make separate files?

vitesh.acharekar · Aug 8, 2012

Hi Kunal,

I am also looking for speaker-independent speech recognition of the digits 1-9 using a neural network.
It would be helpful if you can share code if it is working fine.

Thanks and regards
Vitesh A

rrdmp · Aug 8, 2012

iVenky said:
Hello. You are in the right place. I have code for speaker recognition using MFCC, but I used neural networks for pattern recognition. I believe neural networks are better than vector quantization. If you need that code, send me a private message.

I want to apply a neural network in my speech processing project. It's urgent.
Please send the same code to my mail ID.
praveen.553@gmail.com

tejastalele · Mar 24, 2013

I am developing code for speech recognition using neural networks. I tried ordinary signal filtering and then comparing the cepstral coefficients, but it is not accurate.

Please forward me the neural network code for speech recognition at my mail ID; it's very urgent.

talele.tejas@gmail.com

nishantsingh · Apr 5, 2013

iVenky said:
See my blog. It's there in my blog. Here's the link

https://www.edaboard.com/blog/1466/

And please do rate my blog and leave some comments.

Thanks

Hi dear Venky sir,
I am also working on a speaker diarisation project and want MFCC code for speaker recognition. I will be grateful if you can send me one.
Thanks in advance.
My email is
nishantsingh8@yahoo.com

iVenky · Apr 6, 2013

I have posted the code on my blog on EDAboard.

achrafbenba · Apr 8, 2013

% mfcc - Mel frequency cepstrum coefficient analysis.
% [ceps,freqresp,fb,fbrecon,freqrecon] = ...
% mfcc(input, samplingRate, [frameRate])
% Find the cepstral coefficients (ceps) corresponding to the
% input. Four other quantities are optionally returned that
% represent:
% the detailed fft magnitude (freqresp) used in MFCC calculation,
% the mel-scale filter bank output (fb)
% the filter bank output by inverting the cepstrals with a cosine
% transform (fbrecon),
% the smooth frequency response by interpolating the fb reconstruction
% (freqrecon)
% -- Malcolm Slaney, August 1993
% Modified a bit to make testing an algorithm easier... 4/15/94
% Fixed Cosine Transform (indices of cos() were swapped) - 5/26/95
% Added optional frameRate argument - 6/8/95
% Added proper filterbank reconstruction using inverse DCT - 10/27/95
% Added filterbank inversion to reconstruct spectrum - 11/1/95

% (c) 1998 Interval Research Corporation

function [ceps,freqresp,fb,fbrecon,freqrecon] = ...
MFCC(input, samplingRate, frameRate)
global mfccDCTMatrix mfccFilterWeights

[r c] = size(input);
if (r > c)
input=input';
end

% Filter bank parameters
lowestFrequency = 133.3333;
linearFilters = 13;
linearSpacing = 66.66666666;
logFilters = 27;
logSpacing = 1.0711703;
fftSize = 512;
cepstralCoefficients = 13;
windowSize = 400;
windowSize = 256; % Standard says 400, but 256 makes more sense
% Really should be a function of the sample
% rate (and the lowestFrequency) and the
% frame rate.
if (nargin < 2) samplingRate = 16000; end;
if (nargin < 3) frameRate = 100; end;

% Keep this around for later....
totalFilters = linearFilters + logFilters;

% Now figure the band edges. Interesting frequencies are spaced
% by linearSpacing for a while, then go logarithmic. First figure
% all the interesting frequencies. Lower, center, and upper band
% edges are all consecutive interesting frequencies.

freqs = lowestFrequency + (0:linearFilters-1)*linearSpacing;
freqs(linearFilters+1:totalFilters+2) = ...
freqs(linearFilters) * logSpacing.^(1:logFilters+2);

lower = freqs(1:totalFilters);
center = freqs(2:totalFilters+1);
upper = freqs(3:totalFilters+2);

% We now want to combine FFT bins so that each filter has unit
% weight, assuming a triangular weighting function. First figure
% out the height of the triangle, then we can figure out each
% frequency's contribution
mfccFilterWeights = zeros(totalFilters,fftSize);
triangleHeight = 2./(upper-lower);
fftFreqs = (0:fftSize-1)/fftSize*samplingRate;

for chan=1:totalFilters
mfccFilterWeights(chan,:) = ...
(fftFreqs > lower(chan) & fftFreqs <= center(chan)).* ...
triangleHeight(chan).*(fftFreqs-lower(chan))/(center(chan)-lower(chan)) + ...
(fftFreqs > center(chan) & fftFreqs < upper(chan)).* ...
triangleHeight(chan).*(upper(chan)-fftFreqs)/(upper(chan)-center(chan));
end
%semilogx(fftFreqs,mfccFilterWeights')
%axis([lower(1) upper(totalFilters) 0 max(max(mfccFilterWeights))])

hamWindow = 0.54 - 0.46*cos(2*pi*(0:windowSize-1)/windowSize);

if 0 % Window it like ComplexSpectrum
windowStep = samplingRate/frameRate;
a = .54;
b = -.46;
wr = sqrt(windowStep/windowSize);
phi = pi/windowSize;
hamWindow = 2*wr/sqrt(4*a*a+2*b*b)* ...
(a + b*cos(2*pi*(0:windowSize-1)/windowSize + phi));
end

% Figure out Discrete Cosine Transform. We want a matrix
% dct(i,j) which is cepstralCoefficients x totalFilters in size.
% The i,j component is given by
% cos( i * (j+0.5)/totalFilters pi )
% where we have assumed that i and j start at 0.
mfccDCTMatrix = 1/sqrt(totalFilters/2)*cos((0:(cepstralCoefficients-1))' * ...
(2*(0:(totalFilters-1))+1) * pi/2/totalFilters);
mfccDCTMatrix(1,:) = mfccDCTMatrix(1,:) * sqrt(2)/2;

%imagesc(mfccDCTMatrix);

% Filter the input with the preemphasis filter. Also figure how
% many columns of data we will end up with.
if 1
preEmphasized = filter([1 -.97], 1, input);
else
preEmphasized = input;
end
windowStep = samplingRate/frameRate;
cols = fix((length(input)-windowSize)/windowStep);

% Allocate all the space we need for the output arrays.
ceps = zeros(cepstralCoefficients, cols);
if (nargout > 1) freqresp = zeros(fftSize/2, cols); end;
if (nargout > 2) fb = zeros(totalFilters, cols); end;

% Invert the filter bank center frequencies. For each FFT bin
% we want to know the exact position in the filter bank to find
% the original frequency response. The next block of code finds the
% integer and fractional sampling positions.
if (nargout > 4)
fr = (0:(fftSize/2-1))'/(fftSize/2)*samplingRate/2;
j = 1;
for i=1:(fftSize/2)
if fr(i) > center(j+1)
j = j + 1;
end
if j > totalFilters-1
j = totalFilters-1;
end
fr(i) = min(totalFilters-.0001, ...
max(1,j + (fr(i)-center(j))/(center(j+1)-center(j))));
end
fri = fix(fr);
frac = fr - fri;

freqrecon = zeros(fftSize/2, cols);
end

% Ok, now let's do the processing. For each chunk of data:
% * Window the data with a hamming window,
% * Shift it into FFT order,
% * Find the magnitude of the fft,
% * Convert the fft data into filter bank outputs,
% * Find the log base 10,
% * Find the cosine transform to reduce dimensionality.
for start=0:cols-1
first = floor(start*windowStep) + 1;
last = first + windowSize-1;
fftData = zeros(1,fftSize);
fftData(1:windowSize) = preEmphasized(first:last).*hamWindow;
fftMag = abs(fft(fftData));
earMag = log10(mfccFilterWeights * fftMag');

ceps(:,start+1) = mfccDCTMatrix * earMag;
if (nargout > 1) freqresp(:,start+1) = fftMag(1:fftSize/2)'; end;
if (nargout > 2) fb(:,start+1) = earMag; end
if (nargout > 3)
fbrecon(:,start+1) = ...
mfccDCTMatrix(1:cepstralCoefficients,:)' * ...
ceps(:,start+1);
end
if (nargout > 4)
f10 = 10.^fbrecon(:,start+1);
freqrecon(:,start+1) = samplingRate/fftSize * ...
(f10(fri).*(1-frac) + f10(fri+1).*frac);
end
end

% OK, just to check things, let's also reconstruct the original FB
% output. We do this by multiplying the cepstral data by the transpose
% of the original DCT matrix. This all works because we were careful to
% scale the DCT matrix so it was orthonormal.
if 1 && (nargout > 3)
fbrecon = mfccDCTMatrix(1:cepstralCoefficients,:)' * ceps;
% imagesc(mt(:,1:cepstralCoefficients)*mfccDCTMatrix);
end;
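
For anyone trying to follow the processing loop in the code above, here is a rough sketch of the same steps (window, FFT magnitude, filter bank, log base 10, cosine transform) applied to a single frame. It reuses the parameter values from the post, but the input is a made-up 440 Hz sine rather than speech, and the local names weights and dctMatrix are my own stand-ins for the globals, so treat it as an illustration and not a replacement for the full function.

```matlab
% Sketch of the per-frame MFCC steps from the post above, on ONE frame of a
% synthetic 440 Hz tone (made-up input). Parameter values are copied from
% the post; variable names weights/dctMatrix are illustrative.
samplingRate = 16000;
fftSize = 512;
windowSize = 256;
cepstralCoefficients = 13;
linearFilters = 13;
logFilters = 27;
totalFilters = linearFilters + logFilters;

% Band edges: linearly spaced first, then logarithmically spaced.
freqs = 133.3333 + (0:linearFilters-1)*66.66666666;
freqs(linearFilters+1:totalFilters+2) = ...
    freqs(linearFilters) * 1.0711703.^(1:logFilters+2);
lower = freqs(1:totalFilters);
center = freqs(2:totalFilters+1);
upper = freqs(3:totalFilters+2);

% Triangular filters evaluated at the FFT bin frequencies.
fftFreqs = (0:fftSize-1)/fftSize*samplingRate;
weights = zeros(totalFilters, fftSize);
height = 2./(upper-lower);
for chan = 1:totalFilters
    rising = (fftFreqs > lower(chan) & fftFreqs <= center(chan)) .* ...
        (fftFreqs-lower(chan))/(center(chan)-lower(chan));
    falling = (fftFreqs > center(chan) & fftFreqs < upper(chan)) .* ...
        (upper(chan)-fftFreqs)/(upper(chan)-center(chan));
    weights(chan,:) = height(chan) * (rising + falling);
end

% DCT matrix, cepstralCoefficients x totalFilters, with orthonormal rows.
dctMatrix = 1/sqrt(totalFilters/2) * cos((0:(cepstralCoefficients-1))' * ...
    (2*(0:(totalFilters-1))+1) * pi/2/totalFilters);
dctMatrix(1,:) = dctMatrix(1,:) * sqrt(2)/2;

% One frame: Hamming window, zero-pad to fftSize, FFT magnitude,
% filter bank, log10, then DCT to get the cepstral coefficients.
t = (0:windowSize-1)/samplingRate;
frame = sin(2*pi*440*t);
hamWindow = 0.54 - 0.46*cos(2*pi*(0:windowSize-1)/windowSize);
fftData = zeros(1, fftSize);
fftData(1:windowSize) = frame .* hamWindow;
fftMag = abs(fft(fftData));
earMag = log10(weights * fftMag');
ceps = dctMatrix * earMag;   % column vector of cepstral coefficients
disp(size(ceps));
```

Each column that the full function returns in ceps is one such vector, computed for successive frames of the pre-emphasized input.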

123jack · Apr 8, 2013

I'm curious
What data other than the voice recognition are you feeding into the neural net?

achrafbenba · Apr 8, 2013

detecting disease by MFCC
