
For all those who are working on MFCC, please help me out with this question...

Status
Not open for further replies.

Shweta_S

I am working on speech recognition, using MFCCs as the speech features.

I generated a filter bank of overlapping triangular filters in the frequency domain.

Since the filters are defined in the frequency domain, I multiplied the power spectrum of the signal by each filter.

Now what do I do? Should I sum the band-passed spectrum within each band to get the filter bank outputs?

Then I take the log and the DCT of these outputs, right?

Is summing the band-passed signal (in the frequency domain) correct?

Please help me out...
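
For reference, the steps described in this post (weight the power spectrum by each triangular filter, sum within each band, take the log, then take the DCT) can be sketched in MATLAB as below. This is only a minimal sketch: the sizes, the random stand-in spectrum, and the linearly spaced toy filters are assumptions for illustration, not values from the thread. It also shows that summing the weighted bins filter by filter gives the same result as a single matrix product.

```matlab
% Minimal sketch: filter-bank energies -> log -> DCT for one frame.
% All sizes and the toy triangular filters below are illustrative assumptions.
nBins    = 257;              % assumed number of one-sided FFT bins
nFilters = 20;               % assumed number of triangular filters
nCeps    = 12;               % assumed number of cepstral coefficients to keep

% Stand-in power spectrum of one windowed frame
% (in practice: abs(fft(frame)).^2, keeping the one-sided bins).
powerSpec = abs(randn(nBins, 1)).^2 + eps;

% Toy filter bank: overlapping triangles on a linear bin axis. A real MFCC
% front end would place the edges on the mel scale instead.
edges = round(linspace(1, nBins, nFilters + 2));
H = zeros(nFilters, nBins);
for m = 1:nFilters
    lo = edges(m); mid = edges(m+1); hi = edges(m+2);
    H(m, lo:mid) = ((lo:mid) - lo) / (mid - lo);   % rising slope
    H(m, mid:hi) = (hi - (mid:hi)) / (hi - mid);   % falling slope
end

% Step 1: multiply the spectrum by each filter and sum over frequency.
fbankLoop = zeros(nFilters, 1);
for m = 1:nFilters
    fbankLoop(m) = sum(H(m, :)' .* powerSpec);
end
% The same sums, written as one matrix-vector product.
fbank = H * powerSpec;

% Step 2: log of each filter-bank output.
logFbank = log(fbank);

% Step 3: DCT-II (unnormalized) across the filter index.
k = (0:nCeps-1)';
n = 0:nFilters-1;
dctMat = cos(pi * k * (2*n + 1) / (2*nFilters));
mfcc = dctMat * logFbank;    % nCeps x 1 feature vector for this frame
```

Each row of `H` plays the role of one band-pass filter, so `fbank(m)` is the summed, weighted spectrum in band `m`.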
 

Hi all, I am also working on MFCC. I have searched Google, but there are some points I couldn't understand in the link below. If anyone can help us, please do.

In the following link, especially Fig. 1:

**broken link removed**


1) On what basis are the frame length and the frame shift determined? Do we use one frame for each word utterance? If so, how would there be no overlap?

2) Which kind of window (Hamming or hybrid Hamming) should we use, what are its parameters, and on what basis do we choose them?

3) What is spectral subtraction (minimum statistics)?

4) In the filter bank, I understand that for each frame we divide the spectrum into subbands using an array of band-pass filters. How many subbands do we divide each frame's spectrum into, and why? Do we use FIR or IIR filters, and of what order? What is the return value of fbank[m,i]?

5) I don't know what we do in the RASTA processing.

6) How do we do the DCT?

7) In the last stage, how can we combine Ef[m] and C[m,i] to produce the final feature vector?
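
On questions 1 and 2, a common convention (an assumption here, not something taken from the linked figure) is a fixed frame of roughly 20 to 30 ms shifted by about 10 ms, so consecutive frames overlap regardless of word boundaries, with each frame multiplied by a Hamming window. A minimal sketch with assumed values (16 kHz sampling, 25 ms frames, 10 ms shift) and a synthetic tone in place of speech:

```matlab
% Minimal sketch: split a signal into overlapping frames and window each one.
% Sampling rate, frame length, and frame shift are assumed typical values.
fs         = 16000;                  % assumed sampling rate in Hz
frameLen   = round(0.025 * fs);      % 25 ms frame
frameShift = round(0.010 * fs);      % 10 ms shift

% Synthetic 1-second test signal (stand-in for a speech recording).
t = (0:fs-1)' / fs;
x = sin(2*pi*440*t);

% Number of complete frames that fit in the signal.
nFrames = floor((length(x) - frameLen) / frameShift) + 1;

% Symmetric Hamming window written out explicitly, so no toolbox is needed.
n = (0:frameLen-1)';
w = 0.54 - 0.46 * cos(2*pi*n / (frameLen - 1));

% One windowed frame per column.
frames = zeros(frameLen, nFrames);
for i = 1:nFrames
    first = (i-1) * frameShift + 1;
    frames(:, i) = x(first:first+frameLen-1) .* w;
end

% Consecutive frames share frameLen - frameShift samples.
overlap = frameLen - frameShift;
```

With these assumed values each frame holds 400 samples and shares 240 of them with the next frame.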

----------------------------

8) After obtaining the feature vector for each frame using MFCC, how do we combine them into a multidimensional PDF, so that a weighted sum of Gaussians can be used to build the acoustic model needed by the classification algorithm?
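
As a rough illustration of question 8, the per-frame feature vectors can be stacked as the columns of a matrix and each column scored under a weighted sum of Gaussians. The sketch below only evaluates such a mixture. Every dimension and parameter in it is a made-up placeholder, and estimating the weights, means, and variances from training data (for example with EM) is not shown.

```matlab
% Minimal sketch: score frame-level feature vectors with a Gaussian mixture.
% Dimensions and mixture parameters are made-up placeholders.
D = 13;                         % assumed feature dimension per frame
T = 50;                         % assumed number of frames
K = 3;                          % assumed number of mixture components

X = randn(D, T);                % stand-in feature matrix, one frame per column

wts  = [0.5; 0.3; 0.2];         % mixture weights, must sum to 1
mu   = randn(D, K);             % component means, one per column
sig2 = ones(D, K);              % diagonal covariances (variance per dimension)

% log( w_k * N(x_t; mu_k, diag(sig2_k)) ) for every component and frame.
logComp = zeros(K, T);
for k = 1:K
    diffSq = (X - repmat(mu(:, k), 1, T)).^2;
    logComp(k, :) = log(wts(k)) ...
        - 0.5 * (D*log(2*pi) + sum(log(sig2(:, k)))) ...
        - 0.5 * sum(diffSq ./ repmat(sig2(:, k), 1, T), 1);
end

% Log-sum-exp over components gives log p(x_t) for each frame.
mx = max(logComp, [], 1);
logPx = mx + log(sum(exp(logComp - repmat(mx, K, 1)), 1));

% Treating frames as independent, the utterance score is the sum over frames.
totalLogLik = sum(logPx);
```

A classifier built this way would typically hold one such mixture per class and pick the class with the highest `totalLogLik`.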


Thank you all.
 

Hi, I am also working on a speech recognition project.
This is complete MATLAB code for MFCC, and I have a PDF book that explains all the steps of feature extraction using MFCC:
function [ceps,freqresp,fb,fbrecon,freqrecon] = ...
    mfcc2(input, samplingRate, frameRate, x)
% MFCC2  Mel-frequency cepstral coefficients of a speech signal.
%   input        - speech samples (vector)
%   samplingRate - sampling rate in Hz (default 16000)
%   frameRate    - frames per second (default 100)
%   x            - number of cepstral coefficients to return
global mfccDCTMatrix mfccFilterWeights

% Defaults for omitted arguments.
if (nargin < 2) samplingRate = 16000; end;
if (nargin < 3) frameRate = 100; end;
if (nargin < 4) x = 13; end;        % assumed default; not in the original post

input = input(:)';                  % force a row vector to match hamWindow

% Filter bank parameters.
lowestFrequency = 133.3333;         % lower edge of the first filter (Hz)
linearFilters = 13;                 % number of linearly spaced filters
linearSpacing = 66.66666666;        % spacing of the linear filters (Hz)
logFilters = 27;                    % number of logarithmically spaced filters
logSpacing = 1.0711703;             % frequency ratio between log filters
fftSize = 512;
cepstralCoefficients = x;
windowSize = 256;                   % an earlier assignment of 400 was overwritten

totalFilters = linearFilters + logFilters;

% Band edges: linear spacing first, then logarithmic spacing.
freqs = lowestFrequency + (0:linearFilters-1)*linearSpacing;
freqs(linearFilters+1:totalFilters+2) = ...
    freqs(linearFilters) * logSpacing.^(1:logFilters+2);

lower = freqs(1:totalFilters);
center = freqs(2:totalFilters+1);
upper = freqs(3:totalFilters+2);

% Triangular filter weights, one row per filter, each with unit area.
mfccFilterWeights = zeros(totalFilters,fftSize);
triangleHeight = 2./(upper-lower);
fftFreqs = (0:fftSize-1)/fftSize*samplingRate;

for chan=1:totalFilters
    mfccFilterWeights(chan,:) = ...
        (fftFreqs > lower(chan) & fftFreqs <= center(chan)).* ...
        triangleHeight(chan).*(fftFreqs-lower(chan))/(center(chan)-lower(chan)) + ...
        (fftFreqs > center(chan) & fftFreqs < upper(chan)).* ...
        triangleHeight(chan).*(upper(chan)-fftFreqs)/(upper(chan)-center(chan));
end

% Hamming window applied to every frame.
hamWindow = 0.54 - 0.46*cos(2*pi*(0:windowSize-1)/windowSize);

if 0 % Window it like ComplexSpectrum (disabled)
    windowStep = samplingRate/frameRate;
    a = .54;
    b = -.46;
    wr = sqrt(windowStep/windowSize);
    phi = pi/windowSize;
    hamWindow = 2*wr/sqrt(4*a*a+2*b*b)* ...
        (a + b*cos(2*pi*(0:windowSize-1)/windowSize + phi));
end

% DCT matrix that maps log filter bank outputs to cepstral coefficients.
mfccDCTMatrix = 1/sqrt(totalFilters/2)*cos((0:(cepstralCoefficients-1))' * ...
    (2*(0:(totalFilters-1))+1) * pi/2/totalFilters);
mfccDCTMatrix(1,:) = mfccDCTMatrix(1,:) * sqrt(2)/2;

% Pre-emphasis filter (set the condition to 0 to skip it).
if 1
    preEmphasized = filter([1 -.97], 1, input);
else
    preEmphasized = input;
end
windowStep = samplingRate/frameRate;
cols = fix((length(input)-windowSize)/windowStep);

% Allocate only the outputs that were requested.
ceps = zeros(cepstralCoefficients, cols);
if (nargout > 1) freqresp = zeros(fftSize/2, cols); end;
if (nargout > 2) fb = zeros(totalFilters, cols); end;

% Interpolation indices used to map filter bank channels back to FFT bins.
if (nargout > 4)
    fr = (0:(fftSize/2-1))'/(fftSize/2)*samplingRate/2;
    j = 1;
    for i=1:(fftSize/2)
        if fr(i) > center(j+1)
            j = j + 1;
        end
        if j > totalFilters-1
            j = totalFilters-1;
        end
        fr(i) = min(totalFilters-.0001, ...
            max(1,j + (fr(i)-center(j))/(center(j+1)-center(j))));
    end
    fri = fix(fr);
    frac = fr - fri;

    freqrecon = zeros(fftSize/2, cols);
end

% Main loop: window, FFT, filter bank, log, DCT for each frame.
for start=0:cols-1
    first = start*windowStep + 1;
    last = first + windowSize-1;
    fftData = zeros(1,fftSize);
    fftData(1:windowSize) = preEmphasized(first:last).*hamWindow;
    fftMag = abs(fft(fftData));
    earMag = log10(mfccFilterWeights * fftMag');

    ceps(:,start+1) = mfccDCTMatrix * earMag;
    if (nargout > 1) freqresp(:,start+1) = fftMag(1:fftSize/2)'; end;
    if (nargout > 2) fb(:,start+1) = earMag; end
    if (nargout > 3)
        fbrecon(:,start+1) = ...
            mfccDCTMatrix(1:cepstralCoefficients,:)' * ...
            ceps(:,start+1);
    end
    if (nargout > 4)
        f10 = 10.^fbrecon(:,start+1);
        freqrecon(:,start+1) = samplingRate/fftSize * ...
            (f10(fri).*(1-frac) + f10(fri+1).*frac);
    end
end

% Reconstruct the filter bank outputs from the cepstra for all frames at once.
if 1 & (nargout > 3)
    fbrecon = mfccDCTMatrix(1:cepstralCoefficients,:)' * ceps;
end;
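
To see how the filter bank in the function above is laid out, the band edges can be recomputed on their own. The sketch below copies the constants from the listing and checks two properties: the edges increase monotonically, and each triangle has unit area because its peak height is 2/(upper-lower). The printout at the end is added here for inspection only and is not part of the posted function.

```matlab
% Minimal sketch: reproduce the band edges used by the posted function.
% Constants are copied from the listing above.
lowestFrequency = 133.3333;
linearFilters   = 13;
linearSpacing   = 66.66666666;
logFilters      = 27;
logSpacing      = 1.0711703;
totalFilters    = linearFilters + logFilters;

% Linearly spaced edges first, then logarithmically spaced ones.
freqs = lowestFrequency + (0:linearFilters-1) * linearSpacing;
freqs(linearFilters+1:totalFilters+2) = ...
    freqs(linearFilters) * logSpacing.^(1:logFilters+2);

lower  = freqs(1:totalFilters);
center = freqs(2:totalFilters+1);
upper  = freqs(3:totalFilters+2);

% Each triangle spans [lower, upper] with peak height 2/(upper-lower),
% so its area 0.5 * base * height equals 1 for every filter.
triangleHeight = 2 ./ (upper - lower);
area = 0.5 * (upper - lower) .* triangleHeight;

fprintf('%d filters, first center %.1f Hz, last upper edge %.1f Hz\n', ...
    totalFilters, center(1), upper(end));
```

The highest band edge stays below the 8 kHz Nyquist frequency that corresponds to the function's default sampling rate of 16 kHz.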
 
