payate
Newbie level 6

Dear friends,
I am new to speech recognition and am having trouble recognizing isolated digits from an utterance.
So far my code only goes as far as the FFT. After that I don't know how to do the windowing, how to get the inputs and targets from the code, or how to create the neural network function.
If anyone knows about this, please guide me.
I really need your help.
This is my code:
function matlab_tools()
% function matlab_tools
% =====================
%
% Some basic tools / functions from matlab which are important
% for speech processing.
% Graphical initialization
% ------------------------
figure(1);
set(gcf,'position',[150 150 500 500]) % Bigger window than default
% Read a *.wav signal
% -------------------
[x,fs] = wavread('two.wav');
% Play the signal
% ---------------
sound(x,fs);
% Preemphasis filter
% ------------------
%
% It is common practice to use a preemphasis filter in speech recognition
% tasks. It is a simple high-pass filter. One effect is thus that it removes
% a bias from the signal.
% If you listen to the signal after the preemphasis filter, you will hear
% that it sounds different.
% You can comment the preemphasis filter out to see if there is a difference
% in the oscillogram.
precoeff = -0.9;
x = [x(1)*(1+precoeff);x(2:end)+precoeff*x(1:end-1)];
% Plot the oscillogram
% --------------------
%
% Basically it is possible to type just plot(x) which displays the
% oscillogram of the signal. In this case the units on the x-axis
% would be samples. It is preferable though to have the units on
% the x-axis in seconds. Therefore the plot function is called with
% two arguments:
% 1st arg: array with the sampling times
% since the sampling frequency is 'fs', the sampling period is 1/fs.
% i.e. the time between two samples is 1/fs.
% 2nd arg: array with the corresponding function values
subplot(3,1,1);
plot([0:length(x)-1]/fs,x);
xlim([0 0.7]);
xlabel('time');
% Plot the Spectrogram
% --------------------
%
% This is an easy way to perform a short-term analysis of a speech
% signal. The spectrogram function can be called with specgram(x).
% Then the scaling of the time and frequency axes is not in seconds or Hertz.
% With some more arguments the function uses seconds / Hertz as units and is
% more flexible:
% 1st arg: signal in time domain
% 2nd arg: number of FFT points (can be chosen the same as the window-size)
% 3rd arg: sampling frequency
% 4th arg: window-size
% 5th arg: overlap between frames in samples (window-size minus window-shift)
% In this first version of the spectrogram the window-size is quite big.
% Therefore the resolution in the frequency domain is high, the resolution
% in the time domain is low however.
subplot(3,1,2);
winSize = 300;
winShift = 100;
specgram(x,winSize,fs,winSize,winSize-winShift); % 5th arg is the overlap, not the shift
% Now in the second version we don't care so much about the frequency
% resolution any more but want to have a better resolution in the time domain.
% Therefore the window-size is chosen smaller.
subplot(3,1,3);
winSize = 100;
winShift = 50;
specgram(x,winSize,fs,winSize,winSize-winShift); % 5th arg is the overlap, not the shift
% FFT
% ---
%
% Now we will do an FFT (just for one frame) without the specgram function.
figure(2);
set(gcf,'position',[150 150 500 500]) % Bigger window than default
winSize = 300;
%-- Select a time-slice for which the fft has to be plotted.
xx = x(1001:1000+winSize);
%-- Plot in the time domain
subplot(3,1,1);
plot([0:winSize-1]/fs,xx);
% Often the FFT is not performed on the signal directly but the
% signal is windowed for example with a hamming window. This is
% done to remove discontinuities at the frame border which would
% introduce higher frequency components.
subplot(3,1,2);
win = hamming(winSize)/0.54;
xx = xx .* win;
plot([0:winSize-1]/fs,xx);
subplot(3,1,3);
% Do the FFT and plot the frame in the frequency domain.
% The frequency resolution is fs/winSize.
% This plots the frequency from 0 up to the sampling frequency fs.
X = abs(fft(xx));
plot([0:winSize-1]*fs/winSize,X);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%5
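The single-frame FFT above can be extended to the whole signal, which is one common way to do the "windowing" step. The following is only a minimal sketch under stated assumptions: a synthetic 400 Hz sine stands in for two.wav, the sampling rate of 8000 Hz is made up, and the names numFrames, numBins, features and freqs are illustrative. The Hamming window is written out from its formula so that no toolbox is needed.

```matlab
% Sketch: split a signal into overlapping frames, window each frame,
% and keep the magnitude spectrum of every frame as one feature column.
fs = 8000;                          % assumed sampling rate (Hz)
t  = (0:fs-1)'/fs;                  % one second of sample times
x  = sin(2*pi*400*t);               % synthetic stand-in for the speech signal

winSize  = 300;                     % frame length in samples
winShift = 100;                     % distance between frame starts

% Hamming window from its formula (same values as hamming(winSize))
n   = (0:winSize-1)';
win = 0.54 - 0.46*cos(2*pi*n/(winSize-1));

numFrames = floor((length(x) - winSize)/winShift) + 1;
numBins   = floor(winSize/2) + 1;   % bins from 0 Hz up to fs/2
features  = zeros(numBins, numFrames);

for k = 1:numFrames
    first = (k-1)*winShift + 1;             % first sample of frame k
    frame = x(first:first+winSize-1) .* win;
    X     = abs(fft(frame));
    features(:,k) = X(1:numBins);           % keep the non-redundant half
end

freqs = (0:numBins-1)' * fs/winSize;        % frequency of each bin in Hz
[~, peakBin] = max(features(:,1));          % strongest bin of the first frame
fprintf('%d frames, strongest bin at %.1f Hz\n', numFrames, freqs(peakBin));
```

Each column of features is then the spectrum of one frame. Only the bins up to fs/2 are kept because the spectrum of a real signal is symmetric above that.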
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Demonstration code for "Independent component analysis: A Tutorial Introduction"
% JV Stone, MIT Press, September 2004.
% Copyright: 2005, JV Stone, Psychology Department, Sheffield University, Sheffield, England.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Basic Bell-Sejnowski ICA algorithm demonstrated on 2 speech signals.
% The default value of each parameter is given in [] brackets.
% [0] Set to 1 to hear signals.
listen=0; % set to 1 if have audio.
% [1] Set random number seed.
seed=9; rand('seed',seed); randn('seed',seed);
% [2] M = number of source signals and signal mixtures.
M = 2;
% [1e4] N = number of data points per signal.
N = 1e4;
% Load data, each of M=2 columns contains a different source signal.
% Each column has N rows (signal values).
% Load standard matlab sounds (from MatLab's datafun directory)
% Set variance of each source to unity.
load two; s1=x(1:N); s1=s1/std(s1);
load two; s2=x(1:N); s2=s2/std(s2);
% Combine sources into vector variable s.
s=[s1,s2];
% Make new mixing matrix.
A=randn(M,M);
% Listen to speech signals ...
% [10000] Fs Sample rate of speech.
Fs=10000;
if listen soundsc(s(:,1),Fs); soundsc(s(:,2),Fs);end;
% Plot histogram of each source signal -
% this approximates pdf of each source.
figure(3);hist(s(:,1),50); drawnow;
figure(4);hist(s(:,2),50); drawnow;
% Make M mixtures x from M source signals s.
x = s*A;
% Listen to signal mixtures ...
if listen soundsc(x(:,1),Fs); soundsc(x(:,2),Fs); end;
% Initialise unmixing matrix W to identity matrix.
W = eye(M,M);
% Initialise y, the estimated source signals.
y = x*W;
% Print out initial correlations between
% each estimated source y and every source signal s.
r=corrcoef([y s]);
fprintf('Initial correlations of source and extracted signals\n');
rinitial=abs(r(M+1:2*M,1:M))
maxiter=100; % [100] Maximum number of iterations.
eta=1; % [0.25] Step size for gradient ascent.
% Make array hs to store values of function and gradient magnitude.
hs=zeros(maxiter,1);
gs=zeros(maxiter,1);
% Begin gradient ascent on h ...
for iter=1:maxiter
% Get estimated source signals, y.
y = x*W; % wt vec in col of W.
% Get estimated maximum entropy signals Y=cdf(y).
Y = tanh(y);
% Find value of function h.
% h = log(abs(det(W))) + sum( log(eps+1-Y(:).^2) )/N;
detW = abs(det(W));
h = ( (1/N)*sum(sum(Y)) + 0.5*log(detW) );
% Find matrix of gradients @h/@W_ji ...
g = inv(W') - (2/N)*x'*Y;
% Update W to increase h ...
W = W + eta*g;
% Record h and magnitude of gradient ...
hs(iter)=h; gs(iter)=norm(g(:));
end;
% Plot change in h and gradient magnitude during optimisation.
figure(3);plot(hs);title('Function values - Entropy');
xlabel('Iteration');ylabel('h(Y)');
figure(4);plot(gs);title('Magnitude of Entropy Gradient');
xlabel('Iteration');ylabel('Gradient Magnitude');
% Print out final correlations ...
r=corrcoef([y s]);
fprintf('Final correlations between source and extracted signals ...\n');
rfinal=abs(r(M+1:2*M,1:M))
% Listen to extracted signals ...
if listen soundsc(y(:,1),Fs); soundsc(y(:,2),Fs);end;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
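For the "input and target" part of the question, one possible layout is a matrix with one column per utterance plus a one-hot target matrix with one row per digit. This is only a sketch, not a complete recognizer: the labels vector is hypothetical, random numbers stand in for real frame spectra, and averaging the log spectra over time is just one simple way to get a fixed-length vector from utterances of different lengths.

```matlab
% Sketch: build an input matrix and a one-hot target matrix for a classifier.
% Each column of 'inputs' is one utterance; each column of 'targets' marks its digit.
numClasses = 10;                 % digits 0..9
labels     = [2 7 2 0];          % hypothetical digit spoken in each utterance
numBins    = 151;                % feature length per utterance (assumed)

numUtt  = numel(labels);
inputs  = zeros(numBins, numUtt);
targets = zeros(numClasses, numUtt);

for u = 1:numUtt
    % Placeholder for the frame spectra of utterance u (numBins x numFrames);
    % random values stand in for real features here.
    frameSpectra = rand(numBins, 20 + u);

    % Average the log spectra over time so that every utterance
    % gives a vector of the same length, whatever its duration.
    inputs(:,u) = mean(log(frameSpectra + eps), 2);

    % One-hot target: row (digit+1) is 1, all other rows stay 0.
    targets(labels(u)+1, u) = 1;
end
```

If the Neural Network Toolbox is available (an assumption), matrices in this one-column-per-sample layout are the form that functions such as patternnet and train work with.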