Need some help in the voice activity detection area

Status
Not open for further replies.

setaneiro

I'm currently working on a small VAD code snippet, but I got stuck on two questions.

The first is entropy modulation. I've found a paper with some additional information - **broken link removed** - but I have no idea how I should understand the probability of an event (in the paper's equation above). I've read a sample .wav file with my code, but how should I calculate the entropy of this sampled signal, which is N samples long?

Second question - I've used this library, https://code.google.com/p/libmfcc/source/browse/libmfcc.c, for calculating MFCC coefficients. How can I calculate the modulation energy at 4 Hz from the coefficients it returns?
 

It's a lot to expect that we will read an entire paper or two.

In order to distinguish a voice from other sounds:

* Examine speech in a digital sound processing program. Look closely for recognizable characteristics: pitch range, volume range, attack/decay, etc. Devise an algorithm to identify similar characteristics.

* Your algorithm must reject waveforms which rise or fall too suddenly (eliminating hand claps, door slams, typewriters, etc).

* Your algorithm must reject waveforms which do not change much in volume (eliminating music, water running, traffic noise, etc).

* Look for spoken words to occur a certain number of times per second. Your figure of 4Hz looks okay as an average. It should have variations of course.
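The two rejection rules above can be sketched on a frame-level loudness envelope. This is only an illustration under stated assumptions: the function names, the non-overlapping framing, and both thresholds (`max_jump_db`, `min_range_db`) are made up for the example and would need tuning on real recordings.

```python
import math

def rms_envelope(samples, frame_len):
    """RMS level of consecutive non-overlapping frames (trailing partial frame dropped)."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return [math.sqrt(sum(x * x for x in f) / frame_len) for f in frames]

def envelope_features(envelope, floor=1e-6):
    """Return (largest frame-to-frame jump in dB, overall dynamic range in dB).

    Expects a non-empty envelope; `floor` avoids log10(0) on silent frames.
    """
    db = [20.0 * math.log10(max(v, floor)) for v in envelope]
    max_jump = max((abs(b - a) for a, b in zip(db, db[1:])), default=0.0)
    dynamic_range = max(db) - min(db)
    return max_jump, dynamic_range

def looks_like_speech(envelope, max_jump_db=30.0, min_range_db=10.0):
    """Apply the two rules: no sudden jumps, but enough level variation.

    Both thresholds are illustrative assumptions, not measured values.
    """
    if not envelope:
        return False
    max_jump, dynamic_range = envelope_features(envelope)
    return max_jump <= max_jump_db and dynamic_range >= min_range_db
```

A door slam shows up as one very large frame-to-frame jump, while steady noise shows up as a near-zero dynamic range, so each fails one of the two checks.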
 

Hey, I already solved the problem with modulation entropy.
For anyone interested (for further reading):
Entropy Based Voice Activity Detection in Very Noisy Conditions, by Philippe Renevey and Andrzej Drygajlo. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.123.6098&rep=rep1&type=pdf
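
For anyone stuck on the same "probability of an event" question: one common reading, which may or may not match the paper's exact definition, is to normalize a frame's power spectrum so that it sums to 1 and treat each frequency bin as an event. A minimal sketch under that assumption follows (the function names are made up for the example, and the naive DFT is only there to keep it self-contained; a real FFT would be used in practice):

```python
import cmath
import math

def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of a real-valued frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def spectral_entropy(frame):
    """Entropy in bits of the frame's power spectrum, normalized to sum to 1."""
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0.0:
        return 0.0  # all-zero frame: returning 0 is a convention assumed here
    probs = [p / total for p in power]
    # Bins with zero probability contribute nothing (limit of p*log p is 0).
    return -sum(p * math.log2(p) for p in probs if p > 0.0)
```

With this definition a pure tone gives an entropy near 0 (all energy in one bin), while a flat spectrum gives the maximum, log2 of the number of bins.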

As my waveform signals I use wave files (*.wav), which I can already read and apply an FFT to, so there is no problem at the beginning. I just can't quite understand how to use the MFCC coefficients after receiving them from the algorithm (for example, from the FFT data mentioned above).
A speech signal has a characteristic 4 Hz modulation peak, and I have to check for it. I can look for this modulation energy after passing the data through an FFT and a bandpass filter centered around 4 Hz. But the thing is, this is academic work, so I'm forced to somehow connect this with the use of MFCC coefficients. I came up with the idea that I should follow these steps:

1. Get MFCC coefficients as amplitudes of the signal's spectrum and save only the first two or three coefficients (this should cover the frequencies I need),
2. Apply FFT (yes, again, after DCT),
3. Check for signal energy near desired frequency bins.

But I need to be sure that this is the right approach. :)
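
A rough sketch of steps 2 and 3, assuming the second FFT is taken over time, i.e. over the sequence of values that one MFCC coefficient takes from frame to frame, rather than across the coefficients of a single frame. The function name, the band half-width, and the use of an energy ratio are assumptions made for the example; `trajectory` stands for the per-frame values of one coefficient already obtained from the MFCC library.

```python
import cmath
import math

def modulation_energy_ratio(trajectory, frame_rate,
                            center_hz=4.0, half_width_hz=2.0):
    """Fraction of a trajectory's non-DC modulation energy that lies
    within center_hz +/- half_width_hz.

    trajectory -- one MFCC coefficient sampled once per frame
    frame_rate -- frames per second (must exceed 2 * center_hz)
    """
    n = len(trajectory)
    mean = sum(trajectory) / n
    centered = [v - mean for v in trajectory]  # remove the DC component
    band = 0.0
    total = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * frame_rate / n  # modulation frequency of bin k in Hz
        acc = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                  for i, v in enumerate(centered))
        power = abs(acc) ** 2
        total += power
        if abs(freq - center_hz) <= half_width_hz:
            band += power
    return band / total if total > 0.0 else 0.0
```

The frequency resolution is `frame_rate / n`, so with a 10 ms hop (100 frames per second) about 2 seconds of frames are needed to resolve modulation bins 0.5 Hz apart.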
 
