Need some help in the voice activity detection area

setaneiro · Jan 1, 2016

I'm currently working on a small VAD code snippet, but I got stuck on two questions.

The first is entropy modulation. I've found a paper with some additional information - **broken link removed** - but I have no idea how I should understand the probability of an event in the paper's equation. I've read a sample .wav file with my code, but how should I calculate the entropy of this sampled signal with a length of N samples?

Second question - I've used this library, https://code.google.com/p/libmfcc/source/browse/libmfcc.c, to calculate MFCC coefficients. How can I calculate the modulation energy at 4 Hz from the obtained coefficients?

BradtheRad · Jan 2, 2016

It's a lot to expect that we will read an entire paper or two.

In order to distinguish a voice from other sounds:

* Examine speech in a digital sound processing program. Look closely for recognizable characteristics: pitch range, volume range, attack/decay, etc. Devise an algorithm to identify similar characteristics.

* Your algorithm must reject waveforms which rise or fall too suddenly (eliminating hand claps, door slams, typewriters, etc).

* Your algorithm must reject waveforms which do not change much in volume (eliminating music, water running, traffic noise, etc).

* Look for spoken words to occur a certain number of times per second. Your figure of 4Hz looks okay as an average. It should have variations of course.

setaneiro · Jan 3, 2016

Hey, I already solved the problem with modulation entropy.
For anyone interested (for further reading):
Entropy Based Voice Activity Detection in Very Noisy Conditions, by Philippe Renevey and Andrzej Drygajlo. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.123.6098&rep=rep1&type=pdf
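
For readers with the same question about the "probability of an event": one common reading of spectral entropy is to normalize the power spectrum of a frame so that it sums to 1 and treat each frequency bin as an event. The sketch below follows that reading. It is only an illustration under that assumption, not necessarily the exact formulation from the paper, and the names `power_spectrum` and `spectral_entropy` are made up for this example. It uses a naive DFT to stay dependency-free.

```python
import cmath
import math
import random


def power_spectrum(frame):
    """Naive DFT power spectrum over the non-negative frequency bins."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_entropy(frame, eps=1e-12):
    """Entropy in bits of the frame's normalized power spectrum.

    Assumption: each bin's share of the total power plays the role of
    the probability of an event.
    """
    power = power_spectrum(frame)
    total = sum(power) + eps  # eps avoids division by zero on silent frames
    probs = [p / total for p in power]
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


n = 64
# A pure tone puts almost all power into one bin, so its entropy is low.
tone = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
# Noise spreads power over many bins, so its entropy is high.
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]

tone_entropy = spectral_entropy(tone)
noise_entropy = spectral_entropy(noise)
print(tone_entropy, noise_entropy)
```

On real data the frame would be a short windowed slice of the .wav samples, and the entropy would be tracked frame by frame.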

As my waveform signals I use wave files (*.wav), which I can already read and apply an FFT to, so there is no problem at the start. I just can't quite understand how to use the MFCC coefficients after getting them from the algorithm (e.g. from the FFT data mentioned above).
A speech signal has a characteristic 4 Hz modulation peak, and I have to check for it. I could look for this modulation energy by passing the data through an FFT and a bandpass filter centered around 4 Hz. But the thing is, this is academic work, so I'm forced to somehow connect this with the use of MFCC coefficients. I came up with the idea that I should follow these steps:

1. Get the MFCC coefficients as amplitudes of the signal's spectrum and keep only the first two or three coefficients (this should cover the frequencies I need),
2. Apply FFT (yes, again, after DCT),
3. Check for signal energy near desired frequency bins.

But I need to be sure that this is the right approach.
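
One possible reading of the steps above is sketched below. It assumes the second FFT is taken along time, over the sequence of values that one low-order MFCC takes from frame to frame, rather than inside a single frame. The frame rate, the 2 to 8 Hz band, and the name `band_energy_ratio` are assumptions made for this example, and the "MFCC trajectory" is synthetic so the snippet runs on its own.

```python
import cmath
import math


def band_energy_ratio(trajectory, frame_rate_hz, lo_hz=2.0, hi_hz=8.0):
    """Fraction of the trajectory's modulation energy inside [lo_hz, hi_hz]."""
    n = len(trajectory)
    mean = sum(trajectory) / n
    centered = [v - mean for v in trajectory]  # remove the DC component
    in_band = 0.0
    total = 0.0
    for k in range(1, n // 2 + 1):  # skip bin 0, stop at Nyquist
        acc = sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        energy = abs(acc) ** 2
        freq_hz = k * frame_rate_hz / n  # modulation frequency of bin k
        total += energy
        if lo_hz <= freq_hz <= hi_hz:
            in_band += energy
    return in_band / total if total > 0.0 else 0.0


frame_rate_hz = 100.0  # assumed: one MFCC vector every 10 ms
n_frames = 200         # 2 seconds of frames

# Synthetic stand-ins for the per-frame values of one MFCC coefficient.
speech_like = [math.sin(2 * math.pi * 4.0 * t / frame_rate_hz) for t in range(n_frames)]
fast_flutter = [math.sin(2 * math.pi * 30.0 * t / frame_rate_hz) for t in range(n_frames)]

speech_ratio = band_energy_ratio(speech_like, frame_rate_hz)
flutter_ratio = band_energy_ratio(fast_flutter, frame_rate_hz)
print(speech_ratio, flutter_ratio)
```

A ratio close to 1 means the coefficient varies mostly at syllable-like rates around 4 Hz. Any decision threshold would have to be tuned on real recordings.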
