Most appropriate signal processing algorithm for this problem

Status
Not open for further replies.

greenbee

Newbie level 3
Joined
Feb 12, 2009
I'm looking to detect the presence of a person who irregularly repeats one specific word (e.g. 'rhinoceros' ..... 'rhinoceros' .. 'rhinoceros' ...............'rhinoceros') in a room with other people talking. If the person is not present, that word will not be used at all.
Also, the amplitude of his voice may not be constant as he could be wandering around the room.

What's the best way to detect the presence or absence of this word?

One solution I've thought of would be to process a 1 second sliding window of the incoming sound data.
I would high-pass filter this and then take the zero crossings as a binary feature vector (to get around the varying amplitude issue), and then cross-correlate it with a similarly processed recording of 'rhinoceros'. A correlation above some arbitrary threshold would indicate the presence of the word.
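
For what it's worth, here is a rough sketch of that zero-crossing idea in plain Python. Everything in it is an assumption made for illustration: the chirp stands in for a recording of the word, the background is a low-level sine rather than real chatter, the names hard_limit and sliding_match are made up, and the 0.8 threshold is arbitrary. The high-pass filter is left out to keep it short. Keeping only the sign of each sample preserves where the zero crossings are and discards the amplitude.

```python
import math

def hard_limit(samples):
    """Keep only the sign of each sample (+1 or -1); amplitude is discarded."""
    return [1 if s >= 0 else -1 for s in samples]

def sliding_match(signal, template):
    """Correlate the sign sequences; each score is in [-1, 1], one per window."""
    sig = hard_limit(signal)
    tpl = hard_limit(template)
    n = len(tpl)
    return [sum(sig[i + k] * tpl[k] for k in range(n)) / n
            for i in range(len(sig) - n + 1)]

# Synthetic stand-in for the recorded word: a short chirp.
template = [math.sin(0.002 * k * k) for k in range(200)]

# Low-level background, with a quiet copy of the "word" starting at sample 300.
signal = [0.005 * math.sin(1.7 * k) for k in range(800)]
for k, v in enumerate(template):
    signal[300 + k] += 0.1 * v

scores = sliding_match(signal, template)
best = max(range(len(scores)), key=scores.__getitem__)
threshold = 0.8  # arbitrary, as in the post
detected = scores[best] > threshold
print(best, round(scores[best], 3), detected)
```

Scaling the whole signal by any positive gain leaves the scores unchanged, which is the amplitude independence being asked for. The cost is the one already mentioned: all level information is gone.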

This doesn't seem very elegant to me though as I'm throwing away all the amplitude information.

Anyone have a better idea? I feel like there should be existing algorithms that solve exactly this kind of problem but I'm not well informed enough to know what they are.

Thanks in advance for your thoughts!
 

I'm no expert on audio algorithms, but I'll give it a shot.

First of all, why are you worried about the varying amplitude issue? If you perform a cross-correlation across the data, you should be able to find a relative peak if someone says 'rhinoceros,' regardless of the amplitude; if the voice is loud, the peak will be larger, but I would think the peak would still exist if the voice was just multiplied by some scalar. Are you assuming he is moving while he is saying the word?
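
To make the scalar argument concrete, here is a small sketch, not anyone's actual implementation. It compares a plain cross-correlation with a normalized one (each window's dot product divided by the window and template energies). Normalization is not mentioned in the thread; it is brought in here only to show one way to get a peak that does not depend on loudness. The decaying sine is a made-up stand-in for the word, and raw_xcorr, normalized_xcorr and embed are hypothetical helper names.

```python
import math

def raw_xcorr(signal, template):
    """Plain sliding dot product; the peak height scales with signal amplitude."""
    n = len(template)
    return [sum(signal[i + k] * template[k] for k in range(n))
            for i in range(len(signal) - n + 1)]

def normalized_xcorr(signal, template):
    """Sliding dot product divided by window and template norms; range [-1, 1]."""
    n = len(template)
    t_energy = sum(t * t for t in template)
    out = []
    for i in range(len(signal) - n + 1):
        window = signal[i:i + n]
        w_energy = sum(w * w for w in window)
        dot = sum(w * t for w, t in zip(window, template))
        out.append(dot / math.sqrt(w_energy * t_energy) if w_energy > 0 else 0.0)
    return out

def embed(template, gain, offset, length):
    """Place a scaled copy of the template into an otherwise silent signal."""
    signal = [0.0] * length
    for k, t in enumerate(template):
        signal[offset + k] = gain * t
    return signal

template = [math.sin(0.3 * k) * math.exp(-0.02 * k) for k in range(100)]
loud = embed(template, 1.0, 50, 300)
quiet = embed(template, 0.2, 50, 300)

raw_loud = max(raw_xcorr(loud, template))
raw_quiet = max(raw_xcorr(quiet, template))
ncc_loud = max(normalized_xcorr(loud, template))
ncc_quiet = max(normalized_xcorr(quiet, template))
print(raw_quiet / raw_loud, ncc_loud, ncc_quiet)
```

The raw peak shrinks in proportion to the gain, so a fixed threshold would have to be set for the quietest case. The normalized peak stays at about 1 for both copies. With real background noise the normalized score would fall below 1, and how far depends on the signal-to-noise ratio.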

If he is moving while saying the word, here is one idea: you can break the word down into syllables and use one correlator per syllable. You would also need to check that the detected syllables are adjacent and in the right order, but this way, if the amplitude varies across the word, you only have to assume it is constant within each syllable.
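
A minimal sketch of the per-syllable idea, under several assumptions: the "word" is a synthetic chirp, the syllables are four equal-length chunks (real syllable boundaries would need to be marked by hand or by some segmentation step), and each chunk is scored with a normalized correlation so that its own gain drops out. The function names and the choice of the minimum as the combined score are illustrative, not something the thread specifies.

```python
import math

def ncc_at(signal, start, segment):
    """Normalized correlation of one segment against the signal at one offset."""
    window = signal[start:start + len(segment)]
    dot = sum(w * s for w, s in zip(window, segment))
    norm = math.sqrt(sum(w * w for w in window) * sum(s * s for s in segment))
    return dot / norm if norm > 0 else 0.0

def split_equal(template, parts):
    """Cut the template into equal chunks (a crude stand-in for syllables)."""
    size = len(template) // parts
    return [template[i * size:(i + 1) * size] for i in range(parts)]

def syllable_score(signal, start, segments):
    """Worst per-segment match, with the segments forced to sit back to back."""
    scores = []
    offset = start
    for seg in segments:
        scores.append(ncc_at(signal, offset, seg))
        offset += len(seg)
    return min(scores)

template = [math.sin(0.05 * k + 0.003 * k * k) for k in range(120)]
segments = split_equal(template, 4)

# The speaker gets quieter across the word: one gain per "syllable".
gains = [1.0, 0.7, 0.4, 0.2]
signal = [0.0] * 400
for j, seg in enumerate(segments):
    for k, s in enumerate(seg):
        signal[100 + j * 30 + k] = gains[j] * s

whole_word = ncc_at(signal, 100, template)
per_syllable = syllable_score(signal, 100, segments)
starts = range(len(signal) - len(template) + 1)
best = max(starts, key=lambda s: syllable_score(signal, s, segments))
print(round(whole_word, 3), round(per_syllable, 3), best)
```

In this toy case the whole-word correlator is penalized by the changing level, while the per-syllable score is not. Shorter segments are also less distinctive, so on real audio they would likely produce more false matches.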
 

Thanks for your reply!

I'm assuming he moves slowly relative to the word, so that there is no variation in amplitude while he says one word.

I was worried about the varying amplitude because the level of his voice relative to the background noise changes, so after cross-correlation I would need a fairly low threshold on the result to catch the quietest case. But I think you are right - I was also imagining a signal with a DC component for some reason (duh!), which is why I mentioned the high-pass filter. Maybe cross-correlation would do the job.

Do you have any idea how you might frame a Kalman filter (or whether it is appropriate for this) to tackle this problem? I also found this link, which has a load of other useful suggestions: https://stackoverflow.com/questions...use-for-signal-sound-one-class-classification

Thanks again!
 

I do not think a Kalman filter would be appropriate for this problem. I have not worked with them much, but from a quick read they are mostly used for updating an estimate of a changing state over time, which isn't really your goal here. Offhand, I think a cross-correlator with a threshold would be your best bet. However, the link you provided has some good suggestions that are worth trying out; you can use the cross-correlation method as your baseline and see whether anything suggested there performs better. I think the syllable approach may work better if implemented properly under the right conditions (for example, you may get a stronger peak at the "no" syllable of "rhinoceros," and you could then check for weaker peaks from the neighboring syllables to confirm that the word was detected), but I cannot say for sure.
 

Thanks again for your reply. I agree that a Kalman filter doesn't sound right - I was wondering whether it's possible to recast the problem somehow. I'll try the cross-correlation method in the first instance and see how I get on! Are you a sigproc professional?
 
