DSP Audio - Searching for specific sample in an audio file

hopworks

Greets all!
I figured the best place to ask whether what I want to do can be done, and where to look to research it, would be here. I searched all the tags I could think of, both here and via Google, but came up empty.

I have audio files encoded as AAC that I recorded via SDR, streaming, or just plain audio capture of my favorite morning show over the last several years. I want to process them and generate either key frames or a playlist file with indexes to moments in the files. I would prefer to do this as a background process on one of my Debian Linux rigs. All of this I can do already, but I need a way to recognize a sample in the target file.

I am not entertaining speech recognition, at least not yet, but rather the recognition of a specific sample pattern, like a commercial, a song, a segue, or any other specific sample. I am ready to use either open source software or even hardware DSP to provide this.
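One possible starting point for this kind of known-clip search is normalized cross-correlation: slide the reference clip along the recording and score how well each window matches. The sketch below is a brute-force, pure-Python illustration only. The function name `find_clip` and the 0.9 threshold are assumptions made for the example, and it presumes both signals are already decoded to plain lists of samples at the same sample rate. For hours of audio an FFT-based correlation or a fingerprinting approach would be needed instead.

```python
import math

def find_clip(haystack, needle, threshold=0.9):
    """Return (offset, score) pairs where needle closely matches haystack.

    Brute-force normalized cross-correlation. Scores fall in [-1, 1];
    1.0 means the window has the same shape as the needle.
    """
    n = len(needle)
    needle_mean = sum(needle) / n
    needle_zm = [x - needle_mean for x in needle]        # zero-mean needle
    needle_norm = math.sqrt(sum(x * x for x in needle_zm))
    hits = []
    for offset in range(len(haystack) - n + 1):
        window = haystack[offset:offset + n]
        w_mean = sum(window) / n
        w_zm = [x - w_mean for x in window]              # zero-mean window
        w_norm = math.sqrt(sum(x * x for x in w_zm))
        if needle_norm == 0 or w_norm == 0:
            continue                                     # flat signal, no score
        dot = sum(a * b for a, b in zip(needle_zm, w_zm))
        score = dot / (needle_norm * w_norm)
        if score >= threshold:
            hits.append((offset, score))
    return hits
```

Each hit offset divided by the sample rate gives a timestamp that could be written to a playlist or database without touching the source file.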

I understand the implications of commercial cutting and I am not interested in altering the source file or paring it down. I just want to index the file and feed a database.

I do not want this done for me; I can write the software for the Linux or embedded platforms needed. I just need to know where I might look to get started. If I need to build a large DSP or even FPGA array device to do this then so be it. I am excited at what might be possible. This is not for a commercial endeavor, but rather for my home use. Other projects that I want to expand on are identifying sounds in raw outdoor captures, like maybe the cry of an owl, the mating call of a cricket, or jet engine noise of a plane flying overhead. Sirens, dog barks, coyote calls, a bull frog after a heavy storm.

I know, heavy DSP work. The idea has been on my plate for a long time. Is it possible?

Thank you all for your valuable time!
 

It sounds as though you have worked on digitized audio.
I work chiefly with the popular free Audacity software. When I zoom in on speech, I can see it consists of short bursts of noise, with silent gaps between.

Music is generally continuous waveforms.
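As a rough illustration of that observation, one could measure how much of a recording is near-silent: split it into short frames, compute each frame's RMS level, and count the quiet ones. This is only a sketch. The function name, the 400-sample frame, and the 10% quiet threshold are assumptions chosen for the example, not tuned values, and a high fraction merely hints at speech rather than proving it.

```python
def quiet_frame_fraction(samples, frame_len=400, quiet_ratio=0.1):
    """Fraction of frames whose RMS is below quiet_ratio * the loudest frame."""
    rms = []
    # Non-overlapping frames; a trailing partial frame is ignored.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms.append((sum(x * x for x in frame) / frame_len) ** 0.5)
    if not rms:
        return 0.0          # too short to form a single frame
    peak = max(rms)
    if peak == 0:
        return 1.0          # the whole recording is digital silence
    quiet = sum(1 for r in rms if r < quiet_ratio * peak)
    return quiet / len(rms)
```

Bursty material with gaps should score noticeably higher than a continuous waveform, which gives a first crude feature to experiment with.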

Speech recognition software is available. I don't know what output it sends during music, however.

You'll need to be a very clever programmer just to recognize the difference between speech and music. Your program will need to go through millions of data points. It will need to know how the data is formatted, whether 32/24/16/8 bit, 44.1/22.05 kHz, etc.

As for reading data in MP3s, AACs, etc., I'm not sure how easy it is. These are compressed audio formats. They might first need to be converted into WAV or AIFF format. (Audacity can do this job.)
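Once a file has been converted to WAV, the format details mentioned above (bit depth, sample rate, channel count) can be read from its header. Here is a minimal sketch using Python's standard `wave` and `struct` modules. The helper name `read_wav_mono16` is made up for the example, and it assumes 16-bit little-endian PCM, rejecting anything else instead of guessing.

```python
import struct
import wave

def read_wav_mono16(path):
    """Return (sample_rate, samples) for a 16-bit PCM WAV file.

    Multi-channel files are reduced to their first channel.
    """
    with wave.open(path, "rb") as wf:
        channels = wf.getnchannels()
        width = wf.getsampwidth()          # bytes per sample
        rate = wf.getframerate()           # samples per second
        raw = wf.readframes(wf.getnframes())
    if width != 2:
        raise ValueError(f"expected 16-bit PCM, got {8 * width}-bit")
    count = len(raw) // 2
    values = struct.unpack(f"<{count}h", raw)   # interleaved signed 16-bit
    return rate, list(values[::channels])       # keep channel 0 only
```

The returned list of integers is the kind of raw data any later analysis step would walk through.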

Have you encountered any programs that can tell the difference between music and speech? Just to do that much could be useful to people. Yet I have not heard of any program which does it. If it is possible, I imagine it would already be available, commercially or shareware.

As for broadcasts...

You might get somewhere with high-pass and low-pass filters. Speech is in a certain frequency range. Music contains a large range of frequencies.

Your morning show probably plays a jingle as it comes back from a commercial. Detection might be possible by constructing a narrow band-pass to detect those musical pitches.
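A narrow band-pass detector of this sort is often implemented in software with the Goertzel algorithm, which measures the energy at one chosen frequency in a block of samples. The sketch below is one way to try the idea. The function names, the pitch list, and the power threshold are hypothetical values a user would have to pick by inspecting the actual jingle.

```python
import math

def goertzel_power(samples, sample_rate, target_hz):
    """Squared DFT magnitude at the bin nearest target_hz (Goertzel algorithm)."""
    n = len(samples)
    if n == 0:
        return 0.0
    k = round(n * target_hz / sample_rate)     # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2       # second-order resonator
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def tones_present(samples, sample_rate, pitches_hz, min_power):
    """True if every listed pitch exceeds min_power in this block."""
    return all(goertzel_power(samples, sample_rate, f) >= min_power
               for f in pitches_hz)
```

Running `tones_present` over successive blocks of the recording would flag the moments where all of the jingle's pitches sound together.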

In earlier days a high-pitched beep might be transmitted just before a commercial. Perhaps this is no longer done, however. The industry is motivated to make you hear their commercials, rather than to warn you one is coming.
 

I guess my goal is somewhat complicated for the home hobbyist. I want to program, train maybe, DSP's to listen to audio like the human brain does. A simple example would be to say... have an audio file of music, or any ambient audio, and have someone yell "HEY!". The human ear hears that sample and recognizes it as out-of-place and processes it as the word "HEY" and it was "SHOUTED". I guess the equivalent in video processing would be facial recognition but also looking at the body the face is attached to. If the face is detected, is the body also there proportionately to the face? Or is it disconnected? To determine if the face is a picture held up and not the original.

I totally realize that human sensory perception is a massively complex collection of algorithms, and perhaps still not ready to be duplicated with an array of DSPs and a simple microcontroller like a Microchip PIC32MX795 or a lesser Arduino Pro Mini. I just want to be able to pick out that audible object swimming around in a storm and recognize it and index it. Not being a science major has probably left that door locked for me.

Sorry for the long-winded reply, and I thank you for YOUR reply!
 
