I had this exact idea that I've been working on for a few years now, hah! As a bright-eyed college kid, I thought I was going to do it for an independent study. Still working on it, now as a DSP FPGA engineer (and an audio geek). The fast convolution algorithm isn't THAT hard, and latency specs are fairly easily met on an FPGA. But there are a couple of big obstacles that need to be addressed when doing this on an FPGA.
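For context, the "fast convolution" I mean is FFT-based overlap-add convolution. Here's a minimal software sketch of it in Python/NumPy, just to show the algorithm; the FPGA version is the same math mapped to an FFT core and a MAC pipeline:

```python
import numpy as np

def overlap_add_convolve(x, h, block_size=1024):
    """FFT-based overlap-add convolution: process x in fixed-size
    blocks, multiply each block's spectrum by the IR's spectrum,
    and overlap-add the inverse transforms."""
    # FFT size: next power of two that holds a full linear convolution
    n_fft = 1
    while n_fft < block_size + len(h) - 1:
        n_fft *= 2
    H = np.fft.rfft(h, n_fft)            # precompute IR spectrum once
    y = np.zeros(len(x) + len(h) - 1)
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        Y = np.fft.rfft(block, n_fft) * H      # convolution = spectral product
        seg = np.fft.irfft(Y, n_fft)
        end = min(start + n_fft, len(y))
        y[start:end] += seg[:end - start]      # overlap-add the tails
    return y
```

For low latency you'd partition the IR and use smaller blocks for the head of the response (partitioned convolution), but the core idea is the same.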
Memory: Think about how much memory you will need. Say, for proof of concept, you only want to store one IR, and you want to support IRs up to 1 second long (I've seen audio IRs up to several seconds). Say you're sampling at 48 kHz and 16 bits (fairly modest data rates). You need 48,000 samples/sec * 1 sec * 16 bits per sample = ~768 Kb of memory. Most entry/mid-level FPGAs don't have anywhere NEAR that in BRAM, not even considering the fact that most BRAMs aren't 16 bits wide, so you'll be wasting a lot of that anyway. This also doesn't factor in the data buffers you'll need. What's the solution? Either use really, really short IRs, use poor-resolution data, or use DDR or some other external memory. Interfacing to DDR is NOT trivial, even with tools like MIG (sorry, I'm a Xilinx man, I don't know what the Altera equivalent is). Not to mention, I don't think the DE2 has on-board DDR anyway; best go with a Spartan-3E board if you want to go this route.
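To make the memory arithmetic above easy to play with, here's the same calculation as a throwaway Python function (the 48 kHz / 16-bit / 1-second numbers are the ones from my example):

```python
def ir_memory_bits(sample_rate_hz, ir_seconds, bits_per_sample):
    """Storage needed for one impulse response, in bits."""
    return int(sample_rate_hz * ir_seconds * bits_per_sample)

# The proof-of-concept case from the post:
bits = ir_memory_bits(48_000, 1.0, 16)
print(bits)  # 768000 bits, i.e. ~768 Kb
```

Scale the IR length up to a few seconds, or go to 24-bit samples, and you're quickly into megabits per IR, which is why external memory comes into the picture.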
I/O: For me, this is another big challenge. How are you going to get audio to the FPGA? This won't be such a big challenge if you choose a reasonable protocol (I'm trying to use a fairly obscure protocol carrying 8 channels of 24-bit, 48 kHz audio data, which means no IP is available, so I've been rolling my own core). But if you're just doing one channel and you use the on-board CODEC (which probably speaks SPI or I2S or something), you can probably find some code.
If you REALLY want to get this going, I think you will want to use a co-processor with the FPGA: something that makes memory interfacing much easier and likely has A/D and D/A cores that are drag-and-drop. I can't speak to the Altera solution, but using MicroBlaze would mostly eliminate the DDR and I/O interfacing issues. There would still be a number of other things to worry about, though.
In my opinion, a project like this requires a heavy background in digital design and DSP, plus embedded systems if you go the co-processor route. I can't speak to your personal situation, but most students don't have the background to pull this off by themselves as a senior project unless the scope is handled very carefully.
Anyway, I hope my ramblings have helped give some more context to the project. I'd be interested to see how this works out for you, so keep us posted! I always enjoy talking about audio DSP.
---------- Post added at 19:51 ---------- Previous post was at 19:50 ----------
Oh, and I've done quite a bit of research on this. I could point you to some good papers if you're interested.
---------- Post added at 20:03 ---------- Previous post was at 19:51 ----------
This would be like finding a way to take a noisy recording of a lecture, and turn it into a clean recording.
That usually involves adaptive filtering.
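To show what "adaptive filtering" means concretely, here's a minimal LMS noise canceller sketch in Python/NumPy. It assumes the classic two-input setup: a primary recording (speech + noise) and a separate reference recording that picks up only the noise. The function name and parameters are just for illustration:

```python
import numpy as np

def lms_cancel(primary, noise_ref, n_taps=32, mu=0.005):
    """LMS adaptive noise canceller: adapt an FIR filter so the
    filtered noise reference matches the noise in 'primary'.
    The error signal is the cleaned output."""
    w = np.zeros(n_taps)
    out = np.zeros(len(primary))
    for n in range(n_taps - 1, len(primary)):
        # Current and past reference samples, most recent first
        x = noise_ref[n - n_taps + 1:n + 1][::-1]
        y = w @ x                  # filter's estimate of the noise
        e = primary[n] - y         # error = cleaned sample
        w += mu * e * x            # LMS weight update
        out[n] = e
    return out
```

The reason this works without knowing the noise in advance is that the filter keeps adapting, which is exactly what handles the non-stationarity that defeats a fixed filter; the catch is that you need that second noise-only reference, which a single lecture recording usually doesn't give you.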
Recently a poster here wanted to figure out a way to extract vocals from an audio recording. He planned to use DSP and whatever other method might work.
The DSP programs I've used let me see a spectral graph of the audio. It's easy to see the tiers of overtones that occur from a played note (whether it comes from an instrument or a vocal).
It occurs to me that if we could draw a rectangle around a single overtone, or a group of overtones, we might extract a single word. Extend that to all the lyrics, perhaps. It would take sophisticated FFT math to do it.
The problem with doing that is that noise is not usually strictly additive; it's more complicated than that. It's usually non-stationary, meaning typical LTI approaches are invalid. AND speech signals are spread more widely across the spectrum; it's not just a fundamental and a few harmonics. There are some tricks you can play with stereo audio, though, because lead vocals are nearly always panned dead center and not much else is panned that way.
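The stereo trick I mean is just mid/side arithmetic. A crude sketch in Python/NumPy (crude because anything else panned dead center, like kick and bass, comes along for the ride):

```python
import numpy as np

def mid_side(left, right):
    """Split a stereo pair into mid (center-panned content, which
    usually includes lead vocals) and side (off-center content).
    Center-panned material is identical in both channels, so it
    adds in 'mid' and cancels in 'side'."""
    mid = 0.5 * (left + right)
    side = 0.5 * (left - right)
    return mid, side
```

Subtracting L - R is the old "karaoke" trick in reverse: it removes the center-panned vocal entirely, along with everything else panned center, and collapses the result to mono.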
This has been a problem in speech processing for years; there's no simple solution like that. What you've described is simple filtering, which works for some types of noise, but not for what you're talking about.