I had this exact idea that I've been working on for a few years now, hah! As a bright-eyed college kid, I thought I was going to do it for an independent study. Still working on it, now as a DSP FPGA engineer (and an audio geek). The fast convolution algorithm isn't THAT hard, and latency specs are fairly easily met on an FPGA. But there are a couple of big obstacles that need to be addressed when doing this on an FPGA.
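For context, the "fast convolution" I mean is FFT-based overlap-add convolution. Here's a minimal software sketch of it in Python/NumPy, just to show the algorithm; the FPGA version is the same math mapped to an FFT core and a MAC pipeline:

```python
import numpy as np

def overlap_add_convolve(x, h, block_size=1024):
    """FFT-based overlap-add convolution: process x in fixed-size
    blocks, multiply each block's spectrum by the IR's spectrum,
    and overlap-add the inverse transforms."""
    # FFT size: next power of two that holds a full linear convolution
    n_fft = 1
    while n_fft < block_size + len(h) - 1:
        n_fft *= 2
    H = np.fft.rfft(h, n_fft)            # precompute IR spectrum once
    y = np.zeros(len(x) + len(h) - 1)
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        Y = np.fft.rfft(block, n_fft) * H      # convolution = spectral product
        seg = np.fft.irfft(Y, n_fft)
        end = min(start + n_fft, len(y))
        y[start:end] += seg[:end - start]      # overlap-add the tails
    return y
```

For low latency you'd partition the IR and use smaller blocks for the head of the response (partitioned convolution), but the core idea is the same.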
Memory: Think about how much memory you will need. Say, for proof of concept, you only want to store one IR, and you want to support IRs up to 1 second long (I've seen audio IRs up to several seconds). Say you're sampling at 48 kHz and 16 bits (fairly modest data rates). You need 48,000 samples/sec * 1 sec * 16 bits per sample = ~768 Kb of memory. Most entry/mid-level FPGAs don't have anywhere NEAR that in BRAM, not even considering the fact that most BRAMs aren't 16 bits wide, so you'll be wasting a lot of that anyway. This also doesn't factor in the data buffers you'll need. What's the solution? Either use really, really short IRs, use poor-resolution data, or use DDR or some other external memory. Interfacing to DDR is NOT trivial, even with tools like MIG (sorry, I'm a Xilinx man, I don't know what the Altera equivalent is). Not to mention, I don't think the DE2 has on-board DDR anyway; best go with a Spartan-3E board if you want to go this route.
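To make the memory arithmetic above easy to play with, here's the same calculation as a throwaway Python function (the 48 kHz / 16-bit / 1-second numbers are the ones from my example):

```python
def ir_memory_bits(sample_rate_hz, ir_seconds, bits_per_sample):
    """Storage needed for one impulse response, in bits."""
    return int(sample_rate_hz * ir_seconds * bits_per_sample)

# The proof-of-concept case from the post:
bits = ir_memory_bits(48_000, 1.0, 16)
print(bits)  # 768000 bits, i.e. ~768 Kb
```

Scale the IR length up to a few seconds, or go to 24-bit samples, and you're quickly into megabits per IR, which is why external memory comes into the picture.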
I/O: For me, this is another big challenge. How are you going to get audio to the FPGA? This won't be such a big challenge if you choose a reasonable protocol (I'm trying to use a fairly obscure protocol carrying 8 channels of 24-bit, 48 kHz audio data, which means no IP is available, so I've been rolling my own core). But if you're just doing one channel and you use the on-board CODEC (which probably speaks SPI or I2S or something), you can probably find some code.
If you REALLY want to get this going, I think you will want to use a co-processor with the FPGA: something that makes memory interfacing much easier and likely has A/D and D/A cores that are drag-and-drop. I can't speak to the Altera solution, but using MicroBlaze would mostly eliminate the DDR and I/O interfacing issues. There would still be a number of other things to worry about, though.
In my opinion, a project like this requires a heavy background in digital design and DSP, plus embedded systems if you go the co-processor route. I can't speak to your personal situation, but most students don't have the background to pull this off by themselves as a senior project unless the scope is handled very carefully.
Anyway, I hope my ramblings have helped give some more context to the project. I'd be interested to see how this works out for you, so keep us posted! I always enjoy talking about audio DSP.
---------- Post added at 19:51 ---------- Previous post was at 19:50 ----------
Oh, and I've done quite a bit of research on this. I could point you to some good papers if you're interested.
---------- Post added at 20:03 ---------- Previous post was at 19:51 ----------
This would be like finding a way to take a noisy recording of a lecture, and turn it into a clean recording.
That usually involves adaptive filtering.
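To show what "adaptive filtering" means concretely, here's a minimal LMS noise canceller sketch in Python/NumPy. It assumes the classic two-input setup: a primary recording (speech + noise) and a separate reference recording that picks up only the noise. The function name and parameters are just for illustration:

```python
import numpy as np

def lms_cancel(primary, noise_ref, n_taps=32, mu=0.005):
    """LMS adaptive noise canceller: adapt an FIR filter so the
    filtered noise reference matches the noise in 'primary'.
    The error signal is the cleaned output."""
    w = np.zeros(n_taps)
    out = np.zeros(len(primary))
    for n in range(n_taps - 1, len(primary)):
        # Current and past reference samples, most recent first
        x = noise_ref[n - n_taps + 1:n + 1][::-1]
        y = w @ x                  # filter's estimate of the noise
        e = primary[n] - y         # error = cleaned sample
        w += mu * e * x            # LMS weight update
        out[n] = e
    return out
```

The reason this works without knowing the noise in advance is that the filter keeps adapting, which is exactly what handles the non-stationarity that defeats a fixed filter; the catch is that you need that second noise-only reference, which a single lecture recording usually doesn't give you.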
Recently a poster here wanted to figure out a way to extract vocals from an audio recording. He planned to use DSP and whatever other method might work.
The DSP programs I've used let me see a spectral graph of the audio. It's easy to see the tiers of overtones that occur from a played note (whether it comes from an instrument or a vocal).
It occurs to me that if we could draw a rectangle around a single overtone, or a group of overtones, we might extract a single word. Extend that to all the lyrics, perhaps. It would take sophisticated FFT math to do it.
The problem with doing that is that noise is not usually strictly additive; it's more complicated than that. It's usually non-stationary, meaning typical LTI approaches are invalid. AND speech signals are spread more widely across the spectrum; it's not just a fundamental and a few harmonics. There are some tricks you can play with stereo audio, though, because lead vocals are nearly always panned dead center and not much else is panned that way.
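The stereo trick I mean is just mid/side arithmetic. A crude sketch in Python/NumPy (crude because anything else panned dead center, like kick and bass, comes along for the ride):

```python
import numpy as np

def mid_side(left, right):
    """Split a stereo pair into mid (center-panned content, which
    usually includes lead vocals) and side (off-center content).
    Center-panned material is identical in both channels, so it
    adds in 'mid' and cancels in 'side'."""
    mid = 0.5 * (left + right)
    side = 0.5 * (left - right)
    return mid, side
```

Subtracting L - R is the old "karaoke" trick in reverse: it removes the center-panned vocal entirely, along with everything else panned center, and collapses the result to mono.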
This has been a problem in speech processing for years; there's no simple solution like that. What you've described is simple filtering, which works for some types of noise, but not for what you're talking about.