Re: microblaze
That will depend on which FPGA you use.
I haven't tried the Microblaze IP, and don't know if it can perform floating point operations.
If you want your algorithm to run as fast as possible, then you should try to implement it directly in hardware, which means you have to define your own architecture.
Also, you will need to implement floating point logic, which I don't think is as easy to implement as fixed point.
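To give an idea of why fixed point is the simpler option in hardware, here is a minimal sketch of a fixed-point multiply. The Q15 format (15 fractional bits) and the helper names to_fixed, to_float and fixed_mul are my own assumptions for illustration, not something taken from a particular design. The point is that the whole operation is an integer multiply, an add and a shift, which maps directly onto an FPGA multiplier block.

```python
# Sketch of fixed-point arithmetic, assuming a Q15 format (15 fractional bits).
# The format and all names here are illustrative assumptions.
FRAC_BITS = 15
SCALE = 1 << FRAC_BITS  # 32768


def to_fixed(x: float) -> int:
    """Convert a real value to its nearest Q15 integer representation."""
    return int(round(x * SCALE))


def to_float(q: int) -> float:
    """Convert a Q15 integer back to a real value."""
    return q / SCALE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q15 values.

    The raw product carries 2 * FRAC_BITS fractional bits, so add half an
    LSB for rounding and shift right by FRAC_BITS to return to Q15.
    """
    return (a * b + (1 << (FRAC_BITS - 1))) >> FRAC_BITS


half = to_fixed(0.5)
quarter = to_fixed(0.25)
product = fixed_mul(half, quarter)
print(to_float(product))
```

A floating point multiply additionally needs exponent handling, normalization and rounding logic, which is where the extra effort goes.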
I think that Microblaze can help you, but that will depend on the clock frequency it runs at.
An example:
I once implemented a 1024-point DCT on a Spartan-3 FPGA with a 50 MHz clock, and it computed the whole DCT in 400 microseconds (using fixed point).
I also implemented the algorithm in Matlab (double-precision floating point), and it took approximately 1 millisecond. The PC I used had a Pentium 4 processor running at 2.1 GHz. Of course you have to consider that RAM access in the PC is very slow, but still it had a 266 MHz bus, which is faster than the FPGA clock.
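To put those two measurements side by side, here is a rough sketch that converts each one into clock cycles. It uses only the figures quoted above; the function name cycles is just for illustration, and the PC time is approximate, so treat the result as an order-of-magnitude comparison.

```python
# Rough cycle-count comparison using the figures quoted in this post.
# Microseconds multiplied by MHz gives clock cycles directly.
def cycles(time_us: int, clock_mhz: int) -> int:
    """Clock cycles elapsed during time_us at a clock of clock_mhz."""
    return time_us * clock_mhz


fpga_cycles = cycles(400, 50)    # Spartan-3: 400 us at 50 MHz
pc_cycles = cycles(1000, 2100)   # Pentium 4: about 1 ms at 2.1 GHz

print("FPGA cycles:", fpga_cycles)
print("PC cycles:  ", pc_cycles)
print("Cycle ratio:", pc_cycles // fpga_cycles)
```

So the FPGA was 2.5 times faster in wall-clock time while spending roughly a hundred times fewer clock cycles. Keep in mind that the PC figure also includes whatever overhead Matlab adds, so this is not a pure hardware comparison.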