I will try using the 64 kByte buffers and will give you an update on my project with the data rate I manage to reach.
This could take a few days or weeks, and I'll also need to communicate with a bigger FPGA (the one currently on the Morph-IC-II is quite small), which I'm going to do as soon as possible.
I guess this could go even faster if I were only writing or only reading. At the moment, the FIFO master reads 64 bytes from the receive buffer, then puts 64 bytes into the transmit buffer, and so on, which adds some overhead compared to reading or writing 64 kBytes at once. The PC reads the data each time the 64 kByte transmit buffer is full. This is what my application needs.
In other words, the 32 MByte/s figure isn't obtained with the most optimized setup.
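To show roughly why the interleaved pattern costs time, here is a small Python model of it. It is only a sketch, not the actual FPGA logic. It assumes "64 kBytes" means 65536 bytes and that the master simply copies each received 64-byte chunk into the transmit buffer (what it really writes depends on the application). The function name `run_interleaved` is hypothetical.

```python
CHUNK = 64               # bytes moved per FIFO access, as described above
TX_CAPACITY = 64 * 1024  # assumption: "64 kBytes" taken as 65536 bytes


def run_interleaved(rx_data: bytes) -> tuple[list[bytes], int]:
    """Model the FIFO master: read 64 B from RX, write 64 B to TX, repeat.

    Returns the blocks the PC collects (one per full transmit buffer)
    and the total number of FIFO accesses (reads + writes).
    """
    tx = bytearray()
    pc_blocks: list[bytes] = []
    accesses = 0
    pos = 0
    while pos + CHUNK <= len(rx_data):
        chunk = rx_data[pos:pos + CHUNK]  # read 64 bytes from the receive buffer
        pos += CHUNK
        accesses += 1
        tx += chunk                       # hypothetical: echo them into the transmit buffer
        accesses += 1
        if len(tx) == TX_CAPACITY:        # the PC reads only when the buffer is full
            pc_blocks.append(bytes(tx))
            tx.clear()
    return pc_blocks, accesses


if __name__ == "__main__":
    data = bytes(range(256)) * 512        # 128 KiB of test input
    blocks, acc = run_interleaved(data)
    print(len(blocks), acc)
```

In this model, moving 64 kBytes each way takes 1024 reads plus 1024 writes, i.e. 2048 separate FIFO accesses. A read-only or write-only transfer could move the same amount in one long run, so if each switch between reading and writing carries some fixed overhead, that would account for the lost throughput.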