Has anyone implemented the FDTD algorithm in VHDL? Downloading the design to an FPGA and executing the algorithm in hardware should significantly reduce execution times for large problems. Parallelizing the computation across dozens of (cheap) FPGAs could probably give even larger gains. Let me know if you have tried this.
A graphics card's GPU is essentially a vector processor with a large number of multiply-accumulate (MAC) units, so it can be used to offload computation from the main CPU. Modern GPUs expose this capability to user code through APIs, so that is another option for speeding up FDTD.
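
To show why MAC units map well onto FDTD, here is a rough sketch of a 1D update loop in plain Python. It is not a GPU or VHDL implementation, just an illustration in normalized units; the function name `fdtd_1d`, the grid size, the Courant number of 0.5 and the Gaussian source are all my own assumptions for the example.

```python
import math

def fdtd_1d(n_cells=200, n_steps=100, courant=0.5, src=100):
    """1D Yee-style FDTD in normalized units (illustrative helper only)."""
    ez = [0.0] * n_cells          # electric field at integer grid points
    hy = [0.0] * (n_cells - 1)    # magnetic field at half-integer points
    for t in range(n_steps):
        # H update: one multiply-accumulate per cell, every cell independent
        for i in range(n_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])
        # E update: again one MAC per interior cell; the two end cells
        # are never touched, so they act as perfectly conducting walls
        for i in range(1, n_cells - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])
        # additive (soft) Gaussian pulse injected at a single cell
        ez[src] += math.exp(-((t - 30.0) / 10.0) ** 2)
    return ez, hy

if __name__ == "__main__":
    ez, hy = fdtd_1d()
    print(max(abs(v) for v in ez))
```

Within each of the two inner loops, every cell reads only its immediate neighbours from the other field and does one multiply and one add, so all cells can be updated at the same time. That is the part a GPU's MAC units, or a pipeline on an FPGA, would execute in parallel.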