My only suggestion concerns the large ROM arrays, which are most likely not being implemented as ROM in a RAM block. That could easily explain the long compile times, because a ROM implemented in logic (LUTs) takes the optimizer a long time to work through.
You should use an initial block with $readmemh to load a hex file into an actual RAM. The key is that the ROM must follow the RAM read template (a registered, clocked read of the array), so that the tool infers a RAM block initialized from the $readmemh data file.
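As a rough illustration of that template, here is a minimal sketch of a synchronous ROM. The module name, port names, widths, and the file name rom_init.hex are all placeholders I made up, not taken from your code, and the exact form your tool accepts should be checked against the vendor's guidelines.

```verilog
// Hypothetical synchronous ROM sketch. All names, widths, and the
// init file name are assumptions for illustration only.
module rom_sync #(
    parameter ADDR_WIDTH = 8,
    parameter DATA_WIDTH = 8,
    parameter INIT_FILE  = "rom_init.hex"  // assumed hex file name
) (
    input  wire                  clk,
    input  wire [ADDR_WIDTH-1:0] addr,
    output reg  [DATA_WIDTH-1:0] q
);
    // Storage array that the synthesis tool can map to a RAM block.
    reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];

    // Contents are loaded from the hex file rather than written
    // out as a large constant array or case statement.
    initial begin
        $readmemh(INIT_FILE, mem);
    end

    // Registered read with no other logic in the block: this is the
    // part that matches the RAM read template.
    always @(posedge clk) begin
        q <= mem[addr];
    end
endmodule
```

Note that the read is clocked, so the data appears one cycle after the address. If your current ROM is read combinationally, the surrounding logic will need to account for that extra cycle.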
Looking over the rest of the code, I don't see anything else that would result in long compile times. The way you wrote the inferred RAM in the FIFO is suspicious, though: there is a specific coding template for inferring RAM, and it absolutely does not include anything like address calculations. This may be causing the RAMs to be synthesized as flip-flops. Even if that is happening, there are only 64 bits in the RAMs, so the impact on compilation run time would be minor, but it can adversely affect the design's performance because of the multiplexing needed to emulate a RAM.
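For comparison, a minimal sketch of the usual single-clock, simple dual-port RAM template looks something like the following. The module and port names are hypothetical, and the default size (8 words of 8 bits, 64 bits total) is only a guess at your FIFO's organization. The point is that the memory block sees plain address signals, with any pointer arithmetic done outside it.

```verilog
// Hypothetical simple dual-port RAM sketch. Names and default sizes
// are assumptions, not taken from the original FIFO code.
module ram_sdp #(
    parameter ADDR_WIDTH = 3,
    parameter DATA_WIDTH = 8
) (
    input  wire                  clk,
    input  wire                  we,
    input  wire [ADDR_WIDTH-1:0] waddr,
    input  wire [DATA_WIDTH-1:0] wdata,
    input  wire [ADDR_WIDTH-1:0] raddr,
    output reg  [DATA_WIDTH-1:0] rdata
);
    reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];

    always @(posedge clk) begin
        // Write port: indexed directly by waddr, no arithmetic here.
        if (we)
            mem[waddr] <= wdata;
        // Read port: registered read, indexed directly by raddr.
        // A read of the address being written returns the old data.
        rdata <= mem[raddr];
    end
endmodule
```

The FIFO would then keep its read and write pointers in their own registers, increment them in separate logic, and connect them to raddr and waddr. With a memory this small the tool may still choose registers or a small LUT-based RAM, but the code at least no longer prevents RAM inference.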
Go to Intel's site, look for their HDL coding guidelines, and read them over. That document should discuss how to infer RAM, ROM, and various other primitives in HDLs.