What you have in the second always block is a chain of registers. Each stage completes in a single clock cycle, so the pipeline length is 5.
To complete the lot in a single pipeline stage, you need to use blocking assignments (=) rather than non-blocking assignments (<=).
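As a rough illustration of the difference, here is a minimal sketch of both styles. It is not your code: the module names, signal names, the 8-bit width and the "+ 1" operation per stage are all placeholders I made up to stand in for whatever each of your five stages really does.

```verilog
// Hypothetical example: names, widths and the "+ 1" operation are placeholders.

// Non-blocking (<=): every right-hand side is sampled before any register
// updates, so each line infers its own register. A value on din reaches
// dout after 5 clock edges.
module pipelined (
    input  wire       clk,
    input  wire [7:0] din,
    output reg  [7:0] dout
);
    reg [7:0] s1, s2, s3, s4;

    always @(posedge clk) begin
        s1   <= din + 8'd1;
        s2   <= s1  + 8'd1;
        s3   <= s2  + 8'd1;
        s4   <= s3  + 8'd1;
        dout <= s4  + 8'd1;
    end
endmodule

// Blocking (=): each line sees the value assigned on the line above it,
// so t1..t4 act as temporaries in one combinational chain and only dout
// ends up as a register. A value on din reaches dout after 1 clock edge.
module single_cycle (
    input  wire       clk,
    input  wire [7:0] din,
    output reg  [7:0] dout
);
    reg [7:0] t1, t2, t3, t4;

    always @(posedge clk) begin
        t1   = din + 8'd1;
        t2   = t1  + 8'd1;
        t3   = t2  + 8'd1;
        t4   = t3  + 8'd1;
        dout = t4  + 8'd1;
    end
endmodule
```

Both modules compute din + 5 in this sketch; the only difference is how many clock edges it takes and how much logic sits between two registers.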
But even though the blocking version completes in a single clock cycle, its maximum clock speed will be lower, because all of the logic has to settle within one clock period. So although the latency is lower with blocking assignments, the code you already have can be clocked faster, and its throughput may be higher despite the higher latency (possibly significantly higher: 2x, 3x or more).
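To put rough numbers on that trade-off, here is an idealized calculation. The delays are assumptions chosen only for illustration, not measurements of your design: each of the 5 stages has a combinational delay of $t_{logic} = 2\,\text{ns}$, and each register adds $t_{reg} = 1\,\text{ns}$ of overhead (clock-to-output plus setup). Both versions accept a new input every clock cycle, so throughput equals the clock frequency.

$$
\begin{aligned}
\text{Pipelined (<=):}\quad & T_{clk} = t_{logic} + t_{reg} = 3\,\text{ns}, & f_{max} &\approx 333\,\text{MHz}, & \text{latency} &= 5 \times 3\,\text{ns} = 15\,\text{ns} \\
\text{Single cycle (=):}\quad & T_{clk} = 5\,t_{logic} + t_{reg} = 11\,\text{ns}, & f_{max} &\approx 91\,\text{MHz}, & \text{latency} &= 11\,\text{ns}
\end{aligned}
$$

$$
\frac{\text{throughput}_{\text{pipelined}}}{\text{throughput}_{\text{single}}} = \frac{11\,\text{ns}}{3\,\text{ns}} \approx 3.7
$$

Under these assumed delays the single-cycle version has the lower latency, but the pipelined version delivers roughly 3.7 times as many results per second. Real stages are rarely perfectly balanced and the synthesis tool may simplify the combined logic, so treat the ratio as indicative only.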