There are other fitter options, such as register retiming, that allow the fitter to move registers through your logic in order to minimise the amount of logic between registers. For example, if a multiply-accumulate is followed by a shift register, the tool can move one of the shift register's pipeline stages back between the multiply and the accumulate without you modifying the RTL. It's not guaranteed and you cannot control it beyond "on" or "off", but it might give you an edge in some tougher spots.
If you then get really stuck, look into logic locking specific parts of your chip (i.e. assigning specific entities to specific regions of the chip so that they have priority there) and ensure you have specified all false and multicycle paths in your SDC file. As a final measure, specify max delay constraints between the registers on particularly difficult paths to make the fitter work extra hard on those specific paths. But this really should be a last resort.
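As a rough sketch of what those constraints can look like in an SDC file: every clock, hierarchy and register name below is hypothetical, and the `get_registers` collection assumes a Quartus-style timing analyzer (other tools typically use `get_cells` or `get_pins`). The numbers are only illustrative and assume a 100 MHz clock.

```tcl
# Hypothetical 100 MHz system clock; replace the port name with your own.
create_clock -name clk_sys -period 10.000 [get_ports clk_sys]

# False path: a quasi-static configuration register (hypothetical name) that is
# only written while the datapath is idle, so the path never needs to meet timing.
set_false_path -from [get_registers {cfg_regs|mode_reg[*]}] \
               -to   [get_registers {datapath|*}]

# Multicycle path: the result register (hypothetical) is only enabled every
# second clock, so allow two cycles for setup and move the hold check back by one.
set_multicycle_path -setup 2 -from [get_registers {slow_calc|operand_reg[*]}] \
                             -to   [get_registers {slow_calc|result_reg[*]}]
set_multicycle_path -hold 1  -from [get_registers {slow_calc|operand_reg[*]}] \
                             -to   [get_registers {slow_calc|result_reg[*]}]

# Last resort: over-constrain one difficult register-to-register path
# (8 ns instead of the 10 ns clock period) so the fitter prioritises it.
set_max_delay 8.000 -from [get_registers {mac|product_reg[*]}] \
                    -to   [get_registers {mac|accum_reg[*]}]
```

Only mark a path as false or multicycle if the design really behaves that way; a wrong exception hides a real timing failure rather than fixing it.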
By far the easiest thing to do is change the RTL.