check to see what is available in your device. Many FPGAs now have built-in multipliers, and the FPGA build tools can often infer multiply-add structures from the HDL. Make sure to read the documentation on this, as there can be some nuances for high-speed designs. But in general you can just write:
Yr <= Ar*Br - Ai*Bi;
Yi <= Ar*Bi + Ai*Br;
or some variation thereof. For higher-speed designs, you'd want to pipeline this, e.g.:
Yrr <= Ar*Br;
Yri <= Ai*Bi;
Yr <= Yrr - Yri;
where all of the above are assumed to be in a clocked process, and the A's and B's are assumed to be registered as well. Yi is pipelined the same way, using the two cross products and an add. The result is a system that computes 1 new output per cycle and has a latency of 2 cycles.
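
To check the throughput and latency claim, here is a rough cycle-by-cycle software model of the pipelined version. It is only a sketch: the function name simulate_pipeline and the registers Yir and Yii for the imaginary part are made up for illustration, and Python integers never overflow, so it says nothing about the bit widths you would need in real hardware (a product needs the sum of its operand widths, and the final add or subtract needs one more bit).

```python
def simulate_pipeline(inputs):
    """Cycle-by-cycle model of a two-stage pipelined complex multiplier.

    inputs: list of (Ar, Ai, Br, Bi) integer tuples, one per clock edge,
            standing in for the registered A and B values.
    Returns the (Yr, Yi) output registers as seen after each clock edge;
    None marks edges where the pipeline has not filled yet.
    """
    # Stage-1 registers: the four partial products (Yir, Yii are hypothetical names).
    Yrr = Yri = Yir = Yii = None
    # Stage-2 registers: the final real and imaginary results.
    Yr = Yi = None
    outputs = []
    for Ar, Ai, Br, Bi in inputs:
        # Work out every next value from the CURRENT register contents first,
        # the way signal assignments in a clocked process behave.
        next_Yr = Yrr - Yri if Yrr is not None else None
        next_Yi = Yir + Yii if Yir is not None else None
        next_Yrr, next_Yri = Ar * Br, Ai * Bi
        next_Yir, next_Yii = Ar * Bi, Ai * Br
        # Clock edge: all registers update together.
        Yrr, Yri, Yir, Yii = next_Yrr, next_Yri, next_Yir, next_Yii
        Yr, Yi = next_Yr, next_Yi
        outputs.append((Yr, Yi) if Yr is not None else None)
    return outputs


# The last sample is a dummy that only flushes the pipeline.
samples = [(1, 2, 3, 4), (5, 6, 7, 8), (0, 1, 0, 1), (0, 0, 0, 0)]
results = simulate_pipeline(samples)

# The result for samples[k] shows up one list entry later, i.e. after two
# clock edges, and a new result appears on every edge after that.
for (Ar, Ai, Br, Bi), out in zip(samples, results[1:]):
    expected = complex(Ar, Ai) * complex(Br, Bi)
    assert out == (expected.real, expected.imag)
```

The first entry of results is None because the products for the first sample are still sitting in the stage-1 registers. After that, each edge produces one finished result, which matches the 1-per-cycle throughput and 2-cycle latency described above.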