Re: Clock Gating
Clock gating saves power both in the clock tree buffers (which stop switching) and in the clock tree leaf cells (the flip-flops).
Backend tools handle clock gating automatically during and after clock tree synthesis (CTS): CTS is clock-gating aware, and the clock gating timing checks are applied after CTS. You can also clone and declone clock gates to optimise them in the physical implementation stages if required.
There can be issues with the timing of the enable signal at the clock gate relative to the clock, which you will only see after CTS. These occur if the clock gates are a long way up the clock tree, if there is large skew (between the flip-flop that sources the enable signal and the gated flip-flops), or if there is a lot of logic leading to the enable signal. This can be dealt with by standard timing optimisation after CTS, or by cloning clock gates before CTS. Alternatively, once you know the size of the violations, you can push this information back into synthesis, using set_clock_gating_check to model it, so that the enable path optimisation is done in synthesis for the next (synthesis/PnR) iteration.
You also need to be aware of the effect of the clock gating on the reset process.
If you are using asynchronous resets then there will be no issue with resetting the design, since an asynchronous reset takes effect without a clock edge.
If you are using synchronous resets, you can run into problems resetting the design, since the reset only has an effect if the clock is reaching the flip-flops. In this case you need to ensure that the reset forces the clock gate enables active (so that the gates pass the clock), and that the reset is held for enough clock cycles to guarantee that all of the clock gating cells pass the clock, especially if you have multiple stages of clock gating.
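To get a feel for why the reset length matters with multiple stages of clock gating, here is a rough behavioural sketch in Python. It is only a toy model built on assumptions that are mine, not taken from any tool or from your design: the clock gates are chained in series, each gate's enable comes from a register clocked by the previous stage's output, that register loads 1 under synchronous reset, and every gate starts closed. The function names are made up for illustration.

```python
def leaf_reset_reached(num_gate_stages, reset_cycles):
    """Return True if the leaf flops see a clock edge while reset is held.

    Toy model (assumed): gates in series, each enable register is clocked
    by the upstream stage and loads 1 under synchronous reset.
    """
    enables = [False] * num_gate_stages  # worst case: every gate starts closed
    for _ in range(reset_cycles):
        # Work out which stages see this root clock edge, using the
        # enable values that were registered before the edge.
        clk = True
        stage_sees_clock = []
        for en in enables:
            stage_sees_clock.append(clk)  # clock at this gate's input
            clk = clk and en              # clock at this gate's output
        if clk:
            return True  # the edge reached the leaf flops with reset asserted
        # Enable registers that received the edge load 1 because of reset.
        enables = [en or seen for en, seen in zip(enables, stage_sees_clock)]
    return False


def min_reset_cycles(num_gate_stages):
    """Smallest number of root clock cycles reset must be held in this model."""
    cycles = 1
    while not leaf_reset_reached(num_gate_stages, cycles):
        cycles += 1
    return cycles


if __name__ == "__main__":
    for stages in range(4):
        print(stages, "gate stage(s):", min_reset_cycles(stages), "reset cycle(s)")
```

In this model each extra stage of gating costs one more cycle of reset, because a gate can only open on the edge after its enable register has been clocked. Real designs will differ (enable logic depth, synchronisers, latch-based gating cells), so treat the numbers as an illustration only.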
It is probably easier to use asynchronous resets when using clock gating.
Ultimately you will need to verify the reset process with gate level simulation.