Why do I need multi-stage clock gating? I can insert a root ICG at the clock source and that should cut off my clock. What is the need for downstream clock gates?
A root ICG cuts off the clock for the whole design. But you may want to cut off the clock for only part of the design. During real operation, some flops may not be switching while the rest of the design is still active, so it is reasonable to stop clock propagation to just those flops.
Synthesize any relatively complex RTL with automatic clock gating turned on. You will see that the tool infers many different enable conditions for different groups of flip-flops. A single clock gating cell cannot implement all of those independent enables.
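To put rough numbers on this, here is a small Python sketch (a toy behavioral model, not RTL and not any tool's output). It counts how many clock edges reach each group of flops when only a root ICG exists versus when each group also has its own downstream ICG. The block names (`uart`, `dma`, `cpu`), the enable patterns, and the function `clock_edges_delivered` are all made up for illustration.

```python
def clock_edges_delivered(root_enable, block_enables, cycles):
    """Count the clock edges that reach each block's flops.

    root_enable   -- list of bools, one per cycle (root ICG enable)
    block_enables -- dict of block name -> list of bools per cycle
                     (enable of that block's downstream ICG)
    """
    delivered = {name: 0 for name in block_enables}
    for t in range(cycles):
        for name, enable in block_enables.items():
            # An edge reaches the flops only if both gates are open.
            if root_enable[t] and enable[t]:
                delivered[name] += 1
    return delivered


cycles = 8
root_on = [True] * cycles  # design is active, so the root ICG stays open

# Hypothetical per-block activity: the uart and dma are mostly idle.
enables = {
    "uart": [True, False, False, False, False, False, False, True],
    "dma": [False, False, True, True, True, True, False, False],
    "cpu": [True] * cycles,
}

# Root ICG only: every block is clocked whenever the root gate is open.
always_open = {name: [True] * cycles for name in enables}
root_only = clock_edges_delivered(root_on, always_open, cycles)

# Root ICG plus one downstream ICG per block.
multi_stage = clock_edges_delivered(root_on, enables, cycles)

print(root_only)    # {'uart': 8, 'dma': 8, 'cpu': 8}
print(multi_stage)  # {'uart': 2, 'dma': 4, 'cpu': 8}
```

In this toy case the root gate cannot close at all because the cpu is busy every cycle, so without downstream gates the idle uart and dma flops would still be clocked on all 8 cycles.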