The most critical path in the adder starts at carry-in. In a ripple-carry adder, for example, a change on carry-in has to propagate through the carry logic of every bit. If you can bring the carry-in in somewhere else instead of feeding it into bit[0], you can improve the timing a little. You can apply the same technique to every 4 bits, or every 8 bits, and so on.
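To make the idea concrete, here is a minimal behavioral sketch in Python. It assumes the technique is realized as a carry-select structure, which is one common way to keep the carry-in off bit[0]; the post itself does not name a specific scheme. Each 4-bit block computes its result for both possible incoming carries in parallel, so the real carry only drives a 2:1 select per block. The names `ripple_add`, `carry_select_add` and `carry_in_stages` are hypothetical, and the stage count is a rough unit-delay estimate, not a timing analysis.

```python
def ripple_add(a: int, b: int, cin: int, width: int) -> tuple[int, int]:
    """Ripple-carry model: the carry passes through every bit position in turn."""
    s, c = 0, cin
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        s |= (ai ^ bi ^ c) << i          # sum bit
        c = (ai & bi) | (c & (ai ^ bi))  # carry into the next bit
    return s, c  # (sum, carry-out)


def carry_select_add(a: int, b: int, cin: int, width: int, block: int = 4) -> tuple[int, int]:
    """Carry-select model (assumed realization): carry-in never enters bit[0]."""
    s, c = 0, cin
    for lo in range(0, width, block):
        w = min(block, width - lo)       # last block may be narrower
        mask = (1 << w) - 1
        ab = (a >> lo) & mask
        bb = (b >> lo) & mask
        # Both candidate results are computed without knowing the incoming carry.
        s0, c0 = ripple_add(ab, bb, 0, w)
        s1, c1 = ripple_add(ab, bb, 1, w)
        # The incoming carry only drives a 2:1 select for this block.
        blk_s, c = (s1, c1) if c else (s0, c0)
        s |= blk_s << lo
    return s, c


def carry_in_stages(width: int, block: int = 0) -> int:
    """Rough count of cells the carry-in crosses, one unit per carry cell or mux."""
    if block <= 0:
        return width              # ripple: carry-in crosses every bit's carry cell
    return -(-width // block)     # carry-select: one mux per block (ceiling division)
```

For a 16-bit adder, `carry_in_stages(16)` gives 16 stages for the plain ripple version, while `carry_in_stages(16, 4)` gives 4 mux stages. The price is roughly doubled block hardware, since each block is computed twice.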
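I don't get what you mean by "dynamic domino circuit is hardware, it's not about software". The adder is hardware, isn't it? A domino circuit is just like any other transistor-level circuit. It just operates on a slightly different concept from a static circuit: its dynamic node is precharged during one clock phase and conditionally discharged during the evaluate phase, instead of always being driven by a pull-up or pull-down network.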
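A small behavioral sketch in Python can show that difference in operation. The class `DominoGate` is hypothetical and models one footed domino gate: a dynamic node that is precharged high while the clock is low, a pull-down network that may discharge it while the clock is high, and a static inverter driving the output. It deliberately ignores leakage, charge sharing and keeper devices.

```python
from typing import Callable


class DominoGate:
    """Behavioral model of a footed domino gate (dynamic node + static inverter)."""

    def __init__(self, pulldown: Callable[..., bool]) -> None:
        self.pulldown = pulldown  # True when the NMOS pull-down network conducts
        self.node = 1             # logic level of the dynamic node

    def step(self, clk: int, *inputs: int) -> int:
        if clk == 0:
            self.node = 1         # precharge: PMOS pulls the dynamic node high
        elif self.pulldown(*inputs):
            self.node = 0         # evaluate: conducting network discharges the node
        # Otherwise the node simply holds its stored charge.
        return 1 - self.node      # static inverter output


# A 2-input AND: two series NMOS transistors in the pull-down network.
and2 = DominoGate(lambda a, b: a == 1 and b == 1)
```

Once the node has been discharged in an evaluate phase, it stays low until the next precharge, even if the inputs change. That is why domino inputs must only rise during evaluation, which is the main operational difference from a static gate.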