In general, geometry reduction allows faster devices and lower power consumption (due to lower gate capacitance and shorter distances). This results in lower combinatorial delays.
For example:
A = B * C + D
This combinatorial function (when implemented with exactly the same logic) will take less time to complete at 40nm than at 65nm (and even less at 28nm).
Of course, each combinatorial level has a delay. Without resorting to pipelining, you can only do so much in a single clock cycle before you run into timing violations.
So, lowering the combinatorial delay (one of the benefits of geometry reduction) is always good.
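A rough way to picture this is to model a combinatorial path as a number of logic levels, each with a fixed delay, and check whether the total fits in one clock period. The sketch below is only an illustration: the per-level delays and the clock period are made-up numbers (not datasheet values for any real device), and setup time, clock skew and routing delay are ignored.

```python
# Illustrative model only. All delay values are assumptions chosen for
# easy arithmetic, not figures for any real 65nm or 28nm device.

def path_delay_ps(levels: int, level_delay_ps: int) -> int:
    """Total delay of a combinatorial path made of `levels` logic levels."""
    return levels * level_delay_ps

def meets_timing(levels: int, level_delay_ps: int, clock_period_ps: int) -> bool:
    """True if the path settles within one clock period (setup/skew ignored)."""
    return path_delay_ps(levels, level_delay_ps) <= clock_period_ps

def max_levels(level_delay_ps: int, clock_period_ps: int) -> int:
    """Largest number of logic levels that still fits in one clock period."""
    return clock_period_ps // level_delay_ps

CLOCK_PERIOD_PS = 5000       # assumed 200 MHz clock
OLD_LEVEL_DELAY_PS = 500     # assumed per-level delay, larger geometry
NEW_LEVEL_DELAY_PS = 250     # assumed per-level delay, smaller geometry

for name, delay in (("larger geometry", OLD_LEVEL_DELAY_PS),
                    ("smaller geometry", NEW_LEVEL_DELAY_PS)):
    print(f"{name}: up to {max_levels(delay, CLOCK_PERIOD_PS)} levels per cycle, "
          f"12-level path meets timing: {meets_timing(12, delay, CLOCK_PERIOD_PS)}")
```

With these assumed numbers, a 12-level path fails timing at the slower per-level delay but fits once the per-level delay is halved, without any change to the logic.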
Example:
Y = ( A * B ) + ( C * D )
When done in a single step (without pipelining) on a 65nm device, the result may be ready after 200ns.
When done in a single step (without pipelining) on a 28nm device, the result may be ready after 100ns.
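To see what those two figures would mean for the clock, here is a small sketch that converts a single-step delay into the highest clock frequency at which the result is still ready within one cycle. It assumes the clock period only has to cover the combinatorial delay itself; setup time, clock skew and routing variation are ignored, and the function name is just for illustration.

```python
# Assumption: the clock period must be at least the combinatorial delay.
# Setup time, clock skew and routing variation are not modeled.

def max_clock_mhz(path_delay_ns: float) -> float:
    """Highest clock frequency (MHz) at which a single-step path still settles."""
    return 1000.0 / path_delay_ns

# Illustrative delays taken from the example above.
DELAY_65NM_NS = 200.0
DELAY_28NM_NS = 100.0

for name, delay in (("65nm", DELAY_65NM_NS), ("28nm", DELAY_28NM_NS)):
    print(f"{name}: {delay:.0f} ns -> up to {max_clock_mhz(delay):.0f} MHz")
```

Under this simplification, halving the combinatorial delay doubles the usable clock rate for the same unpipelined logic.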