If I understand your question correctly, yes, it gives nearly the same performance (in terms of area and delay) across cell libraries with roughly similar timings.
But the bigger point is that with DW you get a good timing vs area tradeoff chosen automatically for a standard arithmetic (or similar) function. If your timing requirements are relaxed, it can choose, say, a multiplier with less area, and for high frequencies it can give you a big but fast multiplier.
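To make the tradeoff concrete, here is a toy sketch in Python of the selection idea: among the implementations that meet the clock period, take the smallest one. This is only an illustration of the principle, not how the tool works internally. The candidate names, delays, and areas are made-up numbers, not data from DW or any real cell library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """One hypothetical multiplier implementation."""
    name: str
    delay_ns: float  # assumed critical-path delay (made up)
    area: float      # assumed relative area (made up)


# Hypothetical candidates for illustration only.
CANDIDATES = [
    Candidate("compact", delay_ns=6.0, area=1.0),
    Candidate("balanced", delay_ns=4.0, area=1.4),
    Candidate("fast", delay_ns=2.8, area=2.1),
]


def clock_period_ns(freq_mhz: float) -> float:
    """Convert a clock frequency in MHz to a period in ns."""
    return 1000.0 / freq_mhz


def pick_implementation(
    freq_mhz: float, candidates: List[Candidate] = CANDIDATES
) -> Optional[Candidate]:
    """Return the smallest-area candidate that fits in one clock period.

    Returns None when no candidate is fast enough.
    """
    period = clock_period_ns(freq_mhz)
    feasible = [c for c in candidates if c.delay_ns <= period]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.area)
```

With these made-up numbers, a relaxed 100 MHz target picks the small "compact" candidate, while a 333 MHz target (a period of about 3 ns) leaves only the large "fast" one.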
If you have a better idea for building a better-topology multiplier, priority encoder, etc., then you need not use DW. But something like getting a 40x40 multiplier to run at 333 MHz in 0.13u technology is well within reach of DW.
-b