HSIM from NASSDA is a good but, unfortunately, pretty expensive tool.
You can simulate 40-60M transistors; it supports DSPF, and the digital vectors from your gate-level simulation can be reused as well. It is pretty fast,
especially when you simulate memories. OK, you have to play with the
.params and you need a lot of memory for large circuits, but it is the
ultimate solution if you also want to get leakage in DSM (deep submicron).
PowerMill and PathMill are obsolete.
Cheap gate-level estimation can be done as follows. If you know the capacitance of each internal node and you sample the toggle statistics
(e.g. no problem with Modelsim), you just need to calculate the power
dissipated on those capacitances and sum it up. If you want
to be more precise, you can precalculate an equivalent capacitance for each input of each standard cell (SC) in your library and add it to the node capacitances mentioned above. I was surprised by how well this method correlates.
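A minimal sketch of the estimate above, assuming the usual switching-power model where each toggle of a node with capacitance C dissipates on average 0.5*C*Vdd^2. The `Net` record, its field names and the example numbers are hypothetical; in practice the toggle counts would come from your simulator's activity dump (e.g. a VCD file) and the capacitances from extraction and the library. Short-circuit and leakage power are not included.

```python
from dataclasses import dataclass, field


@dataclass
class Net:
    """Hypothetical per-node record; all capacitances in farads."""
    name: str
    wire_cap_f: float   # capacitance of the node itself (e.g. from extraction)
    toggles: int        # number of 0->1 plus 1->0 transitions seen in simulation
    # Equivalent input capacitances of the standard-cell pins this node drives
    # (the "more precise" refinement: precalculated per SC input).
    fanout_pin_caps_f: list[float] = field(default_factory=list)


def dynamic_power_w(nets, vdd_v, sim_time_s, include_pin_caps=True):
    """Average switching power: sum over nets of 0.5*C*Vdd^2 per toggle, divided by sim time."""
    energy_j = 0.0
    for net in nets:
        cap_f = net.wire_cap_f
        if include_pin_caps:
            cap_f += sum(net.fanout_pin_caps_f)
        energy_j += 0.5 * cap_f * vdd_v ** 2 * net.toggles
    return energy_j / sim_time_s


if __name__ == "__main__":
    # Made-up example: two nets, 1.2 V supply, 1 us of simulated activity.
    nets = [
        Net("n1", 10e-15, 1000, [2e-15, 3e-15]),
        Net("n2", 5e-15, 500),
    ]
    p = dynamic_power_w(nets, vdd_v=1.2, sim_time_s=1e-6)
    print(f"{p * 1e6:.2f} uW")  # prints "12.60 uW"
```

Running it with `include_pin_caps=False` gives the cruder node-capacitance-only estimate, which makes it easy to see how much the equivalent pin capacitances contribute.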
Nowadays there are libraries characterized with 3D tables, and expensive tools
to use them, but the gain in precision is not as large as one might expect.