Use the top metal for power, because it has the least resistance.
Let's say the top metal is M5. You can set up the power routing so that M5, M3 and M1 carry vdd and M4 and M2 carry the ground going to the pad, so that the interleaved layers also act as a decoupling capacitor. Most people don't like to spend time doing this.
Another important consideration is how you slot the power supply lines. There should be absolutely no 90-degree turns on the main connection going to the pads, as these would cause electromigration at higher current levels. The direction of the slotting is as important as the slotting itself.
Heavy decoupling using fractals has been reported; search for 'fractal capacitors' on IEEE Xplore. The fractals can fill all available space. You can also use small unit capacitors and place them manually. Don't short vdd and vss!!
About the pads: you usually use pads of the same size, but separate the digital, analog and buffer supplies. If you need more current, use multiple pads in parallel. This technique is well known in power amplifier layouts. A typical 70 um x 70 um pad in a submicron technology can carry 200 to 300 mA quite easily, but I would not push the pads that far without an expert doing my layout. For a digital chip, if you want to shove in 5 A, you should have multiple VDDs and multiple VSSs on chip. This is the whole idea of the flip-chip package movement: localized supplies.
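As a rough sanity check on the pad numbers, here is a minimal Python sketch of the pad count. The function name and the derated 100 mA figure are my own assumptions for illustration; only the 200 to 300 mA per pad and the 5 A total come from the paragraph above. Check the real per-pad limit against your process and package documentation.

```python
def pads_needed(total_ma: int, per_pad_ma: int) -> int:
    """Smallest number of parallel pads so that no pad exceeds per_pad_ma.

    Works in integer milliamps to avoid floating-point rounding surprises.
    Assumes the current splits evenly between the pads, which is optimistic.
    """
    if total_ma < 0 or per_pad_ma <= 0:
        raise ValueError("total_ma must be >= 0 and per_pad_ma must be > 0")
    # Ceiling division using integers only.
    return -(-total_ma // per_pad_ma)


if __name__ == "__main__":
    total_ma = 5000  # the 5 A digital chip mentioned above
    # 300 and 200 mA are the quoted range; 100 mA is a hypothetical derated limit.
    for limit_ma in (300, 200, 100):
        n = pads_needed(total_ma, limit_ma)
        print(f"{limit_ma} mA per pad: {n} VDD pads and {n} VSS pads")
```

The same count applies to the return path, so the total number of supply pads is twice the figure for VDD alone.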
The power routing should follow a tree structure, a comb structure, or a combination of the two. Don't use a chain structure: each segment of a chain carries the current of every block downstream of it, so the IR drops add up and vdd sags along the chain.
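To see why the chain is worse, here is a small Python sketch under simplifying assumptions of my own: pure DC, one lumped resistance per routing segment, and in the comb case a dedicated line from the pad to each block whose resistance grows with distance. The function names and the example values (four blocks, 10 mA each, 0.5 ohm per segment) are hypothetical, and a real comb would trade line width against area, so treat this as an illustration and not a sign-off calculation.

```python
def chain_drops_mv(currents_ma, r_seg_ohm):
    """IR drop (mV) at each block when the blocks are daisy-chained.

    Segment k carries the current of block k and of every block after it,
    and the drop seen by block k is the sum over all segments up to k.
    """
    drops = []
    total_mv = 0.0
    remaining_ma = sum(currents_ma)
    for i_ma in currents_ma:
        total_mv += remaining_ma * r_seg_ohm  # mA * ohm = mV
        drops.append(total_mv)
        remaining_ma -= i_ma
    return drops


def comb_drops_mv(currents_ma, r_seg_ohm):
    """IR drop (mV) when each block has its own line back to the pad.

    Block k (counted from 0) sits k + 1 segments away, and its line
    carries only that block's own current.
    """
    return [(k + 1) * r_seg_ohm * i_ma for k, i_ma in enumerate(currents_ma)]


if __name__ == "__main__":
    currents = [10, 10, 10, 10]  # mA per block, hypothetical
    print("chain:", chain_drops_mv(currents, 0.5))
    print("comb: ", comb_drops_mv(currents, 0.5))
```

With these numbers the last block in the chain sees 50 mV of drop against 20 mV on its own line, and the gap widens as you add blocks.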
While simulating, did you use any filters on the supply or was it just a dangling 'vdc' from the analogLib?