The best bet would be 4x loading. Some texts say 3x, based on solving an equation for the optimal loading (fanout per stage) of a chain of CMOS cells. That answer turns out to be roughly 2.7 (e), but it does not include the internal parasitic loading of each cell. Once the parasitics are included, 4x turns out to be closer to the optimum.
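To see where the numbers come from, here is a minimal sketch assuming the logical-effort delay model (the post does not name a model). Each stage has a parasitic delay p_inv, normalized to the basic time unit tau, and the optimal fanout per stage rho satisfies p_inv + rho * (1 - ln rho) = 0. With p_inv = 0 the root is exactly e; with p_inv = 1, a common textbook value for an inverter, it moves to about 3.6, which is why 4x is a sensible practical choice. The function name and the bisection bounds are illustrative assumptions.

```python
import math

def optimal_stage_effort(p_inv: float, tol: float = 1e-9) -> float:
    """Solve p_inv + rho * (1 - ln rho) = 0 for rho > 1 by bisection.

    p_inv: parasitic delay of one stage in units of tau
    (logical-effort model; an assumption, not taken from the post).
    """
    def f(rho: float) -> float:
        return p_inv + rho * (1.0 - math.log(rho))

    # f(1) = p_inv + 1 > 0, and f decreases for rho > 1 (f'(rho) = -ln rho),
    # so there is exactly one root; f(100) < 0 for any reasonable p_inv.
    lo, hi = 1.0, 100.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
    return 0.5 * (lo + hi)

for p in (0.0, 1.0, 2.0):
    print(f"p_inv = {p}: optimal fanout per stage ~ {optimal_stage_effort(p):.2f}")
```

The larger the per-stage parasitic, the larger the optimal fanout, so ignoring parasitics pushes the answer toward the low end.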
I forgot about the other questions.
For most cells, L is set by the process (130 nm, 90 nm, 65 nm, etc.), and W is what you use to set drive strength. The larger the W, the more drive. When the PMOS W is 2x the NMOS W, drive can be expressed as the ratio of output load capacitance to input capacitance. If the PMOS W is not 2x the NMOS W, some further calculation is needed to determine the effective drive of the PMOS and of the NMOS separately.
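As a rough first-order sketch of those "other calculations", assume drain current scales with W times carrier mobility at fixed L, an electron-to-hole mobility ratio of 2, and gate capacitance proportional to total gate width. Under those assumptions you can compare a sized inverter's pull-down (fall) and pull-up (rise) strength against a reference 1x inverter with Wn = 1 and Wp = 2. The class, the constant and the mobility ratio are illustrative assumptions, not process data.

```python
from dataclasses import dataclass

MOBILITY_RATIO = 2.0  # assumed mu_n / mu_p; the real value is process dependent

@dataclass
class Inverter:
    wn: float  # NMOS width, in units of the minimum width
    wp: float  # PMOS width, same units

    def input_cap(self) -> float:
        # Gate capacitance taken as proportional to total gate width.
        return self.wn + self.wp

    def fall_drive(self) -> float:
        # Pull-down strength relative to the reference NMOS (wn = 1).
        return self.wn

    def rise_drive(self) -> float:
        # Pull-up strength relative to the reference PMOS (wp = MOBILITY_RATIO).
        return self.wp / MOBILITY_RATIO

REF = Inverter(wn=1.0, wp=2.0)  # hypothetical 1x reference cell

def describe(inv: Inverter) -> str:
    rel_cap = inv.input_cap() / REF.input_cap()
    return (f"Wn={inv.wn}, Wp={inv.wp}: fall drive {inv.fall_drive():.2f}x, "
            f"rise drive {inv.rise_drive():.2f}x, input cap {rel_cap:.2f}x")

print(describe(Inverter(4.0, 8.0)))  # balanced 4x cell
print(describe(Inverter(4.0, 6.0)))  # undersized PMOS: rise weaker than fall
```

With Wp = 2Wn the rise and fall drives match and a single drive number is enough; otherwise they have to be tracked separately.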
For standard cells you are interested in delay (input 50% crossing to output 50% crossing), output slew (usually 20% to 80%), power consumption per transition (this is where you don't want too much crowbar current, because it is wasteful), static leakage, and possibly decap and ESR information for the cell if you will be doing a power analysis.
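Once SPICE has produced sampled waveforms, delay, slew and switching energy can be pulled out with simple interpolation and integration. This is a minimal sketch that makes three assumptions. The time and voltage samples are plain lists, for example exported from a transient run. Each waveform contains a single monotonic transition. The supply current is sampled on the same time grid. All function names are hypothetical.

```python
from typing import List

def crossing_time(t: List[float], v: List[float], level: float) -> float:
    """First time v crosses `level`, by linear interpolation between samples."""
    for i in range(1, len(t)):
        v0, v1 = v[i - 1], v[i]
        if (v0 - level) * (v1 - level) <= 0 and v0 != v1:
            return t[i - 1] + (level - v0) * (t[i] - t[i - 1]) / (v1 - v0)
    raise ValueError(f"waveform never crosses {level}")

def delay(t: List[float], vin: List[float], vout: List[float], vdd: float) -> float:
    """Input 50% crossing to output 50% crossing."""
    return crossing_time(t, vout, 0.5 * vdd) - crossing_time(t, vin, 0.5 * vdd)

def slew_20_80(t: List[float], v: List[float], vdd: float) -> float:
    """Transition time between the 20% and 80% points (rising or falling)."""
    return abs(crossing_time(t, v, 0.8 * vdd) - crossing_time(t, v, 0.2 * vdd))

def energy_per_transition(t: List[float], i_supply: List[float], vdd: float) -> float:
    """E = VDD * integral of supply current over the window (trapezoidal rule)."""
    charge = sum(0.5 * (i_supply[k] + i_supply[k - 1]) * (t[k] - t[k - 1])
                 for k in range(1, len(t)))
    return vdd * charge

# Synthetic example: input rises from 1 to 2 ns, output falls from 2 to 4 ns.
def clamp(x: float) -> float:
    return min(max(x, 0.0), 1.0)

VDD = 1.0
t = [k * 0.1 for k in range(101)]                 # 0 .. 10 ns
vin = [VDD * clamp(tk - 1.0) for tk in t]
vout = [VDD * (1.0 - clamp((tk - 2.0) / 2.0)) for tk in t]

print(f"delay = {delay(t, vin, vout, VDD):.2f} ns")
print(f"output slew (20-80%) = {slew_20_80(t, vout, VDD):.2f} ns")
```

Crowbar current shows up in `energy_per_transition` as extra supply charge beyond what is needed to charge the load.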
For sequential elements you will also need setup and hold times, and possibly minimum pulse width if it is a dynamic flop.
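Setup time is commonly found by varying how far ahead of the clock edge the data changes and bisecting for the point where the flop stops capturing correctly. Many flows instead use the point where clock-to-Q degrades by some percentage, for example 10%. This is a minimal sketch of the bisection loop. It assumes a hypothetical `captures(offset)` callback that would wrap one SPICE run. Here that callback is replaced by a toy model so the snippet runs on its own. Hold time can be searched the same way by varying how long after the clock edge the data changes.

```python
from typing import Callable

def find_setup_time(captures: Callable[[float], bool],
                    fail_offset: float, pass_offset: float,
                    resolution: float = 1e-12) -> float:
    """Bisect the data-before-clock offset (seconds) between a known-failing
    and a known-passing value; returns the smallest passing offset found."""
    if not captures(pass_offset) or captures(fail_offset):
        raise ValueError("need one failing and one passing starting offset")
    while pass_offset - fail_offset > resolution:
        mid = 0.5 * (pass_offset + fail_offset)
        if captures(mid):
            pass_offset = mid  # still captures: try less setup margin
        else:
            fail_offset = mid  # fails: need more setup margin
    return pass_offset

# Toy stand-in for a SPICE run: pretend the real setup time is 37 ps.
TRUE_SETUP = 37e-12

def toy_captures(offset: float) -> bool:
    return offset >= TRUE_SETUP

print(f"setup ~ {find_setup_time(toy_captures, 0.0, 200e-12) * 1e12:.1f} ps")
```

Each call to `captures` is a full transient simulation in practice, so the resolution you ask for directly sets the number of SPICE runs per corner.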
IO cells are more complicated, and I suggest looking at the spec for the particular IO cell you want to characterize. The spec will describe what needs to be measured and how to set up the SPICE simulations to get it.