A problem in synthesis!
I'd rather look at your timing report header, which shows which wireload mode is used while traversing the hierarchy. If the module is indeed forced to use only one wireload model, you will not see any WLM switching.
wire_load_mode top is correct. But if you are scripting this, consider the possibility that something is overwriting your environment setup.
If you only see one wireload model being used, then the next possibility is a fanout issue. Generate a timing report with fanout and incremental timing turned on. Look for large incremental delays, then look at the driving cell and its fanout.
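If the report is long, a rough way to do that scan is sketched below in Python. It assumes you have already pulled each path point out of the report into (pin, fanout, incr) values yourself; the PathPoint type, the thresholds, and the sample pin names are all made up for illustration and are not from any tool or from your design.

```python
from typing import List, NamedTuple


class PathPoint(NamedTuple):
    """One point on a timing path (hypothetical structure, filled in by hand)."""
    pin: str         # output pin of the driving cell
    fanout: int      # fanout of the net it drives
    incr_ns: float   # incremental delay at this point, in ns


def suspicious_points(points: List[PathPoint],
                      min_incr_ns: float = 0.5,
                      min_fanout: int = 16) -> List[PathPoint]:
    """Return points with both a large incremental delay and a high fanout.

    The default thresholds are arbitrary assumptions; tune them to your
    library and clock period. Results are sorted worst delay first.
    """
    hits = [p for p in points
            if p.incr_ns >= min_incr_ns and p.fanout >= min_fanout]
    return sorted(hits, key=lambda p: p.incr_ns, reverse=True)


# Made-up sample data, only to show the call.
path = [
    PathPoint("u_ctrl/U12/Z", 3, 0.08),
    PathPoint("u_ctrl/U40/Z", 64, 1.20),
    PathPoint("u_dp/U7/Z", 2, 0.11),
    PathPoint("u_dp/U9/Z", 32, 0.75),
]

for p in suspicious_points(path):
    print(f"{p.pin}: fanout={p.fanout}, incr={p.incr_ns:.2f} ns")
```

Anything this flags is a driving cell worth checking by hand: a weak cell driving a large fanout usually shows up as one big incremental step.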
I can only give you my opinion, which is that I like to flatten the design at a certain level of hierarchy, for the reasons you stated above. But many people prefer to preserve hierarchy for future debugging.