Equations only tell you what their author told them.
Equations are always (or at best) simplifications of
reality.
When most people talk about "shorter gate" being
"faster" they are usually speaking at a high level,
comparing (say) a 90nm transistor to a 250nm one. The
channel gm is higher, true. But "speed" has other
elements, such as the drawn drain area (Cdb) and
spacer geometry / contact-to-poly spacing (Cdg,
the big deal for FET fT/fmax along with Rg). And you
cannot leave Rg out, as it is another factor that
depends heavily on groundrules and process details.
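
To put rough numbers on why Cdg and Rg matter, here is a minimal sketch using the usual first-order textbook approximations, fT ~ gm / (2*pi*(Cgs + Cgd)) and fmax ~ sqrt(fT / (8*pi*Rg*Cgd)). These are simplifications in exactly the sense described above, and the device values in the example are made up for illustration, not taken from any real process.

```python
import math

def f_t(gm, cgs, cgd):
    """First-order unity-current-gain frequency in Hz (textbook approximation)."""
    return gm / (2 * math.pi * (cgs + cgd))

def f_max(gm, cgs, cgd, rg):
    """First-order maximum oscillation frequency in Hz (textbook approximation)."""
    return math.sqrt(f_t(gm, cgs, cgd) / (8 * math.pi * rg * cgd))

if __name__ == "__main__":
    # Hypothetical device values, chosen only to show the trends.
    gm, cgs, cgd = 1e-3, 1e-15, 0.3e-15   # S, F, F
    for rg in (50.0, 100.0, 200.0):       # ohms
        print(f"Rg={rg:6.1f} ohm  fT={f_t(gm, cgs, cgd) / 1e9:7.1f} GHz  "
              f"fmax={f_max(gm, cgs, cgd, rg) / 1e9:7.1f} GHz")
```

In this model Rg does not appear in fT at all, yet it sets fmax directly, and Cdg hurts both. Two layouts with the same drawn gate length can therefore land at quite different speeds.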
In an uncompensated OTA, maybe the other transistors
are not small-signal-significant. However, they may still
influence large-signal transient attributes. Front-end
current steering per delta-V (diff pair gm) is limited at
very small signal, and that limits bandwidth at "nil plus
a nit" overdrive.
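
The current-steering point can be illustrated with the long-channel square-law model of a MOS diff pair. This is a sketch under that assumption only. A short-channel device will not follow the square law closely, and beta (taken here as mu*Cox*W/L) and the tail current are hypothetical values.

```python
import math

def diff_pair_delta_i(vid, i_tail, beta):
    """Differential output current Id1 - Id2 of a square-law MOS diff pair.

    vid    : differential input voltage in V
    i_tail : tail current in A
    beta   : mu*Cox*W/L in A/V^2 (assumed identical for both devices)
    """
    v_full = math.sqrt(2 * i_tail / beta)   # input that fully steers the tail
    if abs(vid) >= v_full:
        return math.copysign(i_tail, vid)   # all of the tail current is steered
    return 0.5 * beta * vid * math.sqrt(4 * i_tail / beta - vid * vid)

if __name__ == "__main__":
    i_tail, beta = 100e-6, 1e-3             # hypothetical values
    for vid in (1e-3, 10e-3, 100e-3, 500e-3):
        frac = diff_pair_delta_i(vid, i_tail, beta) / i_tail
        print(f"Vid={vid * 1e3:6.1f} mV  steered fraction of tail = {frac:.4f}")
```

Near zero input the steered current is just gm*Vid with gm = sqrt(beta * i_tail), so a tiny overdrive moves only a tiny fraction of the tail current into the load. The full tail current is only available once the input is large enough to switch the pair over.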