gm is the slope d(Id)/d(Vgs). Look at the basic ID-VG
curve on a log-current axis and you can see that the slope
is steepest and constant in the subthreshold region
(thus maximizing gm per unit of drain current, gm/Id),
and it rolls over as the FET goes into strong inversion.
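
To put numbers on the slope picture, here is a minimal sketch that estimates gm as a finite-difference slope of a toy exponential subthreshold model. The model form Id = I0*exp(Vgs/(n*Vt)) and the values of i0, n and vt are assumptions picked for illustration, not data from any particular process.

```python
import math

def subthreshold_id(vgs, i0=1e-9, n=1.5, vt=0.02585):
    """Toy subthreshold drain current, Id = I0 * exp(Vgs / (n * Vt)).
    i0, n and vt are illustrative assumptions, not process data."""
    return i0 * math.exp(vgs / (n * vt))

def gm_numeric(id_func, vgs, dv=1e-6):
    """Central-difference estimate of gm = dId/dVgs at the given Vgs."""
    return (id_func(vgs + dv) - id_func(vgs - dv)) / (2.0 * dv)

def gm_over_id(id_func, vgs):
    """Slope of ln(Id) versus Vgs, i.e. gm/Id, in 1/V."""
    return gm_numeric(id_func, vgs) / id_func(vgs)

if __name__ == "__main__":
    for vgs in (0.10, 0.15, 0.20):
        print(vgs, gm_numeric(subthreshold_id, vgs), gm_over_id(subthreshold_id, vgs))
```

In this toy model gm itself keeps growing with Vgs, while gm/Id sits at 1/(n*Vt), about 25.8 per volt with these numbers, which is the constant slope you read off the semilog plot.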
This has nothing to do with short channel per se.
Short channel brings you some non-classical device
features (like halo implants and strong LDDs), which add
source resistance and may lower the point where dId/dVgs
rolls over - source degeneration matters when Id*Rs
becomes a significant fraction of VT0.
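
As a rough illustration of the source degeneration effect, the standard small-signal result is gm_eff = gm / (1 + gm*Rs). The sketch below applies it to a few bias points; the gm and Rs values are made-up assumptions, chosen only to show the trend.

```python
def degenerated_gm(gm, rs):
    """Effective transconductance seen at the terminals when a series
    source resistance rs (ohms) degenerates an intrinsic gm (siemens)."""
    return gm / (1.0 + gm * rs)

if __name__ == "__main__":
    rs = 20.0  # assumed source resistance in ohms, illustrative only
    for gm in (1e-3, 10e-3, 50e-3):
        eff = degenerated_gm(gm, rs)
        print(f"gm={gm:.3e} S  gm_eff={eff:.3e} S  loss={100 * (1 - eff / gm):.1f}%")
```

The loss is small while gm*Rs is much less than 1 and grows as the device is biased harder, with gm_eff compressing toward 1/Rs. That is why added source resistance pulls the rollover point down.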
DIBL will move the gm peak depending on drain voltage,
more than it does in long channel devices (your peak gm
does not stay put as the common mode level varies).
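
A first-order way to see the DIBL shift is to treat the threshold as VT = VT0 - dibl*Vds, so the whole Id-Vgs curve (and the gm peak with it) slides along the Vgs axis as Vds changes. This is a simplifying assumption, and the DIBL coefficients below are placeholders rather than measured values.

```python
def vt_with_dibl(vt0, dibl, vds):
    """First-order threshold voltage with DIBL: VT = VT0 - dibl * Vds.
    dibl is in V/V (e.g. 0.08 means 80 mV of VT drop per volt of Vds)."""
    return vt0 - dibl * vds

def gm_peak_shift(dibl, vds_from, vds_to):
    """Shift (in volts of Vgs) of the Id-Vgs curve, and so of the gm peak,
    when Vds moves from vds_from to vds_to, assuming Id depends on Vgs
    only through Vgs - VT."""
    return -dibl * (vds_to - vds_from)

if __name__ == "__main__":
    # Assumed coefficients: a short channel device vs. a long channel one.
    for name, dibl in (("short", 0.08), ("long", 0.005)):
        print(name, vt_with_dibl(0.45, dibl, 1.0), gm_peak_shift(dibl, 0.2, 1.0))
```

With these placeholder numbers the short channel peak moves by tens of millivolts over the same Vds swing where the long channel peak barely moves.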