You speak of reducing "the" channel length, but this is not
something you should treat as a "single setting". Different
functions will likely want different geometries.
For a load you might prefer a longer channel, to raise its
Rout to where it no longer "contributes" to the gain-node
impedance (diff pair Rout, load Rout and next stage Rin, all
in parallel); the load's gm is a don't-care.
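To put rough numbers on "doesn't contribute", here is a
small sketch of the gain-node impedance as three resistances
in parallel. The resistor values are made up for
illustration, not taken from any real process or design.

```python
def parallel(*resistances):
    """Equivalent resistance (ohms) of resistors in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)

# Hypothetical values, for illustration only.
R_PAIR = 200e3  # diff pair Rout, ohms
R_NEXT = 1e6    # next stage Rin, ohms

def load_penalty(r_load, r_pair=R_PAIR, r_next=R_NEXT):
    """Fraction of gain-node impedance lost to a finite load Rout."""
    ideal = parallel(r_pair, r_next)           # load Rout infinite
    actual = parallel(r_pair, r_load, r_next)  # load Rout included
    return 1.0 - actual / ideal

for r_load in (200e3, 1e6, 5e6):
    loss = 100.0 * load_penalty(r_load)
    print(f"load Rout = {r_load / 1e3:6.0f}k -> node impedance down {loss:.1f}%")
```

Once the load Rout is well above the other two, lengthening
the load channel further buys almost nothing.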
In the diff pair, reducing L puts you in a foot-race between
increasing gm and decreasing Rout as far as gain goes. The
BW will benefit; gain may or may not.
For high BW you may have to step away from low power
DC amplifier topologies, toward more gain stages at lower
gain apiece. This will be harder to compensate and, despite
higher bandwidth, settling time might be the same or worse.
OTAs are inherently easy to compensate (shunt C at
output) but not great performers, especially as large
hold capacitor drivers. If Chold >> Ccomp_min, that's a
soggy amplifier (but dead stable). You might like to run
the amplifier "so hot that Chold == Ccomp" for whatever
Chold needs to be, for the sample to stay right across
the conversion cycle. You might need to work backwards from
that (hold attributes) to the amplifier particulars.
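One way to work backwards, as a first-order sketch only: it
assumes a single-stage OTA whose only pole is set by the
hold capacitor, closed-loop tau = Chold / (beta * gm), and
purely linear settling (no slewing, no parasitic poles, no
finite-gain error). The capacitor, time budget, n_tau and
beta values are hypothetical.

```python
import math

def ota_settling(gm, c_load, beta=1.0):
    """Single-pole estimate: closed-loop tau (s), open-loop unity-gain freq (Hz)."""
    tau = c_load / (beta * gm)
    f_unity = gm / (2.0 * math.pi * c_load)
    return tau, f_unity

def gm_for_settling(c_hold, t_settle, n_tau, beta=1.0):
    """gm (siemens) needed so n_tau time constants fit inside t_settle."""
    return n_tau * c_hold / (beta * t_settle)

# Hypothetical hold requirements, for illustration only.
C_HOLD = 10e-12    # farads
T_SETTLE = 100e-9  # seconds allowed for acquisition
N_TAU = 9.0        # time constants your accuracy target demands

gm = gm_for_settling(C_HOLD, T_SETTLE, N_TAU)
tau, f_unity = ota_settling(gm, C_HOLD)
print(f"gm >= {gm * 1e3:.2f} mS, tau = {tau * 1e9:.2f} ns, f_unity = {f_unity / 1e6:.1f} MHz")
```

The gm that falls out then sets bias current and device
sizing, which is where the channel length choice comes back
in.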
Here's a basic question. Do you really need an amp,
or would a simple sampling switch do the job? That all
comes down to the signal source impedance coming in,
and what the driven load needs for input. If it's a
50-ohm system then maybe you just switch, to sample,
and let (Rsrc+Rsw)*Chold define the settling time (how
many tau, to what number of bits...). This is system
specific; a "general purpose" architecture might need
to accommodate a range of source impedances. But
beware the engineer's impulse to make a general
solution to a specific problem. If it's a specific
application and only that, look for simple solutions
before you make a complex project out of it.
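For the switch-only case, the "how many tau, to what number
of bits" arithmetic can be sketched like this. It assumes a
plain single-pole RC step response and a full-scale step
settling to within half an LSB; the switch resistance, hold
capacitor and resolution are hypothetical numbers.

```python
import math

def taus_for_bits(n_bits, lsb_fraction=0.5):
    """Time constants for an RC step to settle within lsb_fraction of an LSB."""
    # Residual error exp(-t/tau) must be <= lsb_fraction / 2**n_bits.
    return math.log(2 ** n_bits / lsb_fraction)

def switch_settling_time(r_src, r_sw, c_hold, n_bits):
    """Acquisition time (s) when (Rsrc + Rsw) * Chold sets the time constant."""
    tau = (r_src + r_sw) * c_hold
    return taus_for_bits(n_bits) * tau

# Hypothetical values, for illustration only.
R_SW = 50.0      # switch on-resistance, ohms
C_HOLD = 10e-12  # hold capacitor, farads
N_BITS = 12

for r_src in (50.0, 1e3, 10e3):
    t = switch_settling_time(r_src, R_SW, C_HOLD, N_BITS)
    print(f"Rsrc = {r_src:7.0f} ohm -> settle in {t * 1e9:.1f} ns")
```

If the worst-case source impedance still settles inside
your sample window, the bare switch may be all you need.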