I would be looking at folded cascode and using HV devices
(like LDMOS) in the diff pair and the cascode guards. That
will let you "pin" most of the critical voltages that have to
slide across supply-driven and common-mode-driven ranges.
But you need to be wary of input differential voltage max
specs, which might force you to use thick gate MOS devices
with all of the performance and reliability (mV Vio drift)
negatives that go with them. Of course there's "thick" and
there's "thick" - >1000 Å on 40V range MOSFETs, while a
40V asymmetric LDMOS can be had from a 5V, ~100 Å gate
ox - are you riding a pony, or an ox?
I mean, are you using a flow that only has thick gate plain
MOS (ox), or are you using drain-extended / LDMOS with a
thinner gate ox (pony)? Each has its uses, and its demands.
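
To put rough numbers on the pony-vs-ox question, here is a quick
back-of-envelope sketch of the average gate oxide field for the two
cases. It assumes the thicknesses above are in angstroms, takes the
nominal 1000 and 100 figures at face value, and treats the field as
simply V / tox (no poly depletion, no flatband offset). The function
name and constants are mine, purely for illustration; your PDK's
rated Vgs limits are what actually count.

```python
# Back-of-envelope average gate oxide field, E = V / tox.
# Assumptions (illustrative only): thicknesses in angstroms, uniform
# field, nominal numbers taken from the post above.

ANGSTROM_TO_CM = 1e-8  # 1 angstrom = 1e-8 cm


def oxide_field_mv_per_cm(v_gate: float, tox_angstrom: float) -> float:
    """Return the average field across the gate oxide in MV/cm."""
    volts_per_cm = v_gate / (tox_angstrom * ANGSTROM_TO_CM)
    return volts_per_cm / 1e6


# "Ox": thick gate plain MOS, 40 V across roughly 1000 angstroms.
thick_gate_field = oxide_field_mv_per_cm(40.0, 1000.0)

# "Pony": LDMOS built on a 5 V gate oxide, roughly 100 angstroms.
thin_gate_rated_field = oxide_field_mv_per_cm(5.0, 100.0)

# The pony again, but with a full 40 V differential landing on its gate.
thin_gate_overdriven_field = oxide_field_mv_per_cm(40.0, 100.0)

print(f"thick gate at 40 V: {thick_gate_field:.1f} MV/cm")
print(f"thin gate at 5 V:   {thin_gate_rated_field:.1f} MV/cm")
print(f"thin gate at 40 V:  {thin_gate_overdriven_field:.1f} MV/cm")
```

Under those assumptions both oxides sit at a similar field when used
at their rated gate voltage, but the thin gate sees about 8x its
rated field if the whole 40V differential shows up gate-to-source.
The LDMOS drain can stand off 40V; its gate cannot, and that is why
the input differential max spec decides which animal you can ride.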