The technology node is 28 nm bulk CMOS, the load capacitance is around 1-1.5 pF, and power consumption doesn't matter, at least for now, although lower would be better. With a 1 V supply, the 3-dB BW must be around 40 MHz.
You didn't mention anything about gain, but if the GBW is ~400 MHz and the 3-dB BW is 40 MHz, I assume the gain requirement isn't high: for a single-pole response, GBW/BW works out to about 10, i.e. 20 dB.
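As a quick check of that gain estimate, here is a minimal sketch that assumes a single-pole amplifier, so that DC gain equals GBW divided by the 3-dB bandwidth. The function name `dc_gain_from_gbw` is only illustrative.

```python
import math

def dc_gain_from_gbw(gbw_hz: float, bw_3db_hz: float) -> float:
    """DC gain (V/V) of a single-pole amplifier: A0 = GBW / f_3dB."""
    return gbw_hz / bw_3db_hz

gain = dc_gain_from_gbw(400e6, 40e6)      # numbers quoted in the thread
gain_db = 20 * math.log10(gain)           # convert V/V to dB
print(f"A0 = {gain:.1f} V/V = {gain_db:.1f} dB")
```

If the amplifier has significant non-dominant poles, this ratio is only a first estimate.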
A single-stage OTA loaded by 1.5 pF needs input transistors with a transconductance of at least 4 mS to meet the GBW requirement (gm = 2π·GBW·CL ≈ 3.8 mS). You should be able to achieve that with 700 µA of tail current.
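The arithmetic behind those numbers can be sketched as follows. It assumes the usual single-stage relation GBW = gm / (2π·CL) and that each input device carries half the tail current. The helper names are hypothetical.

```python
import math

def required_gm(gbw_hz: float, c_load_f: float) -> float:
    """Input-pair transconductance (S) needed for a given GBW and load."""
    return 2 * math.pi * gbw_hz * c_load_f

def gm_over_id(gm_s: float, tail_current_a: float) -> float:
    """gm/Id (1/V) of one input device, assuming it carries half the tail."""
    return gm_s / (tail_current_a / 2)

gm_min = required_gm(400e6, 1.5e-12)       # about 3.77 mS
efficiency = gm_over_id(4e-3, 700e-6)      # about 11.4 1/V
print(f"gm >= {gm_min * 1e3:.2f} mS, gm/Id = {efficiency:.1f} 1/V")
```

A gm/Id of roughly 11 1/V corresponds to moderate inversion, so 700 µA looks like a comfortable rather than an aggressive bias point.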
Start with the simplest 5-transistor OTA to check the GBW, then develop it into a folded cascode (a telescopic cascode probably won't have enough headroom unless your process provides low-Vth devices).
With this constraint (I_tot < 150 µA, GBW > 400 MHz), the only options are to base the OTA on some variation of the improved recycled cascode architecture, or to apply to a standard cascode a very well matched feedforward compensation scheme that completely cancels the pole caused by the output capacitance.
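A rough feasibility check shows why a plain single-stage topology struggles under this budget. The sketch below makes two assumptions that are not from the thread: the entire 150 µA goes to the input pair (optimistic), and the weak-inversion slope factor is n = 1.3 at 300 K. The weak-inversion ceiling on gm/Id is 1 / (n·kT/q).

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
Q_E = 1.602176634e-19     # elementary charge, C

def max_gm_over_id(n: float = 1.3, temp_k: float = 300.0) -> float:
    """Weak-inversion ceiling on gm/Id (1/V); n and temp_k are assumed."""
    return 1.0 / (n * K_B * temp_k / Q_E)

def needed_gm_over_id(gbw_hz: float, c_load_f: float, i_tot_a: float) -> float:
    """gm/Id (1/V) per input device if the whole budget feeds the input pair."""
    gm = 2 * math.pi * gbw_hz * c_load_f
    return gm / (i_tot_a / 2)

needed = needed_gm_over_id(400e6, 1.5e-12, 150e-6)   # about 50 1/V
ceiling = max_gm_over_id()                           # about 30 1/V
print(f"needed {needed:.1f} 1/V vs. ceiling {ceiling:.1f} 1/V")
```

Under these assumptions the required gm/Id exceeds what a single device can deliver, even at the 1 pF end of the load range. That is consistent with the suggestion to use a topology that boosts the effective transconductance or cancels the output pole.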