One thing you could try, which has worked for me in the past, is to remove the gate-to-drain connection of M8. Then connect the output of an op-amp to the gates of M7-M8, tie the op-amp's positive input to the drain of M3, and tie its negative input to the gates of M1-M3. If area is not a concern, using a resistor instead of a diode-connected MOSFET for M2 might give you better voltage headroom, especially with a supply voltage of 0.45 V (is this a superthreshold design, or are your devices low-VT?). Lastly, the gate of M9 would not be connected to node A, but to the gates of M7-M8.
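
To put rough numbers on the headroom point, here is a quick back-of-the-envelope sketch. Only the 0.45 V supply comes from the post; the branch current, the VGS of the diode-connected device, the saturation voltage of the device stacked with M2, and the resistor value are all made-up illustrative figures, and I am assuming M2 sits in series with a single mirror device between the rails. Plug in values from your own simulation.

```python
def headroom_left(vdd, drops):
    """Return the voltage left over after stacking the given drops in one branch."""
    return vdd - sum(drops)

VDD = 0.45          # supply voltage from the post, in volts
V_DSAT = 0.10       # ASSUMED saturation voltage of the device stacked with M2
V_GS_DIODE = 0.30   # ASSUMED VGS of a diode-connected M2 at the bias current
I_BIAS = 1e-6       # ASSUMED branch current, in amps
R_M2 = 100e3        # ASSUMED resistor used in place of M2, in ohms

# Margin with M2 as a diode-connected MOSFET: the drop is a full VGS.
diode_margin = headroom_left(VDD, [V_GS_DIODE, V_DSAT])

# Margin with M2 replaced by a resistor: the drop is I*R, which you choose.
resistor_margin = headroom_left(VDD, [I_BIAS * R_M2, V_DSAT])

print(f"diode-connected M2: {diode_margin * 1e3:.0f} mV of margin")
print(f"resistor for M2:    {resistor_margin * 1e3:.0f} mV of margin")
```

The point is just that the resistor drop is set by I*R, which you can pick, whereas the diode-connected device costs you a VGS that is tied to the threshold voltage. The price is the area of a large resistor at low currents.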