Take a careful look at the architecture. Even if Vgs is negative or zero, there is still room for the Vds of every transistor to be positive. The saturation condition Vds > Vgs - Vth is an approximation that is only valid in strong inversion. Once Vgs approaches Vth (moderate and weak inversion), a Vds of only a few thermal voltages, Vt = kT/q (about 26 mV at room temperature), is enough to make Ids essentially constant with respect to Vds. Assuming the tail current source works in saturation, the cascode transistors can be sized to bias the differential pair into saturation, and then everything works fine.
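To put numbers on the "a few Vt" claim, here is a minimal sketch assuming the simplified long-channel weak-inversion model Ids ∝ exp((Vgs - Vth)/(n·Vt))·(1 - exp(-Vds/Vt)), ignoring channel-length modulation and DIBL. The 300 K temperature and the function names are illustrative assumptions, not part of the original design. Because the gate-dependent factor cancels, the ratio Ids/Ids_sat depends only on Vds/Vt, so the slope factor n does not matter here.

```python
import math

# Physical constants (CODATA 2018 exact values)
K_BOLTZMANN = 1.380649e-23      # J/K
Q_ELECTRON = 1.602176634e-19    # C


def thermal_voltage(temp_kelvin: float) -> float:
    """Return the thermal voltage Vt = kT/q in volts."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON


def weak_inversion_ids_ratio(vds: float, vt: float) -> float:
    """Ids(Vds) / Ids(saturated) in the simplified weak-inversion model.

    Assumed model: Ids ~ exp((Vgs - Vth)/(n*Vt)) * (1 - exp(-Vds/Vt)).
    The gate-dependent factor cancels in the ratio, leaving 1 - exp(-Vds/Vt).
    """
    return 1.0 - math.exp(-vds / vt)


if __name__ == "__main__":
    vt = thermal_voltage(300.0)  # assumed room temperature
    print(f"Vt at 300 K: {vt * 1e3:.2f} mV")
    for m in (1, 2, 3, 4, 5):
        vds = m * vt
        ratio = weak_inversion_ids_ratio(vds, vt)
        print(f"Vds = {m} Vt = {vds * 1e3:5.1f} mV -> Ids/Ids_sat = {ratio:.3f}")
```

With Vds around 3 to 4 Vt (roughly 80 to 100 mV), the current is already within about 5% and 2% of its saturated value, which is why the cascode devices only need to leave a small Vds across the differential pair when it operates near threshold.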