They are deviations from the simple ("first order") MOS
device models, which originally used many assumptions
to make analysis tractable - such as long channel,
Vbs=0, and so on.
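As an illustration, here is how such a deviation is usually written for an NMOS device in saturation. These are the standard textbook expressions, not something taken from the answer above, and the symbols (lambda for channel-length modulation, gamma for the body-effect coefficient, phi_F for the Fermi potential) are the usual conventions, assumed here. Note that V_SB = -V_BS, so Vbs=0 makes the body-effect term vanish.

```latex
% First-order (square-law) model: long channel, V_{SB} = 0
I_D = \frac{1}{2}\,\mu_n C_{ox}\,\frac{W}{L}\,\left(V_{GS} - V_{TH0}\right)^2

% With two common second-order effects added:
% channel-length modulation (\lambda) and body effect (\gamma)
I_D = \frac{1}{2}\,\mu_n C_{ox}\,\frac{W}{L}\,\left(V_{GS} - V_{TH}\right)^2 \left(1 + \lambda V_{DS}\right)

V_{TH} = V_{TH0} + \gamma\left(\sqrt{2\phi_F + V_{SB}} - \sqrt{2\phi_F}\right)
```

Setting lambda = 0 and V_SB = 0 recovers the first-order expression.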
You can approximate a set of points on the characteristic curve of a device by fitting a polynomial to them.
If the polynomial is of order one, it is a linear approximation, also called a first-order approximation. If it is of order two, it is a second-order approximation, and so on. In general, the higher the order, the more closely the polynomial fits the points of the curve.
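The effect of the polynomial order can be seen in a small sketch. It assumes NumPy is available, and the sample points are made up for illustration (a generic saturating curve, not measured device data). The helper name `fit_errors` is hypothetical.

```python
import numpy as np

def fit_errors(x, y, orders=(1, 2, 3)):
    """Return {order: RMS error} for least-squares polynomial fits of each order."""
    errors = {}
    for order in orders:
        coeffs = np.polyfit(x, y, order)   # least-squares fit of the given order
        y_fit = np.polyval(coeffs, x)      # evaluate the fitted polynomial at x
        errors[order] = float(np.sqrt(np.mean((y - y_fit) ** 2)))
    return errors

# Hypothetical sample points standing in for a device characteristic.
x = np.linspace(0.0, 1.0, 21)
y = 1.0 - np.exp(-3.0 * x)

errors = fit_errors(x, y)
for order, err in errors.items():
    print(f"order {order}: RMS error = {err:.5f}")
```

On the fitted points, the RMS error shrinks as the order goes up. That says nothing about how the polynomial behaves outside the sampled range.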
Simple device models do not take subtle effects into account, so their approximation is not as good as it could be. Including more parameters in the model gives a better fit to the real curve of the device, which is why such models are described as higher-order approximations.