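I think for a bipolar input pair you would always need resistive degeneration at the emitters to improve linearity. I am not sure, but I think linearity in this case has nothing to do with the area. I understand it as follows: for a bipolar pair,
I1 - I2 = Io*tanh(Vi/(2*Vt)), regardless of sizing. The saturation current Is, which scales with emitter area, appears in both collector currents and cancels in the ratio I1/I2 = exp(Vi/Vt), so for matched devices the shape of the transfer curve is set only by Vi/Vt.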
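To check the area argument numerically, here is a minimal Python sketch under idealized assumptions: exponential devices (Ic = Is*exp(Vbe/Vt)), alpha = 1, no base current, perfectly matched transistors, and Vt = 25 mV. The helper name `diff_pair_currents` and the numeric values are hypothetical. Scaling both saturation currents by the same factor, which is what a larger emitter area does, should leave I1 - I2 unchanged and equal to Io*tanh(Vi/(2*Vt)).

```python
import math

VT = 0.025  # thermal voltage in volts (assumed room-temperature value)

def diff_pair_currents(vi, io, is1, is2):
    """Collector currents of an ideal BJT differential pair (alpha = 1, no base current).

    Each transistor obeys Ic = Is * exp(Vbe / VT). With both emitters on a common
    tail carrying io: I1 / I2 = (is1 / is2) * exp(vi / VT) and I1 + I2 = io.
    """
    ratio = (is1 / is2) * math.exp(vi / VT)
    i1 = io * ratio / (1.0 + ratio)
    i2 = io - i1
    return i1, i2

io = 1e-3  # 1 mA tail current (illustrative)
for vi in (0.0, 0.01, 0.05):
    small = diff_pair_currents(vi, io, 1e-16, 1e-16)  # 1x emitter area
    large = diff_pair_currents(vi, io, 8e-16, 8e-16)  # 8x emitter area, still matched
    closed_form = io * math.tanh(vi / (2 * VT))
    print(f"vi = {vi * 1e3:4.1f} mV: "
          f"1x {1e3 * (small[0] - small[1]):.4f} mA, "
          f"8x {1e3 * (large[0] - large[1]):.4f} mA, "
          f"tanh {1e3 * closed_form:.4f} mA")
```

Only the ratio is1/is2 enters the result, which is why area mismatch (an offset) matters but absolute area does not.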
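If headroom is a concern, you can use two tail current sources (one for each transistor) and connect the degeneration resistor between the two emitters. At balance no DC current flows through that resistor, so it costs no headroom, whereas a resistor RE in series with each emitter of a single-tail pair drops (Io/2)*RE. A resistor R between the emitters gives the same degeneration as R/2 in each emitter.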
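A rough way to see what the emitter resistor buys is to solve the split-tail pair numerically. This sketch assumes the same idealized BJT model, Vt = 25 mV, Io = 1 mA, and a few arbitrary resistor values; the function names are hypothetical. It reports how far I1 - I2 falls below the small-signal straight line at a fixed 20 mV input, with r = 0 reducing to the plain tanh pair.

```python
import math

VT = 0.025  # thermal voltage in volts (assumed room-temperature value)

def degenerated_diff_current(vi, io, r):
    """I1 - I2 for a split-tail BJT pair with resistor r between the emitters.

    Assumed model: ideal exponential BJTs (alpha = 1), each emitter fed by its
    own io/2 tail source. With ir flowing through r from emitter 1 to emitter 2,
    I1 = io/2 + ir, I2 = io/2 - ir, and KVL around the input loop gives
        vi = VT * ln(I1 / I2) + ir * r.
    The right-hand side rises monotonically with ir, so bisect for ir.
    """
    lo, hi = -io / 2, io / 2
    while hi - lo > 1e-12 * io:
        ir = 0.5 * (lo + hi)
        f = VT * math.log((io / 2 + ir) / (io / 2 - ir)) + ir * r - vi
        if f > 0:
            hi = ir
        else:
            lo = ir
    ir = 0.5 * (lo + hi)
    return 2 * ir  # I1 - I2

def compression(vi, io, r):
    """Fractional shortfall of I1 - I2 below the small-signal straight line (vi != 0)."""
    slope = 1.0 / (2 * VT / io + r / 2)  # d(I1 - I2)/dvi at vi = 0
    return 1.0 - degenerated_diff_current(vi, io, r) / (slope * vi)

io = 1e-3  # 1 mA total tail current (illustrative)
for r in (0.0, 200.0, 1000.0):
    print(f"r = {r:6.0f} ohm: compression at 20 mV is {100 * compression(0.02, io, r):.3f} %")
```

The degeneration also lowers the small-signal transconductance, from Io/(2*Vt) to 1/(2*Vt/Io + r/2), so part of the linearity improvement is paid for in gain.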
You can also improve linearity with the so-called Schmook technique (see Behzad Razavi, "RF Microelectronics", the chapter on mixers).