There are reasons why Class A yields the highest linearity.
The Class A current is where the fT curve is flat, i.e. where it has its peak.
If datasheets plotted gain on a linear rather than a logarithmic scale, one would see that gain peaks where fT peaks. So gain is also flat at the Class A current.
The slope and curvature of the gain curve have a strong impact on linearity.
Imagine the RF voltage and current circling around the operating point along the dynamic load line. That means voltage and current vary during an RF cycle.
But gain depends on voltage and current, so gain changes during an RF cycle. A sinusoidal input signal is therefore not amplified by a constant factor over the cycle, and that means distortion.
This is why in Class A operation the flat gain helps to keep distortion low.
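To make the argument concrete, here is a minimal numerical sketch. It is not taken from any datasheet: it assumes a purely hypothetical parabolic gain curve g = g0 - k*(x + d)^2, where x is the instantaneous signal swing and d is how far the bias point sits from the gain peak. The names output_sample, harmonic_amplitudes, g0, k and bias_offset are made up for illustration. The output is the integral of that gain over the swing, and a plain DFT over one RF cycle gives the harmonic amplitudes.

```python
import math

def output_sample(x, bias_offset, g0=10.0, k=2.0):
    """Output for instantaneous input swing x.

    Assumed (hypothetical) incremental gain: g(x) = g0 - k * (x + bias_offset)**2,
    i.e. a parabola peaking when the bias sits exactly at the gain peak.
    The output is the integral of g from 0 to x.
    """
    d = bias_offset
    return g0 * x - k * ((x + d) ** 3 - d ** 3) / 3.0

def harmonic_amplitudes(bias_offset, amplitude=0.5, n_samples=256, n_harmonics=3):
    """Amplitudes of harmonics 1..n_harmonics for a sinusoidal input,
    computed with a direct DFT over exactly one RF cycle."""
    samples = [
        output_sample(amplitude * math.sin(2 * math.pi * n / n_samples), bias_offset)
        for n in range(n_samples)
    ]
    amps = []
    for h in range(1, n_harmonics + 1):
        re = sum(s * math.cos(2 * math.pi * h * n / n_samples) for n, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * h * n / n_samples) for n, s in enumerate(samples))
        amps.append(2.0 * math.hypot(re, im) / n_samples)
    return amps

if __name__ == "__main__":
    for d in (0.0, 0.5, 1.0):  # bias offset from the gain peak, arbitrary units
        h1, h2, h3 = harmonic_amplitudes(d)
        print(f"offset={d:.1f}  fundamental={h1:.4f}  2nd={h2:.4f}  3rd={h3:.4f}")
```

In this toy model the second harmonic is proportional to the bias offset (the slope of the gain curve at the operating point) and vanishes when the bias sits at the peak, while the third harmonic is set by the curvature k alone. Real devices have more complicated gain curves, so treat this only as an illustration of the mechanism.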
Look at a datasheet and compare the IP3, gain and fT curves over current and voltage, e.g. one from Infineon Technologies.
Note that this explanation is independent of the technology you use: BJT, GaAs, ...