Well, I agree that a transformer is required.
Of course, to be accurate, the available gain of the transformer should be known and taken into account.
But I would draw your attention to the effect the DUT has when it is placed in front of a system.
The overall NF, using the Agilent 57-1 nomenclature, is F12 = F1 + (F2-1)/G1 (all quantities as linear ratios, not dB), where G1 is the available gain of the 1st stage and F2 is the noise figure of the 2nd stage when the 2nd stage sees Zo (i.e. 50 Ohm) at its input.
Now, if Z is very far from 50 Ohm, the actual values of G1av and F2 are very far from the measured ones.
You should expect the actual F12 to be far from (read: higher than) the calculated F12.
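
To put rough numbers on this, here is a small Python sketch of the cascade formula. The function names are my own, and every value in it is made up for illustration (in particular the "actual" G1 and F2 under mismatch are assumptions, not measurements), so treat it as an example of the trend rather than a prediction for any real DUT.

```python
import math


def db_to_lin(value_db):
    """Convert a power ratio from dB to a linear ratio."""
    return 10.0 ** (value_db / 10.0)


def lin_to_db(value_lin):
    """Convert a linear power ratio to dB."""
    return 10.0 * math.log10(value_lin)


def cascade_noise_factor(f1, f2, g1):
    """Two-stage cascade: F12 = F1 + (F2 - 1) / G1, all linear ratios."""
    return f1 + (f2 - 1.0) / g1


# Hypothetical values, NOT measured data.
nf1_db = 3.0          # DUT noise figure
g1_50ohm_db = 10.0    # DUT available gain as characterized in a 50 Ohm system
nf2_50ohm_db = 6.0    # 2nd-stage NF when it sees 50 Ohm at its input

# Assumed "actual" values when the 2nd stage sees an impedance far from 50 Ohm:
# less gain from the first stage and a worse second-stage noise figure.
g1_actual_db = 4.0
nf2_actual_db = 9.0

f12_calc = cascade_noise_factor(
    db_to_lin(nf1_db), db_to_lin(nf2_50ohm_db), db_to_lin(g1_50ohm_db)
)
f12_actual = cascade_noise_factor(
    db_to_lin(nf1_db), db_to_lin(nf2_actual_db), db_to_lin(g1_actual_db)
)

nf12_calc_db = lin_to_db(f12_calc)
nf12_actual_db = lin_to_db(f12_actual)

print(f"NF12 from 50 Ohm data:   {nf12_calc_db:.2f} dB")
print(f"NF12 with mismatch data: {nf12_actual_db:.2f} dB")
```

With lower G1 and higher F2 both pushing the same way, the second term (F2-1)/G1 grows, which is why the actual F12 ends up above the one calculated from 50 Ohm data.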