Some time ago a colleague of mine asked me the same question. I've discussed and analyzed that scenario.
1) Usually the accuracy of the Cal Kit sets the overall measurement accuracy: good Cal Kit = good measurements, homemade Cal Kit = higher uncertainty.
2) The most important causes of uncertainty for all S-parameters are the discontinuities.
For reflection measurements (s11, s22), the reactive parasitics of the load, short and open also play an important role.
3) Of course, you may build your own Cal Kit, but keep in mind that the accuracy will drop a lot.
More specifically, the s21 and s12 traces will be affected by a ripple.
For s11 and s22 the accuracy will be degraded by at least two effects: a ripple (visible effect) and a reduction of the measurable range (non-visible effect). In other words, if you calibrate with a 3.5 mm Cal Kit you may read s11 from 0 to -40 dB, but if you calibrate with your own Cal Kit you may perhaps read valid s11 only from 0 to -20 dB. The VNA will always show a dynamic range of 60 dB, but you cannot trust it.
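To put rough numbers on the "reduction of the measurable range", here is a minimal Python sketch. It assumes a simplified first-order model in which the only residual error after calibration is an effective directivity term that adds to or subtracts from the true reflection in linear magnitude. Source match and tracking errors are ignored, and the function name and the directivity figures are my own illustrative assumptions, not measured values.

```python
import math


def db_to_lin(db):
    """Convert a magnitude in dB (20*log10 convention) to linear."""
    return 10 ** (db / 20)


def s11_reading_bounds_db(true_s11_db, directivity_db):
    """Worst-case |s11| readings in dB for a given residual directivity.

    Assumption: first-order model, reading = |true| +/- |directivity|
    in linear magnitude. Other residual error terms are ignored.
    """
    dut = db_to_lin(true_s11_db)
    err = db_to_lin(directivity_db)
    upper = 20 * math.log10(dut + err)
    diff = abs(dut - err)
    # If the error fully cancels the DUT reflection, the reading can
    # fall without limit, so report -inf instead of calling log10(0).
    lower = 20 * math.log10(diff) if diff > 0 else float("-inf")
    return lower, upper


if __name__ == "__main__":
    # Hypothetical residual directivities: -40 dB for a good kit,
    # -20 dB for a homemade kit, both measuring a true s11 of -20 dB.
    for directivity in (-40.0, -20.0):
        low, high = s11_reading_bounds_db(-20.0, directivity)
        print(f"directivity {directivity:.0f} dB: "
              f"reading between {low:.2f} and {high:.2f} dB")
```

Under these assumptions a -20 dB device read through a -40 dB residual error stays within about 1 dB of the true value, while with a -20 dB residual error the reading can land anywhere from roughly -14 dB down to an arbitrarily deep null. That is why the trace below that level cannot be trusted.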
4) How to practically build the LOAD and SHORT
Build the 3 standards. Do not recycle the same line by soldering (and de-soldering) the resistors. Use as little solder as possible. The solder joints must be excellent.
Mount the two 100 Ω resistors upside down. I mean the carbon layer should face down, so that it can be considered a short length of 50 Ω line.
Ground: start from the consideration that the ground is never good enough. If you use a PCB, add lots of via holes and stay as close as possible to the coax connector.
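As a quick sanity check on the LOAD made from two 100 Ω resistors in parallel, the sketch below computes the best-case s11 that the resistor values alone allow. It assumes ideal, purely resistive parts at DC in a 50 Ω system. The reactive parasitics and discontinuities mentioned above are not modeled and will make the real load worse, and the function name and tolerance figures are only illustrative.

```python
import math

Z0 = 50.0  # reference impedance in ohms


def load_s11_db(r1, r2, z0=Z0):
    """s11 in dB of two resistors in parallel used as a termination.

    Assumption: ideal resistors, no parasitic inductance or capacitance.
    """
    z = r1 * r2 / (r1 + r2)        # parallel combination
    gamma = abs((z - z0) / (z + z0))  # reflection coefficient magnitude
    # A perfect match gives gamma == 0; report -inf instead of log10(0).
    return 20 * math.log10(gamma) if gamma > 0 else float("-inf")


if __name__ == "__main__":
    # Hypothetical worst cases: both resistors high by 1 % and by 5 %.
    for tol in (0.01, 0.05):
        r = 100.0 * (1 + tol)
        print(f"{tol * 100:.0f} % tolerance: s11 = {load_s11_db(r, r):.1f} dB")
```

With both resistors 1 % high the DC match is still about -46 dB, and with 5 % parts it is about -32 dB, so at RF the limit is likely to come from the layout and parasitics rather than from the resistor values.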