You could begin with simple substitution, but you run into
the problem that MOS VT varies far more than BJT Vbe
across manufacturing ("make") tolerances, and about as
much with temperature.
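To put rough numbers on why that hurts a level trigger: suppose, hypothetically, the trigger point is set by a stack of N junctions or diode-connected devices. The per-device process spread and temperature drift then both get multiplied by N. The sketch below uses made-up placeholder values (not data for any real process), with equal tempcos for both device types as the post suggests, so that only the process spread differs:

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    nominal_v: float       # nominal turn-on voltage per device, volts (placeholder)
    process_spread: float  # +/- process ("make") spread per device, volts (placeholder)
    tempco: float          # volts per degree C (placeholder, negative)

def trigger_window(dev: Device, n_stack: int, t_min: float, t_max: float,
                   t_nom: float = 25.0) -> tuple[float, float]:
    """Worst-case (min, max) trigger voltage of a stack of n identical devices,
    assuming every device sits at the same process corner (same die)."""
    # Temperature shift per device at each end of the range, relative to t_nom.
    shifts = [dev.tempco * (t - t_nom) for t in (t_min, t_max)]
    lo = n_stack * (dev.nominal_v - dev.process_spread + min(shifts))
    hi = n_stack * (dev.nominal_v + dev.process_spread + max(shifts))
    return lo, hi

# Hypothetical numbers only: same tempco, wider process spread for the MOS part.
bjt = Device("BJT Vbe", nominal_v=0.65, process_spread=0.03, tempco=-0.002)
mos = Device("MOS VT", nominal_v=0.60, process_spread=0.15, tempco=-0.002)

for dev in (bjt, mos):
    lo, hi = trigger_window(dev, n_stack=3, t_min=-40.0, t_max=125.0)
    print(f"{dev.name}: {lo:.2f} V .. {hi:.2f} V (window {hi - lo:.2f} V)")
```

With these placeholders the MOS stack's trigger window comes out well over half again as wide as the BJT stack's, and almost all of the extra width comes from process spread rather than temperature.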
You would need a shunt at the final devices, which the
circuit you show lacks, probably because having the shunt BJTs
pumped into saturation and latched there for free is a good
thing for ESD (less so for spurious triggering from noise or
leakage).
You know, there are about a bajillion papers on ESD supply
clamps out there, for the searching. Not to mention books.
You might keep an eye out for superior triggering schemes,
because that one looks prone to being slow. Depending on
the application environment, you may want to combine a
level-triggered scheme like the one you show with an
edge-triggered one that keys on the distinctive dV/dt of ESD
events (perhaps also including a slow clamp to disable that
trigger mode during powered operation, if the part will see
high-level, fast supply transients in use).
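As an illustration of the edge-triggered idea, here is a sketch of the textbook RC trigger (R from the supply to a sense node, C from the node to ground). This is a swapped-in generic example, not the circuit under discussion. A linear supply ramp of amplitude dv over t_rise makes the node lag the supply by slope*RC*(1 - exp(-t/RC)), and that lag peaks at the end of the ramp. A fast ESD edge produces nearly the full ramp as lag, while a slow power-up produces almost none. The RC value, sense threshold, and edge timings below are hypothetical:

```python
import math

def rc_lag_peak(dv: float, t_rise: float, rc: float) -> float:
    """Peak of (VDD - Vnode) for a linear supply ramp of amplitude dv over
    t_rise into an RC low-pass. Solves de/dt = slope - e/RC with e(0) = 0;
    the lag peaks at the end of the ramp and decays afterwards."""
    slope = dv / t_rise
    return slope * rc * (1.0 - math.exp(-t_rise / rc))

def edge_triggered(dv: float, t_rise: float, rc: float, v_sense: float) -> bool:
    """True if the lag exceeds a (hypothetical) sense threshold v_sense,
    i.e. this edge would switch the clamp on."""
    return rc_lag_peak(dv, t_rise, rc) > v_sense

RC = 1e-6       # hypothetical 1 us trigger time constant
V_SENSE = 1.0   # hypothetical sense threshold, volts

print(edge_triggered(5.0, 10e-9, RC, V_SENSE))  # ESD-like 10 ns edge
print(edge_triggered(5.0, 1e-3, RC, V_SENSE))   # 1 ms power-up ramp
print(edge_triggered(2.0, 50e-9, RC, V_SENSE))  # fast 2 V transient while powered
```

The last case is the catch: a fast transient on an already powered supply fires the edge trigger just like ESD does, which is why a slow disable for that trigger mode can be worth having when such transients are expected.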