I think this is heavily trodden ground; everybody wants
a simulator that can grind large netlists faster. And that
is the only difference between schematic and post-layout
simulation: netlist node and element bloat.
Now, depending on the PDK's extract implementation,
there can be things you can do to the netlist that result
in a much better runtime (and fewer crashes). In particular
if your setup extracts each MOSFET finger as a separate
device, that's heinous. At one place I worked we tasked
one of the sysadmins to write a (Perl, I think) script
that scanned netlists for MOSFETs that had identical
D, G, S, B connectivity, counted them, and replaced
them all with one m=N device statement (commenting out
the rest for traceability). Huge improvement, due to
de-bloating. I'm sure the same could apply to other
elements but MOSFETs are most commonly multi-
fingered and lazily extracted.
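
For what it's worth, here is a rough Python sketch of that kind of
script. It is not the original (which I don't have); the function name
merge_parallel_mosfets and the card format are my assumptions. It
assumes one device per line in the form "Mname D G S B model params...",
with no "+" continuation lines, and it only merges devices whose model
and parameters also match, which is stricter than matching on D, G, S, B
alone.

```python
from collections import defaultdict

def merge_parallel_mosfets(lines):
    """Collapse MOSFETs with identical D, G, S, B, model and params into one m=N card.

    Hypothetical helper: assumes single-line cards 'Mname D G S B model [k=v ...]'.
    """
    groups = defaultdict(list)  # merge key -> indices of matching lines
    for i, line in enumerate(lines):
        tok = line.split()
        # Skip anything that is not a MOSFET card with at least 6 fields.
        if len(tok) < 6 or tok[0][0].lower() != "m":
            continue
        params = tok[6:]
        # Leave devices that already carry a multiplier untouched.
        if any(p.lower().startswith("m=") for p in params):
            continue
        # Key on the four nodes, the model name, and the sorted parameter list.
        key = (tuple(tok[1:6]), tuple(sorted(p.lower() for p in params)))
        groups[key].append(i)

    out = list(lines)
    for idxs in groups.values():
        if len(idxs) < 2:
            continue
        first = idxs[0]
        # Keep the first device and give it the finger count as multiplier.
        out[first] = f"{lines[first].rstrip()} m={len(idxs)}"
        # Comment out the rest so the original cards stay traceable.
        for i in idxs[1:]:
            out[i] = "* merged: " + lines[i]
    return out

if __name__ == "__main__":
    netlist = [
        "M1 d g s b nch w=1u l=0.1u",
        "M2 d g s b nch w=1u l=0.1u",
        "M3 d g s b nch w=1u l=0.1u",
        "R1 d d2 10",
    ]
    print("\n".join(merge_parallel_mosfets(netlist)))
```

If your extractor gives the end fingers different AD/AS/PD/PS values,
this version will not merge them with the inner fingers; whether to
relax the key to connectivity only is a judgment call about how much
accuracy you are willing to trade.
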
You're unlikely to find a paper to review; it was just
a bit of useful work done by someone who got no
glory.
But maybe this is the germ of an idea that will get
you past your academic gauntlet.