In my "old school" experience most of the
characterization is done in simulation, with
some test silicon to validate the active device
and wireload models. It is impractical to test
every gate in every fanout and line-length
combination across process, temperature
and supply; the case count multiplies out
as J*K*L*M*N*....
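
To put a rough number on that multiplication, here is a
small Python sketch. The axis names and every count in it
are made up for illustration (they are not from any real
library or process); the only point is how quickly the
product grows.

```python
from math import prod

# Hypothetical sweep sizes, chosen only to illustrate the growth.
# None of these counts come from a real cell library or process.
sweep = {
    "gate_types": 200,
    "fanouts": 6,
    "line_lengths": 5,
    "process_corners": 5,
    "temperatures": 3,
    "supplies": 3,
}


def combination_count(axes):
    """Number of distinct test conditions: the product of all axis sizes."""
    return prod(axes.values())


total = combination_count(sweep)
print(f"{total} gate/load/corner combinations")  # 270000 with the counts above
```

Even with these modest assumed counts, that is far more
cases than anyone will measure on a bench, which is why
simulation has to carry the bulk of the load.
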
Internal ("core") gates lack the drive strength
to be measured cleanly pad-to-pad.
Silicon validation articles might include
ring oscillators built of a few different
inverter and combinational gates at a
few different loadings: enough angles that
you can regress the various delay terms
across the "make" (process) and environmental
variations.
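
As a minimal sketch of that regression, assume a first-order
model where each stage's delay is an intrinsic term plus a
term proportional to load, t_stage = t0 + k * C_load, and
an N-stage ring oscillator runs at f = 1 / (2 * N * t_stage).
The model, the stage count, the load values and the "true"
t0 and k below are all assumptions for illustration; the
frequencies are synthetic stand-ins for bench measurements.

```python
def stage_delay_from_frequency(freq_hz, n_stages):
    """Per-stage delay of an N-stage ring oscillator: period = 2 * N * t_stage."""
    return 1.0 / (2.0 * n_stages * freq_hz)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Assumed values, used only to generate synthetic "measurements".
N_STAGES = 31                  # odd number of inverting stages
TRUE_T0 = 20e-12               # intrinsic delay, seconds
TRUE_K = 2e-12 / 1e-15         # delay per unit load, seconds per farad
loads_f = [5e-15, 10e-15, 20e-15, 40e-15]  # per-stage load, farads

# Synthetic oscillator frequencies, one ring per loading.
freqs_hz = [1.0 / (2.0 * N_STAGES * (TRUE_T0 + TRUE_K * c)) for c in loads_f]

# Convert each frequency to a stage delay, then regress delay against load.
delays_s = [stage_delay_from_frequency(f, N_STAGES) for f in freqs_hz]
t0_fit, k_fit = fit_line(loads_f, delays_s)

print(f"intrinsic delay: {t0_fit * 1e12:.2f} ps")
print(f"load term:       {k_fit * 1e-3:.2f} ps/fF")
```

On real silicon you would repeat the fit per lot, temperature
and supply, and compare the fitted terms against what the
simulation models predict at the same corner.
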
How long? Who knows? That comes down to
how fancy you feel the need to be, manpower,
how much to measure, how fine to fit or re-fit
models. I've seen anything from weeks, at low
expectations and low effort, up to man-years
(divided by applied manpower, for schedule)
for something deemed critical with a lot of
"management help" (in defining what other
people should do).
If you were dealing with an IP house that does
this for a living, they might have a defined flow
and be able to commit to a cost and schedule.
For a one-off, call it a death march that you just
have to get through regardless, and sandbag
the schedule, cost and quality so you have a
chance of not being totally wrong.