First, I should mention that my oversampling description is basically what a UART does (the "Receiver" section of Wikipedia's "UART" page gives a decent intro to this).
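As a rough illustration of the UART-style receiver idea, here is a minimal sketch of 16x oversampling: hunt for the low start bit, re-check it at mid-bit to reject glitches, then sample each data bit at its center. The framing details are my assumptions, not something from this thread: 16 samples per bit, 8N1 framing (8 data bits LSB-first, one stop bit), an idle-high line, and clean 0/1 samples. The function names are hypothetical.

```python
from typing import List, Optional

OVERSAMPLE = 16  # samples per bit period (a common UART choice; assumed here)
DATA_BITS = 8    # 8N1 framing assumed


def uart_receive(samples: List[int]) -> Optional[int]:
    """Decode one 8N1 frame from an oversampled, idle-high sample stream.

    Returns the received byte, or None if no valid frame is found.
    """
    n = len(samples)
    i = 0
    while i < n:
        # 1. Hunt for the first low sample: a candidate start-bit edge.
        if samples[i] == 0:
            mid = i + OVERSAMPLE // 2
            # 2. Re-check at the middle of the start bit to reject short glitches.
            if mid < n and samples[mid] == 0:
                value = 0
                # 3. Sample each data bit at its center, one bit period apart (LSB first).
                for bit in range(DATA_BITS):
                    pos = mid + (bit + 1) * OVERSAMPLE
                    if pos >= n:
                        return None  # stream ended mid-frame
                    value |= samples[pos] << bit
                # 4. A valid frame ends with a high stop bit; otherwise it's a framing error.
                stop = mid + (DATA_BITS + 1) * OVERSAMPLE
                if stop < n and samples[stop] == 1:
                    return value
                return None
        i += 1
    return None


def make_frame(byte: int, idle_bits: int = 2) -> List[int]:
    """Test helper: build an oversampled 8N1 frame padded with idle-high bits."""
    bits = ([1] * idle_bits + [0]
            + [(byte >> b) & 1 for b in range(DATA_BITS)]
            + [1] * (1 + idle_bits))
    return [level for level in bits for _ in range(OVERSAMPLE)]


if __name__ == "__main__":
    frame = make_frame(0x3C)
    frame[5] = 0  # a one-sample glitch on the idle line gets rejected at mid-bit
    print(hex(uart_receive(frame)))
```

Sampling at the bit centers is what buys the tolerance to clock mismatch: the receiver only has to stay within roughly half a bit period of the transmitter by the end of the frame.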
Anyway, it's hard to comment further without knowing more, e.g. whether the endpoints must still run at 4 MHz between communication periods (maybe not, based on your response), how accurate the 4 MHz endpoint clocks must be (both absolute and relative to each other), not to mention other things (cost and power!).
But I'm not saying I want to know more ;-)
Your preamble comment brings to mind a DLL solution, since a DLL could be designed to "slap into lock" within a minimum number of edges, but it would then free-run and drift between communication periods.
(A PLL could drift too far, requiring a fairly long preamble to reacquire lock, depending on the delay between communication periods and the "tuning" of the PLL.)
(Plus, typical PLL macros might not handle a loss of reference clock by suppressing the phase detector (PD) until the clock reappears, so it might need to be a custom PLL.)
I mention this reluctantly, but if your 4 MHz requirements were very loose, maybe an RC oscillator could be used; it would need some form of per-lot calibration, though, and the analog design could be tough in terms of rejecting voltage and temperature (V & T) variation and drift.
(I've seen this done, but that process offered on-chip poly fuses, or maybe it was EEPROM.)
I also worked on a design where each endpoint had an RC oscillator running at a large-ish multiple of the desired frequency, and a central controller would "calibrate" counters on all endpoints using an accurately timed transmit sequence. The resulting counter values would then be divided (by the same amount in all endpoints) and used to modulo-count accurate intervals (i.e. output-clock edges) that are a fraction of that calibration interval.
Each endpoint can thus generate accurate intervals (both absolute and relative to the other endpoints), even though each has an RC oscillator running at a different frequency.
If this calibration always occurs shortly before the output frequency is needed, oscillator drift can be negligible.
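To make that calibration scheme a bit more concrete, here is a small simulation sketch. Everything specific in it is assumed for illustration: the 100 ms calibration window, the divisor of 100, the two oscillator frequencies, and the function names are hypothetical, and the divide simply truncates the remainder (the real design may have handled that differently).

```python
from typing import List


def calibrate(f_osc_hz: int, t_cal_us: int) -> int:
    """Count local RC-oscillator ticks during the accurately timed calibration
    window sent by the central controller (integer math, so the count is exact)."""
    return f_osc_hz * t_cal_us // 1_000_000


def reload_value(cal_count: int, divisor: int) -> int:
    """Divide the calibration count by the same divisor on every endpoint; the
    result is the reload value of the modulo counter (remainder truncated)."""
    return cal_count // divisor


def output_edge_ticks(reload: int, n_edges: int) -> List[int]:
    """Run a modulo counter on the local oscillator and record the tick index
    at which each output edge occurs (one edge every `reload` ticks)."""
    edges, counter = [], 0
    for tick in range(1, reload * n_edges + 1):
        counter += 1
        if counter == reload:  # counter wraps -> emit an output edge
            counter = 0
            edges.append(tick)
    return edges


if __name__ == "__main__":
    T_CAL_US = 100_000  # 100 ms calibration window (hypothetical)
    DIVISOR = 100       # same divisor on all endpoints -> 1 ms target interval

    # Two endpoints whose RC oscillators run at quite different frequencies.
    for name, f_osc in [("A", 1_000_000), ("B", 1_234_567)]:
        reload = reload_value(calibrate(f_osc, T_CAL_US), DIVISOR)
        edges = output_edge_ticks(reload, 3)
        interval_ms = 1000 * reload / f_osc
        print(f"endpoint {name}: reload={reload}, edges at ticks {edges}, "
              f"interval={interval_ms:.4f} ms")
```

With these numbers, endpoint A (1 MHz) gets a reload value of 1000 and exactly 1 ms per interval, while endpoint B (about 1.23 MHz) gets 1234 ticks, i.e. roughly 0.9995 ms, so the two agree to within about 0.05% despite very different oscillators. The residual error comes from rounding to whole oscillator ticks, which is why the oscillator has to run at a large multiple of the output frequency.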
However, in that design the minimum output interval was much, much longer (milliseconds!) than a 4 MHz clock period, so I doubt this approach makes sense here (you'd need to run your oscillator at too high a multiple of 4 MHz).
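To put rough numbers on that point, here is a back-of-the-envelope sketch. It assumes (a simplification on my part, not a full jitter analysis) that each output edge can only land on a local oscillator tick, so the per-edge timing error is up to one oscillator period; the 1% error budget and the function name are purely illustrative.

```python
def min_osc_freq_hz(interval_s: float, max_rel_error: float) -> float:
    """Lowest oscillator frequency for which a one-tick quantization error
    stays within max_rel_error of the output interval."""
    ticks_per_interval = 1.0 / max_rel_error  # e.g. 1% -> 100 ticks per interval
    return ticks_per_interval / interval_s


if __name__ == "__main__":
    # Millisecond interval (like the old design) vs. a 4 MHz clock period (250 ns).
    for label, interval in [("1 ms interval", 1e-3), ("4 MHz period", 250e-9)]:
        f = min_osc_freq_hz(interval, 0.01)
        print(f"{label}: oscillator >= {f / 1e6:.1f} MHz for 1% error")
```

With a 1% budget, the millisecond case needs only about 100 kHz, while the 4 MHz case needs roughly 400 MHz from an RC oscillator (and twice that if both edges of a 50% duty-cycle clock must land within the budget).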
Hope some of this helps rather than just distracts... good luck!