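I think the first large-scale 'microcontroller' with virtual memory was the 80286 (the 80186 only had real mode), which, although it wasn't heavily advertised, could run in either real mode or protected (virtual address) mode. In real mode the value in a segment register is simply shifted left four bits and added to the offset to form the physical address. In protected mode the segment register instead holds a selector that picks a descriptor out of a table, and that descriptor supplies the segment's actual base address and size limit. They did it purely at the silicon level, primarily to allow programs to 'think' they used the same memory addresses while in reality each was in a different region of physical memory.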
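To make the difference concrete, here is a rough sketch of the two address calculations. It is only an illustration, not how the silicon is organised: the function names and the two descriptor tables are made up for the example, and a real 80286 descriptor also carries access-rights bits that are left out here.

```python
def real_mode_address(segment, offset):
    """Real mode: physical address = segment * 16 + offset.

    (The 1 MB wrap-around of the original 8086 is not modelled.)
    """
    return (segment << 4) + offset


def protected_mode_address(selector, offset, descriptor_table):
    """Protected mode: the selector picks a (base, limit) descriptor.

    descriptor_table is a hypothetical dict {index: (base, limit)}.
    """
    index = selector >> 3  # low 3 bits of a selector are the TI and RPL fields
    base, limit = descriptor_table[index]
    if offset > limit:
        # A real CPU raises a protection fault here.
        raise ValueError("offset beyond segment limit")
    return (base + offset) & 0xFFFFFF  # the 80286 has a 24-bit physical address bus


# Two programs use the same selector and offset but get different tables,
# so they end up in different regions of physical memory.
table_a = {1: (0x100000, 0xFFFF)}
table_b = {1: (0x200000, 0xFFFF)}

print(hex(real_mode_address(0x1234, 0x0010)))
print(hex(protected_mode_address(0x0008, 0x1000, table_a)))
print(hex(protected_mode_address(0x0008, 0x1000, table_b)))
```

Both protected mode calls pass the identical selector:offset pair, yet land 1 MB apart, which is the 'programs think they use the same addresses' trick in miniature.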
The Bill Gates story dates back to the early '80s and I think it's true. The first IBM PCs I worked on only had 64K of RAM and a cassette tape for mass storage!
Brian.