Start by realizing that anything bigger than an 8-bit uC
is beyond an individual's efforts, let alone an individual
poking at it in their spare time. Bigger chips are team
efforts, because.
Then pick an architecture, ideally one with a well-documented
example design, and attack the blocks and the top level.
Old AMD bit-slice chips could be an interesting waypoint.
The 2900 series had a lot of functions. My first job out
of school was replicating several of them, and it took a
small team a couple of years to get them all done (enough
to make a MIL-STD-1750A flight computer). You
could get a hardware stab done with a PCB design and
the chip set (if the parts, or newer incarnations, are
still available) and then work on porting that design,
once proven, to a single-chip implementation. You might
even find corresponding IP blocks, so you could spend
time on architecture and optimization rather than
(say) being the umpty-umpth guy coding up the same
microprogram control unit.
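
If the bit-slice idea is unfamiliar, a rough software model may help before
touching hardware. The sketch below is only an illustration of the cascading
principle: several identical 4-bit slices chained through their carry lines
to make a wider datapath. The names (alu_slice, alu_cascade) and the tiny
operation set are made up for this example and are not the real Am2901
instruction encoding or pinout.

```python
# Hypothetical 4-bit ALU slice, cascaded into a wider datapath.
# Illustrative only: not the actual Am2901 function set.
SLICE_BITS = 4
SLICE_MASK = (1 << SLICE_BITS) - 1


def alu_slice(op, a, b, carry_in):
    """Model one 4-bit slice; returns (result, carry_out)."""
    a &= SLICE_MASK
    b &= SLICE_MASK
    if op == "add":
        total = a + b + carry_in
        return total & SLICE_MASK, total >> SLICE_BITS
    if op == "and":
        return a & b, 0
    if op == "or":
        return a | b, 0
    raise ValueError(f"unknown op: {op}")


def alu_cascade(op, a, b, slices=4, carry_in=0):
    """Chain slices from least to most significant, rippling the carry."""
    result = 0
    carry = carry_in
    for i in range(slices):
        shift = i * SLICE_BITS
        part, carry = alu_slice(op, a >> shift, b >> shift, carry)
        result |= part << shift
    return result, carry


if __name__ == "__main__":
    # Four slices give a 16-bit add; the final carry is the carry out.
    print(alu_cascade("add", 0x1234, 0x0FFF))
    print(alu_cascade("add", 0xFFFF, 0x0001))
```

Every slice here is the same block, so the word width is set just by how many
you chain. A ripple carry is the simplest way to connect them; a real design
would likely want carry lookahead across the slices.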