The documents j33pn has posted are great and will answer your question.
Another way to think about the problem is to put yourself in the position of an ASIC designer who wants to add a UART to an MCU. Using silicon gates, you would create an interface to latch the data in from the data bus, a FIFO to store the data, an address decoder so the block knows when it is being addressed, and a couple of counters and dividers that can be configured by writing values to their registers to scale the processor's clock so that data is clocked out of the FIFO at the desired baud rate. You would also use a few gates to output status bits (flags) when something is happening, such as the FIFO being full or data actively being clocked out. Once you have the silicon chip designed, you would need to develop the libraries so the compiler can convert someone's software into register values that get written where you need them, set enable bits, load the FIFO, etc.
If you think about how you would make a UART with parts like FIFOs, latches, and other logic, you will have a better feel for what is in the MCU that performs this task.
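If it helps, here is a rough software model in C of the transmit side of the blocks described above: an address decoder, a FIFO, a clock divider, a shift register, and a couple of status flags. It is only a sketch to make the idea concrete. The register offsets, flag bits, FIFO depth, and all function names are made up for illustration and do not match any particular MCU. It also assumes a plain 8N1 frame and one divider tick per bit, whereas many real UARTs oversample the bit clock and include a receive path as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 8u /* hypothetical depth */

/* Hypothetical register offsets, as seen by the address decoder. */
enum { REG_DATA = 0x0, REG_DIVISOR = 0x4, REG_STATUS = 0x8 };

/* Hypothetical status flag bits. */
enum { STATUS_TX_FULL = 1u << 0, STATUS_TX_BUSY = 1u << 1 };

typedef struct {
    uint8_t  fifo[FIFO_DEPTH]; /* transmit FIFO storage */
    size_t   head, count;      /* oldest entry and number of entries */
    uint32_t divisor;          /* processor clocks per bit */
    uint32_t tick;             /* divider counter */
    uint16_t shift;            /* frame currently being shifted out */
    unsigned bits_left;        /* bits of the frame still to send */
    bool     tx_line;          /* TX pin level, idle high */
} uart_model;

/* Put the model in its reset state: empty FIFO, line idle high. */
void uart_reset(uart_model *u) {
    *u = (uart_model){ .tx_line = true };
}

/* Divisor that scales the processor clock down to the baud rate (rounded). */
uint32_t uart_divisor_for(uint32_t clock_hz, uint32_t baud) {
    assert(baud > 0);
    return (clock_hz + baud / 2) / baud;
}

/* Bus write: the switch plays the role of the address decoder. */
void uart_write(uart_model *u, uint32_t addr, uint32_t value) {
    switch (addr) {
    case REG_DIVISOR:
        u->divisor = value;
        u->tick = 0;
        break;
    case REG_DATA:
        if (u->count < FIFO_DEPTH) { /* writes to a full FIFO are dropped */
            u->fifo[(u->head + u->count) % FIFO_DEPTH] = (uint8_t)value;
            u->count++;
        }
        break;
    default:
        break; /* not one of our registers */
    }
}

/* Bus read: only the status register is readable in this sketch. */
uint32_t uart_read(const uart_model *u, uint32_t addr) {
    uint32_t status = 0;
    if (addr != REG_STATUS) return 0;
    if (u->count == FIFO_DEPTH) status |= STATUS_TX_FULL;
    if (u->bits_left > 0 || u->count > 0) status |= STATUS_TX_BUSY;
    return status;
}

/* Call once per processor clock. Every `divisor` clocks, one bit goes out. */
void uart_clock(uart_model *u) {
    if (u->divisor == 0) return;        /* baud rate not configured yet */
    if (++u->tick < u->divisor) return; /* divider still counting */
    u->tick = 0;

    if (u->bits_left == 0) {
        if (u->count == 0) { u->tx_line = true; return; } /* nothing to send */
        uint8_t byte = u->fifo[u->head];
        u->head = (u->head + 1) % FIFO_DEPTH;
        u->count--;
        /* 8N1 frame, sent LSB first: start bit 0, 8 data bits, stop bit 1. */
        u->shift = (uint16_t)((1u << 9) | ((unsigned)byte << 1));
        u->bits_left = 10;
    }
    u->tx_line = (u->shift & 1u) != 0;
    u->shift >>= 1;
    u->bits_left--;
}
```

To use it the way firmware would, call uart_reset, write the result of uart_divisor_for to REG_DIVISOR, write bytes to REG_DATA while STATUS_TX_FULL is clear, and call uart_clock once per simulated processor clock while watching tx_line. That sequence of register writes is essentially what a vendor's UART library does for you.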
Does this make sense?
-H