Depending on the processor you are using, you may well have to set aside enough storage for each task to save the complete register set - at least of the CPU. Some devices I know of allow registers to be set but not read (e.g. some of the PIC registers associated with repeated instructions for the DSP operations). Therefore it can be quite hard to save the state of the device completely for a general case - if you know that you will never write code or use a library that uses the 'special' registers then you can leave them out of the context switch.
As for the scheduling, so you want this purely based of the timer (preemptive) or can the tasks relinquish the processor when it knows it has nothing to do (cooperative)?
Either way, you need to maintain the state of each task and then regularly check to see if it is the highest priority task at that time that can run. A common way of doing this is to have a list of tasks in the order they can be run - be that timer base, priority based or both.
Each 'heartbeat' you need to scan the list and see if the top-most task that can run is the current task or not - if not then a context switch is required and the task list re-ordered.
In the example you provided, your heartbeat would need to be every 1msec if that is the time slice resolution you want to achieve.
Personally I would investigate an existing piece of software such as RTOS (there are a number of alternatives) which are free and have the development (and general support) available without having to reinvent the wheel.
Susan