One thing you could do for sure is optimize the code.
For example, the 'mask' function has a variable execution time and uses a lot of instructions to return a value.
In order to test if 'num' is 9, it needs to go through the whole process of checking whether it's 0, 1, 2, 3, ..., 8 and then finally 9. Since it takes 3-4 instructions to test a number (load literal, subtract literal from number, check if zero flag is set), the routine will take anywhere between about 5 and 40 instructions to return a mask value, depending on 'num'. In terms of microcontrollers, this is incredibly slow.
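To make that arithmetic concrete, here is a rough cost model in C. It is only a sketch: it assumes the compiler emits a plain compare chain in the order 0, 1, 2, ..., 9, it uses the 3-4 instructions per test estimated above, and it ignores call/return overhead. The function names are made up for illustration.

```c
#include <assert.h>

/* Assumed cost of one test: load literal, subtract, check zero flag. */
#define INSTR_PER_TEST_MIN 3u
#define INSTR_PER_TEST_MAX 4u

/* A chain that tests 0, 1, 2, ... matches 'num' on test number num + 1. */
unsigned chain_tests(unsigned num)
{
    assert(num <= 9u);
    return num + 1u;
}

/* Lower and upper estimates of instructions spent before the match. */
unsigned chain_cost_min(unsigned num)
{
    return chain_tests(num) * INSTR_PER_TEST_MIN;
}

unsigned chain_cost_max(unsigned num)
{
    return chain_tests(num) * INSTR_PER_TEST_MAX;
}
```

With these assumptions, 'num' = 0 costs a handful of instructions while 'num' = 9 costs 30 to 40, so the time spent depends heavily on the input.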
There's a better way to do the same thing.
For example, let's assume that general purpose registers 0x30 to 0x39 are free (choose any range that is unused by your code) and fill them with the mask values corresponding to 'num' values 0 - 9 once, at startup.
Then use the indirect addressing feature of the PIC microcontroller (with an address in FSR, reading INDF returns the contents of the addressed register) and do the following:
1) Add 0x30 to the 'num' value to get the address of the appropriate mask register
2) Put the address in the FSR register
3) Read the mask value from INDF register
4) Return from subroutine.
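The steps above can be sketched in C as a small simulation. This is not code for the real chip: the register file is modelled as an array, `fsr` and `read_indf()` stand in for the FSR and INDF registers, and the mask values are placeholders (common 7-segment patterns, which is purely an assumption, so substitute your own). The names `mask_init`, `regfile` and `MASK_BASE` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MASK_BASE 0x30u          /* assumed free GPR range 0x30..0x39 */

/* Simulated data memory standing in for the PIC's register file. */
static uint8_t regfile[0x80];
static uint8_t fsr;              /* models the FSR pointer register */

/* Models a read of INDF: returns the register FSR points to. */
static uint8_t read_indf(void)
{
    return regfile[fsr];
}

/* Placeholder mask values (7-segment patterns, an assumption). */
static const uint8_t initial_masks[10] = {
    0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x6F
};

/* One-time startup: copy the masks into registers 0x30..0x39. */
void mask_init(void)
{
    for (uint8_t i = 0; i < 10; i++) {
        regfile[MASK_BASE + i] = initial_masks[i];
    }
}

/* Constant-time lookup, same work for every valid 'num'. */
uint8_t mask(uint8_t num)
{
    assert(num <= 9);                   /* table only covers 0..9 */
    fsr = (uint8_t)(MASK_BASE + num);   /* steps 1 and 2: address into FSR */
    return read_indf();                 /* steps 3 and 4: read INDF, return */
}
```

Call mask_init() once at startup and then mask(num) wherever the switch was used. On a mid-range PIC the body of mask() would correspond to roughly an addlw, a movwf FSR and a movf INDF,W, but check the instruction set of your particular device.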
This does exactly the same thing as your switch routine, but ALWAYS takes only 3 instructions (not counting the call and return instructions), regardless of whether 'num' is 0 or 9. This pretty simple change in code cuts your execution time by roughly a factor of five.
EDIT: If you're low on available general purpose registers, you could use EEPROM memory to permanently store the mask values and read them from there. Reading from EEPROM takes a few more instructions than reading a GPR (set the address, trigger the read, fetch the data), but it is still quick; only writing takes a long time.