That depends on the clock frequency your PIC is running at and on the number of instruction cycles the routine takes to execute. Take a look at the assembly code generated by the compiler.
You use a 20 MHz crystal with no PLL. One instruction cycle takes 4 clock cycles, so your PIC executes 5,000,000 instructions per second (except branches, which take two instruction cycles).
In the above example, every Tcy (instruction cycle) is 200 ns long, so for a 20 us delay you need to wait for 100 Tcy. So you create a loop which takes exactly 100 Tcy to execute.
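To make the arithmetic explicit, here is a small sketch in plain C that you can run on a PC to check the numbers. It assumes the 4 clock cycles per instruction cycle mentioned above; the function names are made up for illustration and are not part of any compiler library.

```c
#include <stdint.h>

/* Instruction cycles per second.
 * Assumption: 4 oscillator clocks per instruction cycle, no PLL. */
static uint32_t instr_per_second(uint32_t fosc_hz)
{
    return fosc_hz / 4u;
}

/* Length of one instruction cycle (Tcy) in nanoseconds. */
static uint32_t tcy_ns(uint32_t fosc_hz)
{
    return 1000000000u / instr_per_second(fosc_hz);
}

/* Number of instruction cycles needed for a delay given in microseconds,
 * rounded up so the delay is never shorter than requested. */
static uint32_t cycles_for_delay_us(uint32_t fosc_hz, uint32_t delay_us)
{
    uint64_t n = (uint64_t)instr_per_second(fosc_hz) * delay_us;
    return (uint32_t)((n + 999999u) / 1000000u);
}
```

With a 20 MHz crystal, `tcy_ns(20000000u)` gives 200 and `cycles_for_delay_us(20000000u, 20u)` gives 100, matching the figures above. Remember that the loop overhead itself (the decrement and the branch, which takes two cycles) counts toward those 100 Tcy, which is why checking the generated assembly is worth it.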