Use resistor degeneration in the mirror to improve matching.
With degeneration, you can build the ratio from series and parallel combinations of unit transistors and unit resistors and still get closely matched currents:
Example: Use a unit transistor size of W=500nm and L=5um, and a unit resistor size of 400 ohms.
For the diode-connected device, place 25 unit resistors in series (10 kohms) and 25 unit transistors in series (effective W/L of 0.5/125). For the output, place 32 unit resistors in parallel (12.5 ohms) and 32 unit transistors in parallel (effective W/L of 16/5).
At 20uA, the voltage drop across the series resistors will be 200mV. At 16mA, the voltage drop across the parallel resistors will also be 200mV.
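The arithmetic in this example can be checked with a short script. This is only a sketch: the function and variable names are made up for illustration, and it treats stacked unit transistors as one device with summed L (series) or summed W (parallel), which is a first-order approximation.

```python
# Unit sizes taken from the example above.
UNIT_W_UM = 0.5      # unit transistor width, um
UNIT_L_UM = 5.0      # unit transistor length, um
UNIT_R_OHM = 400.0   # unit resistor, ohms


def mirror_check(n_series=25, n_parallel=32, i_ref=20e-6):
    """Return resistances, W/L values, current ratio and IR drops."""
    r_ref = n_series * UNIT_R_OHM        # series resistors add
    r_out = UNIT_R_OHM / n_parallel      # equal parallel resistors divide
    # First-order view: series devices add length, parallel devices add width.
    wl_ref = UNIT_W_UM / (n_series * UNIT_L_UM)
    wl_out = (n_parallel * UNIT_W_UM) / UNIT_L_UM
    ratio = wl_out / wl_ref
    i_out = i_ref * ratio
    return {
        "r_ref": r_ref,
        "r_out": r_out,
        "ratio": ratio,
        "i_out": i_out,
        "v_ref": i_ref * r_ref,   # drop across the series resistors
        "v_out": i_out * r_out,   # drop across the parallel resistors
    }


if __name__ == "__main__":
    for name, value in mirror_check().items():
        print(f"{name}: {value:.6g}")
```

Both the transistor ratio and the resistor ratio come out at 25 x 32 = 800, so the two resistor drops are equal and both sources sit at the same voltage.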
The transistor ratio also matches the 20uA to 16mA ratio (800:1). Without the source degeneration resistors, however, small mismatches between the transistors would cause significant current error. The degeneration resistors act as negative feedback to suppress the variation caused by the transistors.
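For a rough feel of how much the degeneration helps, here is a first-order sketch. It assumes a square-law device (gm/I = 2/Vov) and uses the standard small-signal result that a threshold offset dVt gives a relative current error of (gm/I)·dVt / (1 + gm·R). The 200mV overdrive and 5mV offset below are illustrative assumptions, not values from the example, and resistor mismatch is ignored.

```python
def relative_current_error(delta_vt, v_ov, v_r=0.0):
    """First-order relative current error from a threshold offset.

    delta_vt: threshold mismatch in volts (assumed value)
    v_ov:     gate overdrive in volts (assumed value, square-law device)
    v_r:      DC drop across the degeneration resistor, I*R, in volts
    """
    gm_over_i = 2.0 / v_ov       # square-law approximation
    gm_r = gm_over_i * v_r       # gm*R = (gm/I) * (I*R)
    return gm_over_i * delta_vt / (1.0 + gm_r)


if __name__ == "__main__":
    plain = relative_current_error(5e-3, 0.2)             # no degeneration
    degenerated = relative_current_error(5e-3, 0.2, 0.2)  # 200mV across R
    print(f"without degeneration: {plain:.2%}")
    print(f"with 200mV of degeneration: {degenerated:.2%}")
```

Under these assumptions the error drops from about 5% to about 1.7%, a factor of 1 + gm·R = 3. A larger resistor drop relative to the overdrive gives more suppression, at the cost of headroom.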
You could, of course, use a direct 800:1 ratio, but the method I gave above will improve matching, since the critical (smallest-area) "device" will have more area. The resistors need to be well matched too, which is why I suggest building them from many unit resistors in parallel or series rather than from just one long and one short resistor.