Another thing that affects simulation speed is the choice between signals and variables.
Variables only hold a single value that updates immediately, so they use far less RAM and need much less processing, much like a variable in C.
Signals store not only their current value but also the values scheduled to be applied in the future and the times at which they will be applied, plus all of the signal attributes that change during runtime ('event, 'transaction, 'active, etc.), and so they need a huge amount of processing.
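To make the difference concrete, here is a minimal self-checking sketch. The entity name and the values are made up for illustration. It only shows the language semantics (immediate variable update versus scheduled signal update); it is not a benchmark.

```vhdl
-- Hypothetical demo entity; names and values are illustrative only.
entity sig_vs_var_demo is
end entity;

architecture sim of sig_vs_var_demo is
  signal s : integer := 0;
begin
  process
    variable v : integer := 0;
  begin
    v := v + 1;  -- variable: the new value is visible immediately
    s <= s + 1;  -- signal: only schedules a transaction on the driver
    assert v = 1 report "variable should update immediately" severity failure;
    assert s = 0 report "signal should still hold its old value" severity failure;

    wait for 0 ns;  -- one delta cycle later the scheduled value is applied
    assert s = 1 report "signal should update after a delta cycle" severity failure;

    -- A driver can queue several future values, each with its own time.
    -- This queue is part of the extra state the simulator has to manage.
    s <= 2 after 1 ns, 3 after 2 ns;
    wait for 2 ns;
    assert s = 3 report "queued value should have been applied" severity failure;

    report "sig_vs_var_demo finished";
    wait;
  end process;
end architecture;
```

Every signal assignment goes through this schedule-then-update mechanism, whereas the variable assignment is just a plain store.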
While the UUT should follow best practice and stick to signals, in the testbench it is often better to use variables and other software-style programming techniques where possible (sparse memory modelling, linked lists, pointers via access types, etc.). These take a huge processing burden off the CPU. For example, with no UUT in place, I can transfer 16MB of data into and out of a RAM model over AXI4 in seconds of wall-clock time (10ms of simulation time) in a testbench written purely in VHDL.
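As a rough idea of what sparse memory modelling with pointers can look like, here is a minimal sketch using access types and a linked list of pages. It is not the RAM model mentioned above. The names (page_t, mem_write, mem_read), the 1024-word page size, the integer data type and the choice to read unwritten locations as 0 are all assumptions made for the example.

```vhdl
-- Hypothetical sparse memory sketch: pages are allocated only when first written.
entity sparse_mem_demo is
end entity;

architecture sim of sparse_mem_demo is
begin
  process
    constant PAGE_WORDS : natural := 1024;  -- assumed page size
    type page_data_t is array (0 to PAGE_WORDS - 1) of integer;
    type page_t;                            -- incomplete type for the self-reference
    type page_ptr is access page_t;
    type page_t is record
      index : natural;      -- page number = address / PAGE_WORDS
      data  : page_data_t;
      nxt   : page_ptr;     -- next allocated page (singly linked list)
    end record;

    variable head : page_ptr := null;       -- start of the page list

    procedure mem_write(addr : in natural; value : in integer) is
      variable p   : page_ptr := head;
      constant idx : natural  := addr / PAGE_WORDS;
    begin
      -- Walk the list looking for the page that holds this address.
      while p /= null loop
        exit when p.index = idx;
        p := p.nxt;
      end loop;
      -- Allocate the page on first write and link it in at the front.
      if p = null then
        p    := new page_t'(index => idx, data => (others => 0), nxt => head);
        head := p;
      end if;
      p.data(addr mod PAGE_WORDS) := value;
    end procedure;

    procedure mem_read(addr : in natural; value : out integer) is
      variable p   : page_ptr := head;
      constant idx : natural  := addr / PAGE_WORDS;
    begin
      while p /= null loop
        exit when p.index = idx;
        p := p.nxt;
      end loop;
      if p = null then
        value := 0;  -- unwritten memory reads as 0 in this sketch
      else
        value := p.data(addr mod PAGE_WORDS);
      end if;
    end procedure;

    variable rd : integer;
  begin
    mem_write(5, 111);
    mem_write(4_000_000, 222);  -- far-away address, only one extra page allocated

    mem_read(5, rd);
    assert rd = 111 report "read-back mismatch at address 5" severity failure;
    mem_read(4_000_000, rd);
    assert rd = 222 report "read-back mismatch at address 4000000" severity failure;
    mem_read(123_456, rd);
    assert rd = 0 report "unwritten location should read 0" severity failure;

    report "sparse_mem_demo finished";
    wait;
  end process;
end architecture;
```

Only the pages that are actually touched consume host memory, and all of the storage lives in variables rather than signals. A real model would probably use a faster lookup than a linear list walk once many pages are allocated.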