Your UUT code isn't synthesizable: a for loop can't be used to do division, and it doesn't matter whether it's VHDL or Verilog. To generate hardware, the loop is unrolled, which means only the last assignment (when i and j are both 63) takes effect, so the output probably never changes. In the previous for loop in your testbench, I failed to account for the timing control (the wait statement) inside the loop, which is what gives it software-like behavior.
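To make the unrolling point concrete, here is a rough Python model of what synthesis does with a nested loop that writes the same signal on every iteration. It's only a sketch: `q` and `f(i,j)` are hypothetical placeholders for your output signal and loop body, and the 64 iterations per index are assumed from i and j running up to 63. Each iteration becomes its own copy of the body, and since later assignments in a process override earlier ones, only the final iteration ends up driving the signal.

```python
def unroll(n):
    # Synthesis turns every iteration of the nested loop into its own copy
    # of the body; "q" and "f(i,j)" are hypothetical placeholders.
    return [("q", f"f({i},{j})") for i in range(n) for j in range(n)]

def resolve(statements):
    # Within one process, a later assignment to a signal overrides earlier
    # ones, so only the last statement for each target survives.
    drivers = {}
    for target, expr in statements:
        drivers[target] = expr
    return drivers

stmts = unroll(64)
print(len(stmts))       # 4096 copies of the loop body
print(resolve(stmts))   # {'q': 'f(63,63)'}
```

All 4096 copies get built, but `q` only ever sees the i = j = 63 term.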
Now that I've looked at your UUT code, I think you need to rewrite a lot of it to make it synthesizable. For loops are used for replicating logic, not for sequencing through an array. Since you aren't treating the array as a memory, the tools will try to implement it as flip-flops (FFs), and as each array requires 32K+ FFs, the design is not likely to fit.
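As a rough illustration of the alternative, here is a cycle-level Python model of sequencing through an array with an address counter instead of a for loop: the array sits in a memory, and each clock edge reads one word and updates a register. The accumulate operation, the class name `ClockedSum`, and the 4096-word size are all hypothetical stand-ins, since your actual operation isn't shown here. The point is that this needs one read port and one adder reused over many cycles, rather than a copy of the logic (and a FF) per element; for scale, a hypothetical 64 x 64 array of 8-bit words held in FFs would already be 4096 x 8 = 32,768 FFs.

```python
class ClockedSum:
    """Cycle-level sketch: an address counter walks a memory one word per
    clock. The summing operation is a hypothetical placeholder."""

    def __init__(self, mem):
        self.mem = mem      # stands in for a RAM with one read per cycle
        self.addr = 0       # address counter register
        self.acc = 0        # accumulator register
        self.done = False

    def clock(self):
        # One rising clock edge: read the current word, update the registers.
        if self.done:
            return
        self.acc += self.mem[self.addr]
        if self.addr == len(self.mem) - 1:
            self.done = True
        else:
            self.addr += 1

mem = list(range(4096))   # hypothetical 64 x 64 array flattened to 4096 words
uut = ClockedSum(mem)
cycles = 0
while not uut.done:
    uut.clock()
    cycles += 1
print(cycles, uut.acc)    # 4096 8386560
```

The trade-off is latency: the work takes one clock per element, which is exactly the sequencing a for loop can't give you in hardware.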