It's because you implemented your 4-bit adder structurally. Why did you do that instead of using a behavioral RTL description, i.e. sum <= a + b; ?
This is a good example of why you never ever ever use structural design implementations, unless you like doing a lot more work for a minor change like + -> * and sum[3:0] -> prod[7:0]. But hey, there is one guy on this forum who thinks structural gate-level code is more "efficient", so you're not alone.
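To show how small that change is in behavioral code, here is a minimal sketch in Verilog. The module names (add4, mul4), the port names, and the clocked register style are my assumptions for illustration, not your actual design.

```verilog
// Behavioral 4-bit adder: the synthesis tool infers the adder logic.
// Module and port names are hypothetical.
module add4 (
    input  wire       clk,
    input  wire [3:0] a,
    input  wire [3:0] b,
    output reg  [3:0] sum    // carry-out is dropped, matching sum[3:0]
);
    always @(posedge clk)
        sum <= a + b;
endmodule

// Behavioral 4-bit multiplier: only the operator and the result width change.
module mul4 (
    input  wire       clk,
    input  wire [3:0] a,
    input  wire [3:0] b,
    output reg  [7:0] prod   // 4 bits x 4 bits needs 8 result bits
);
    always @(posedge clk)
        prod <= a * b;
endmodule
```

Going from the adder to the multiplier is a two-token edit here, and the tool picks the gate-level structure for you.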
I'm not even sure how you would create a 1-bit multiplier that you can instantiate multiple times to build up a 4-bit multiplier. I would have to look at all the equations for each individual bit and derive a generic 1-bit implementation.
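For what it's worth, the textbook unsigned array multiplier gets around this by making the repeated cell an AND gate (one partial-product bit) feeding a full adder. Below is a rough sketch of such a cell only. The name mult_cell and its ports are hypothetical, and I have not shown the 4x4 grid wiring, which is where most of the extra work would be.

```verilog
// Hypothetical cell for an unsigned array multiplier (names are illustrative).
// Each cell forms one partial-product bit and adds it to the incoming
// sum and carry from neighbouring cells.
module mult_cell (
    input  wire a_bit,      // one bit of operand a
    input  wire b_bit,      // one bit of operand b
    input  wire sum_in,     // sum bit from the previous row
    input  wire carry_in,   // carry bit from the neighbouring cell
    output wire sum_out,
    output wire carry_out
);
    wire pp = a_bit & b_bit;   // 1-bit partial product

    // Full adder: the three 1-bit inputs sum to at most 3, which fits in 2 bits.
    assign {carry_out, sum_out} = pp + sum_in + carry_in;
endmodule
```

A 4-bit multiplier built this way would need a grid of these cells plus all the sum and carry wiring between them, versus the single prod <= a * b; line in the behavioral version.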