I haven't fully absorbed the nuances of Verilog simulator timing. It follows precise rules (IEEE Standard 1364-2001). Some of those rules seem a bit strange to me, but probably exist for a good reason.
A blocking assignment with a delay (#1.0 out = in;)
freezes the "always" block execution until the delay finishes, and then copies "in" to "out". During that delay, no other statements in the block can execute, even if other signals in your sensitivity list are wiggling. I don't know why many folks encourage using blocking assignments. Maybe because it makes Verilog look more like a software programming language. Seems unwise to me.
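To make that concrete, here is a minimal sketch (module and signal names are mine, purely illustrative) of a delayed blocking assignment. The #1.0 stalls the whole block for one time unit before "in" is sampled:

```verilog
// Hypothetical example: blocking assignment with a delay.
// The always block freezes at #1.0; if "in" wiggles during that
// window, those events are simply missed, because the block cannot
// re-trigger while it is stalled.
module blocking_demo (input wire in, output reg out);
  always @(in) begin
    #1.0 out = in;  // wait 1.0 time units, THEN sample "in" and assign
  end
endmodule
```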
A non-blocking assignment with a future event (out <= #1.0 in;)
immediately reads "in" and then schedules "out" to change at a future time. Simulation continues without hesitation, and without disturbing the other "always" block statements.
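The non-blocking counterpart, again as an illustrative sketch with made-up names, shows the difference: the intra-assignment delay schedules the update but never stalls the block, so every edge of "in" gets its own scheduled event:

```verilog
// Hypothetical example: non-blocking assignment with an
// intra-assignment delay. "in" is sampled IMMEDIATELY when the
// block triggers; the update to "out" is scheduled 1.0 time units
// later, and the block is free to trigger again in the meantime.
module nonblocking_demo (input wire in, output reg out);
  always @(in) begin
    out <= #1.0 in;  // sample now, update later; no stall
  end
endmodule
```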
When you compiled your original post-route simulation netlist, the compiler generated code that approximates the actual behavior of the hardware gates and routes. That code contains delays and probably non-blocking assignments.
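For flavor, a post-route netlist fragment might resemble the following sketch (names and delay values are invented; real netlists are typically back-annotated from an SDF file, but the net effect is gate and route delays like these):

```verilog
// Hypothetical gate-level fragment with annotated delays,
// approximating what a post-route netlist models.
module routed_and (input wire a, b, output wire y);
  wire n1;
  buf #(0.3) u_route (n1, a);    // 0.3 units of routing delay on "a"
  and #(0.7) u_and   (y, n1, b); // 0.7 units of gate delay
endmodule
```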
I understand your confusion. When you wrote your original code, you were predicting how the hardware would behave (propagation delays). In fact, you were depending on it. However, your pre-route simulation code didn't describe those hardware characteristics, so the simulator simply executed the code in strict Verilog fashion, and gave you unexpected results. Perhaps you could grab a copy of the IEEE standard and read section 5 "Scheduling Semantics".
You may want to try writing a tiny test module that feeds two or three very short overlapping pulses into a combinational "always" block, and then watch what happens when you insert delays and blocking or non-blocking statements. Beware that your simulator may or may not be 100% compliant with the Verilog standard.
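Such an experiment could look something like this sketch (names and timing values are mine, chosen so the pulses are narrower than the #1.0 delay). Run it and compare the two outputs:

```verilog
`timescale 1ns/100ps
// Hypothetical testbench: drive narrow pulses into two always
// blocks, one using a delayed blocking assignment and one using a
// delayed non-blocking assignment, and watch how they differ.
module pulse_tb;
  reg in = 0;
  reg out_blk = 0, out_nbk = 0;

  // Blocking with delay: stalls the block, so narrow pulses that
  // arrive during the stall are lost.
  always @(in) #1.0 out_blk = in;

  // Non-blocking with delay: never stalls; every edge of "in"
  // schedules its own future update of out_nbk.
  always @(in) out_nbk <= #1.0 in;

  initial begin
    $monitor("%t in=%b out_blk=%b out_nbk=%b",
             $time, in, out_blk, out_nbk);
    // two short pulses, narrower than the 1.0 ns delay
    #5   in = 1;
    #0.4 in = 0;
    #0.4 in = 1;
    #0.4 in = 0;
    #5   $finish;
  end
endmodule
```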