This is a tricky one! Verilog scheduling semantics basically imply a four-level deep queue for the current simulation time:
1: Active Events (blocking statements)
2: Inactive Events (#0 delays, etc)
3: Non-Blocking Assign Updates (non-blocking statements)
4: Monitor Events ($display, $monitor, etc).
Since the "a = 0" is an active event, it is scheduled into the 1st "queue". The "a <= 1" is a non-blocking event, so it's placed into the 3rd queue. Finally, the display statement is placed into the 4th queue.
Only events in the active queue are completed this simulation cycle, so the "a = 0" happens, and then the display shows a = 0. If we were to look at the value of a in the next simulation cycle, it would show 1.