I think there is a more important reason than the internal load on the clock simply being greater than the load on the data signal, as stated in the previous replies.
The hold time depends on how it is measured. Usually, the hold time is specified from the 90% point of the CLK rising edge to the 90% point of an input falling edge (or the 10% point of an input rising edge), that is, to the point where the input starts to change. We assume that each signal effectively switches at the 50% point of its edge.
So, draw a little picture of the signals and you will see that the specified hold time can be negative while the effective transition (at 50%) of the clock still comes before the effective transition of the data, depending on the transition times (10%-90%) of the signals.
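If you prefer numbers to a picture, here is a small sketch of the same argument. It assumes ideal linear ramps, so going from 50% to 90% (or from 90% down to 50%) takes half of the 10%-90% transition time. The function name and the example values are mine, chosen only for illustration, and do not come from any datasheet.

```python
def effective_hold_margin(t_hold_spec, tr_clk, tr_data):
    """Time from the clock's 50% crossing to the data's 50% crossing.

    t_hold_spec: hold time measured from the clock's 90% point to the
                 point where the data starts to change (90% of a falling
                 edge or 10% of a rising edge); may be negative.
    tr_clk, tr_data: 10%-90% transition times of clock and data.
    Assumption: both edges are ideal linear ramps.
    """
    clk_50_to_90 = 0.5 * tr_clk      # clock travels 40% of the swing
    data_start_to_50 = 0.5 * tr_data  # data travels 40% of the swing
    return t_hold_spec + clk_50_to_90 + data_start_to_50


if __name__ == "__main__":
    # Hypothetical values in ns: spec hold of -0.2, both edges 0.5 (10%-90%).
    margin = effective_hold_margin(-0.2, 0.5, 0.5)
    print(f"effective margin at 50%: {margin:.2f} ns")
```

With these made-up values the specified hold time is negative, but the result is positive, so at the 50% points the clock still switches before the data. The margin only goes negative when the specified hold time is more negative than half the sum of the two transition times.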
Best regards