My suggestion is to look at the transistor-level circuit of a flop or latch.
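The input data is captured in a keeper inside the flop/latch. The keeper is a loop of two inverters: a forward inverter and a feedback inverter. When the input data flips, the cell driving the flop input fights the feedback inverter, which is still holding the old value. While they contend there is a DC path from supply to ground through the two devices. Because the feedback inverter is deliberately made weak (higher resistance than the driver), the driver wins and the keeper settles to the new value quickly, but this still takes a finite amount of time. Setup and hold time are determined by how quickly the data in the keeper can settle: the new data must arrive early enough before the clock edge for the keeper to resolve to it (setup), and must stay stable long enough after the edge that a late change cannot disturb the value being latched (hold).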
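A rough way to see why the feedback inverter has to be weak, and why settling still takes time, is to treat each transistor as a fixed resistor. The Python sketch below does that for the moment the driver pulls the storage node from VDD toward ground while the feedback inverter is still holding it high. The function name keeper_flip_time, the linear-resistor model, the VDD/2 threshold, and all component values are illustrative assumptions, not taken from any real cell library.

```python
import math


def keeper_flip_time(vdd, r_drv, r_fb, c_node, v_switch=None):
    """Estimate how long the driver needs to pull the keeper's storage node
    from VDD down past the forward inverter's switching threshold.

    Assumed linear model: the driver pulls the node to ground through r_drv
    while the (not yet flipped) feedback inverter holds it at VDD through r_fb.
    """
    if v_switch is None:
        v_switch = vdd / 2  # assume a symmetric forward inverter
    # Steady state of the fight is a resistive divider between VDD and ground.
    v_final = vdd * r_drv / (r_drv + r_fb)
    if v_final >= v_switch:
        return math.inf  # driver too weak: the node never crosses the threshold
    # Thevenin resistance seen by the node: r_drv in parallel with r_fb.
    tau = (r_drv * r_fb / (r_drv + r_fb)) * c_node
    # V(t) = v_final + (vdd - v_final) * exp(-t / tau); solve V(t) = v_switch.
    return tau * math.log((vdd - v_final) / (v_switch - v_final))


if __name__ == "__main__":
    # Hypothetical values: 1 V supply, 1 kOhm driver, 2 fF storage node.
    for r_fb in (1e3, 2e3, 4e3, 16e3):
        t = keeper_flip_time(1.0, 1e3, r_fb, 2e-15)
        print(f"r_fb = {r_fb:>7.0f} ohm -> flip time = {t * 1e12:.2f} ps")
```

The model only covers the fight phase; once the node crosses the threshold, the feedback inverter flips and starts helping. The telling case is r_fb equal to r_drv: the divider parks the node exactly at the threshold and the keeper never resolves, which is why the feedback device is sized weak.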