I can give a quick qualitative explanation; I can probably add a quantitative one later if there's a request and I have time. Qualitatively, you can think of an energy barrier between the source and drain regions when the gate is off, which prevents carriers from flowing across the channel. Applying a drain potential lowers the energy level at the drain, but the barrier still exists because the gate is off. When you turn on VG, the channel barrier drops and carriers flow, giving you current.
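To put a rough number on that picture: injection over the barrier goes exponentially with barrier height (a Boltzmann factor), so even a modest barrier reduction changes the current by orders of magnitude. A minimal sketch, where the barrier heights `phi_off` and `phi_on` are made-up illustrative values, not from any real process:

```python
import math

kT = 0.0259  # thermal energy at 300 K, in eV

def relative_injection(barrier_eV):
    """Relative carrier flux over the source-channel barrier,
    using a simple Boltzmann factor exp(-E_b / kT)."""
    return math.exp(-barrier_eV / kT)

phi_off = 0.60  # illustrative barrier height with gate off, eV
phi_on  = 0.20  # illustrative barrier height with gate on, eV

ratio = relative_injection(phi_on) / relative_injection(phi_off)
print(f"on/off injection ratio ~ {ratio:.1e}")  # ~5e6 for a 0.4 eV barrier drop
```

The point is just the exponential sensitivity: anything that nudges the barrier down (the gate, or, as below, the drain) moves the current a lot.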
As the technology node is scaled, the distance between source and drain shrinks, and the drain starts to influence the channel through the drain/channel depletion region when VDS > 0 is applied, even with the gate off. This lowers the barrier even though VG = 0, allowing some carriers from the source to make it to the drain. That is your leakage current. It gets worse with node scaling, because Lg scaling brings S and D closer together and the gate no longer has sufficient control over the channel. This is one of the reasons you see worse subthreshold slope and DIBL (drain-induced barrier lowering) at scaled nodes. And it is the primary reason the industry moved to FinFETs, where the gate wraps the channel on three sides (two sides + top; Intel calls this Tri-Gate, but it's essentially the FinFET first invented in the 90s at UC Berkeley). With the gate having more control over the channel, you can keep scaling gate length without making leakage worse.
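Here's a toy version of the textbook subthreshold model with a DIBL term, just to show how the drain eating into the barrier shows up as off-state leakage at VG = 0. All the coefficients (I0, n, Vth, eta) are illustrative assumptions, not data for any real node:

```python
import math

kT_q = 0.0259  # thermal voltage at 300 K, in V
I0   = 1e-6    # illustrative subthreshold prefactor, A
n    = 1.5     # subthreshold ideality factor (assumed)
Vth  = 0.40    # nominal threshold voltage, V (assumed)

def subthreshold_current(VGS, VDS, eta):
    """Standard subthreshold model: DIBL enters as an effective
    threshold reduction eta * VDS (the drain lowering the barrier)."""
    Vth_eff = Vth - eta * VDS
    return I0 * math.exp((VGS - Vth_eff) / (n * kT_q)) * (1 - math.exp(-VDS / kT_q))

# Off-state leakage (VGS = 0, VDS = 1 V): long-channel vs. short-channel device
for eta in (0.02, 0.15):  # illustrative DIBL coefficients, V/V
    print(f"eta = {eta}: Ioff = {subthreshold_current(0.0, 1.0, eta):.2e} A")
```

With these numbers the short-channel device (larger eta) leaks roughly 30x more at the same VG = 0, which is the DIBL effect described above. The same model also gives the subthreshold swing S = n * (kT/q) * ln(10), i.e. about 60n mV/decade, so a degraded n shows up directly as a worse subthreshold slope.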
Hope this helps; it's better illustrated with band diagrams or energy level charts, and any device physics textbook will give you the equations to back this up. I explained source-to-drain leakage, but there are other leakage components too, such as leakage through the body, gate leakage (mitigated by high-K, as explained in one of the earlier posts), GIDL/band-to-band tunneling, etc.