The main reason for using the DCT in compression is that it is asymptotically equivalent to the KLT (Karhunen-Loève transform) for a large class of signals, such as first-order Gauss-Markov processes with a correlation coefficient close to 1.
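A small numerical check can make this concrete. The sketch below is only an illustration, not taken from the reference: it assumes a unit-variance first-order Gauss-Markov (AR(1)) model with covariance rho^|i-j|, and the block size 8, the value rho = 0.95, and all function names are my own choices. It compares the transform coding gain (arithmetic mean over geometric mean of the coefficient variances) of the KLT with that of the orthonormal DCT-II.

```python
import numpy as np

def ar1_covariance(n, rho):
    """Covariance matrix of a unit-variance AR(1) process: R[i, j] = rho^|i-j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def dct2_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)  # rescale the DC row so all rows have unit norm
    return c

def coding_gain(variances):
    """Arithmetic mean divided by geometric mean of the coefficient variances."""
    return np.mean(variances) / np.exp(np.mean(np.log(variances)))

def compare(n=8, rho=0.95):
    r = ar1_covariance(n, rho)
    # KLT: coefficient variances are the eigenvalues of the covariance matrix.
    klt_var = np.linalg.eigvalsh(r)
    # DCT: coefficient variances are the diagonal of C R C^T.
    c = dct2_matrix(n)
    dct_var = np.diag(c @ r @ c.T)
    return coding_gain(klt_var), coding_gain(dct_var)

if __name__ == "__main__":
    g_klt, g_dct = compare()
    print("KLT coding gain:", g_klt)
    print("DCT coding gain:", g_dct)
```

Under these assumptions the two gains should come out very close, with the KLT slightly ahead, and the gap should shrink as rho approaches 1. The practical advantage of the DCT is that it is a fixed transform with fast algorithms, whereas the KLT depends on the signal statistics.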
A second application of the DCT is in filter bank implementations (for TDAC, time-domain aliasing cancellation).
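To illustrate the TDAC idea: such filter banks are commonly realized with the MDCT, a lapped transform closely related to the DCT-IV. The sketch below is a direct (slow, matrix-based) version under my own assumptions: a sine window, 50% overlap, a 2/N scaling on the inverse, and hypothetical function names. Each block of 2N windowed samples produces only N coefficients, so a single block is aliased in time, but the aliasing cancels when overlapping blocks are added.

```python
import numpy as np

def mdct_basis(n):
    """N x 2N cosine basis of the MDCT for hop size N."""
    k = np.arange(n)[:, None]
    t = np.arange(2 * n)[None, :]
    return np.cos(np.pi / n * (t + 0.5 + n / 2) * (k + 0.5))

def sine_window(n):
    """Sine window of length 2N; satisfies w[t]^2 + w[t+N]^2 = 1."""
    return np.sin(np.pi * (np.arange(2 * n) + 0.5) / (2 * n))

def mdct_analysis(x, n):
    """Window each 2N-sample block (hop N) and project onto the N basis rows."""
    basis, w = mdct_basis(n), sine_window(n)
    starts = range(0, len(x) - 2 * n + 1, n)
    return np.array([basis @ (w * x[s:s + 2 * n]) for s in starts])

def mdct_synthesis(coeffs, n):
    """Inverse transform, window again, and overlap-add with hop N."""
    basis, w = mdct_basis(n), sine_window(n)
    out = np.zeros(n * (len(coeffs) + 1))
    for i, block in enumerate(coeffs):
        out[i * n:i * n + 2 * n] += w * (2.0 / n) * (basis.T @ block)
    return out

if __name__ == "__main__":
    n = 4
    x = np.random.default_rng(0).standard_normal(8 * n)
    y = mdct_synthesis(mdct_analysis(x, n), n)
    # The first and last N samples are covered by only one block, so they
    # stay aliased; every sample in between should be reconstructed.
    print(np.allclose(y[n:-n], x[n:-n]))
```

Practical codecs compute the same transform with a fast DCT-IV or FFT-based algorithm rather than an explicit matrix product.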
Searching for the phrase "asymptotically equivalent to the KLT" will turn up more on this.
See also:
M. Vetterli and J. Kovačević, Wavelets and Subband Coding, Prentice Hall,
Englewood Cliffs, NJ, 1995.
Pages 374-375 give a short and clear explanation of this topic.