A practical example of these components comes up when trying to get Gigabit Ethernet working.
Such a system has the following requirements:
1.) each transmitter has a locally generated clock used for transmission.
2.) data is transmitted as a full packet -- no pausing in between valid data.
3.) actual transmission is "media dependent" (e.g. fiber vs copper), so you use a standard "media independent interface" to a PHY that connects to the actual port.
A typical system might have a 25MHz clock input (dev boards might have a 50MHz or 200MHz clock input). GMII -- the standard "gigabit media independent interface" -- has 8b of data @ 125MHz. 125MHz isn't 25MHz, 50MHz, or 200MHz -- the first problem. The DCM has some limited ability to generate clocks from other clocks though (multiplying and dividing the input frequency, for example 25MHz x 5), so it makes sense to generate a 125MHz clock from the system clock.
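To make the frequency synthesis step concrete, here is a minimal sketch that searches for an integer multiply/divide pair giving 125MHz from each of the input clocks mentioned above. The function name `find_mult_div` is made up for illustration, and the ranges (multiply 2..32, divide 1..32) are an assumption about a DCM-style synthesizer; check the datasheet for your part, which also limits the allowed input and output frequencies.

```python
from fractions import Fraction

def find_mult_div(f_in_hz, f_out_hz, m_range=range(2, 33), d_range=range(1, 33)):
    """Return the first (M, D) with f_in * M / D == f_out, or None.

    m_range and d_range are assumed limits, not taken from a datasheet.
    """
    target = Fraction(f_out_hz, f_in_hz)
    for m in m_range:          # smallest multiplier first
        for d in d_range:
            if Fraction(m, d) == target:
                return m, d
    return None

for f_in in (25_000_000, 50_000_000, 200_000_000):
    print(f_in, find_mult_div(f_in, 125_000_000))
# 25000000 (5, 1)
# 50000000 (5, 2)
# 200000000 (5, 8)
```

All three inputs work out with a multiplier of 5, which is why 125MHz is easy to reach from the common board oscillators.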
But 125MHz might not really be the clock rate you want to perform calculations at. You might generate multiple clock frequencies. If you choose 250MHz, it becomes easier to transfer data between the two clock domains -- if both clocks are generated from the same source, every other rising edge of the 250MHz clock will align closely with a rising edge of the 125MHz clock.
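The edge relationship between the two related clocks can be checked with a few lines of arithmetic. This is only an idealized sketch: it assumes both clocks start in phase with zero skew and jitter, which real clock networks only approximate. The helper name `rising_edges_ps` is made up for illustration.

```python
def rising_edges_ps(freq_mhz, n_edges):
    """Rising-edge times in picoseconds for an ideal clock starting at t=0."""
    period_ps = 1_000_000 // freq_mhz   # exact for 125MHz and 250MHz
    return [i * period_ps for i in range(n_edges)]

slow = rising_edges_ps(125, 4)   # 8000 ps period
fast = rising_edges_ps(250, 8)   # 4000 ps period

# Every rising edge of the 125MHz clock coincides with a 250MHz rising edge.
aligned = all(t in fast for t in slow)
print(slow)
print(fast)
print(aligned)
```

Because the slow-clock edges are a subset of the fast-clock edges, data can move between the two domains with ordinary synchronous timing constraints rather than a full asynchronous crossing.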
This lets you process and send data, but what of receiving data? The locally generated clocks for Ethernet come from imperfect sources. They might be 125,003,125 Hz on your device and 124,998,432 Hz on the device you connect to! So now you have a third clock in your design. Not just that, but you need to use this new external clock to get data into your system -- you actually care about things like the difference in delay/phase between the arriving clock and the arriving data. But you also only use the clock for getting data into the system. Now you have choices -- you can use the BUFIO/BUFR to clock the IO and a limited amount of extra logic. You could also use a BUFG. The BUFG has a lot of delay, so you might use a DCM to generate a phase-shifted clock (using feedback) to remove the BUFG's delay.
Next you need data to cross between these "plesiochronous" clock domains -- the frequencies being almost equal but not quite. The built-in FIFOs are great for this, as they support independent read and write clocks and include the flag logic you need.
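To get a feel for how far two almost-equal clocks drift apart, the example frequencies above can be turned into ppm offsets and a per-frame slip. This is a rough sketch: the function names are made up, one byte per clock is assumed (as on GMII), and the 1526-byte frame length (1518-byte maximum frame plus 8 bytes of preamble) is an assumption for illustration.

```python
NOMINAL_HZ = 125_000_000
local_hz = 125_003_125    # example local clock from the text
remote_hz = 124_998_432   # example link-partner clock from the text

def ppm(f_hz, nominal_hz=NOMINAL_HZ):
    """Offset of f_hz from nominal in parts per million."""
    return (f_hz - nominal_hz) / nominal_hz * 1e6

def slip_bytes(n_bytes, write_hz, read_hz):
    """Change in FIFO fill while n_bytes are written at write_hz
    and read continuously at read_hz (positive means the FIFO fills)."""
    duration_s = n_bytes / write_hz
    return n_bytes - duration_s * read_hz

print(f"{ppm(local_hz):+.3f} ppm, {ppm(remote_hz):+.3f} ppm")
# Receive side: written with the remote clock, read with the local clock.
print(slip_bytes(1526, remote_hz, local_hz))
```

With these numbers the slip over one frame is only about 0.06 of a byte, so the FIFO does not need to be deep to absorb the frequency difference, as long as it is allowed to settle between packets.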
Lastly, when you do send data, you need to do so as a burst. It is easy to use a BRAM to buffer up a full packet to ensure you can send it without interruptions.
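The buffering idea can be modeled in software as a store-and-forward buffer: nothing is handed to the transmitter until a whole packet has been written. This is only a behavioral sketch, not HDL. The class `PacketBuffer` and its methods are hypothetical, and the 2048-byte capacity is an assumed size, roughly one 18Kb block RAM in a byte-wide configuration.

```python
from collections import deque

class PacketBuffer:
    """Store-and-forward model: a packet is readable only after commit()."""

    def __init__(self, capacity_bytes=2048):   # assumed capacity
        self.capacity = capacity_bytes
        self.used = 0
        self.pending = bytearray()   # packet currently being written
        self.ready = deque()         # complete packets awaiting transmit

    def write(self, chunk):
        """Append data to the packet in progress; data may arrive in pieces."""
        if self.used + len(chunk) > self.capacity:
            raise OverflowError("buffer full")
        self.pending += chunk
        self.used += len(chunk)

    def commit(self):
        """Mark the packet in progress as complete and ready to send."""
        self.ready.append(bytes(self.pending))
        self.pending = bytearray()

    def read_packet(self):
        """Return the oldest complete packet, or None if none is ready."""
        if not self.ready:
            return None
        pkt = self.ready.popleft()
        self.used -= len(pkt)
        return pkt

buf = PacketBuffer()
buf.write(b"header")
print(buf.read_packet())   # None: packet not complete yet
buf.write(b"payload")
buf.commit()
print(buf.read_packet())   # b'headerpayload'
```

The producer can stall between writes without affecting the transmitter, because the transmit side only ever sees complete packets.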
This shows a practical design -- multiple clocks for different reasons, buffering of data, and getting data between clock domains.