IMO, the best example is a FIR filter.
This is a basic operation that is easy to understand. FPGA designs exist that accept one sample per cycle, multiple samples per cycle, or one sample every several cycles. The difference is the amount of resources used to build a circuit that performs the task.
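To make the trade-off concrete, here is a rough behavioral model in Python (not HDL). It compares a fully parallel FIR, with one multiplier per tap and one sample per cycle, against a serial one that reuses a single multiplier and needs one cycle per tap for each sample. The function names and the cycle/multiplier accounting are my own simplifications for illustration; pipeline latency and adder cost are ignored.

```python
from typing import List, Tuple


def fir_parallel(samples: List[int], taps: List[int]) -> Tuple[List[int], int, int]:
    """Model of a fully parallel FIR: every tap has its own multiplier,
    so one input sample is consumed per cycle (latency ignored)."""
    delay = [0] * len(taps)          # delay line, newest sample first
    out = []
    for x in samples:
        delay = [x] + delay[:-1]     # shift the new sample in
        out.append(sum(c * d for c, d in zip(taps, delay)))
    cycles = len(samples)            # assumption: 1 cycle per sample
    multipliers = len(taps)          # assumption: 1 multiplier per tap
    return out, cycles, multipliers


def fir_serial(samples: List[int], taps: List[int]) -> Tuple[List[int], int, int]:
    """Model of a serial FIR: one multiply-accumulate unit is reused for
    every tap, so each input sample takes len(taps) cycles."""
    delay = [0] * len(taps)
    out = []
    cycles = 0
    for x in samples:
        delay = [x] + delay[:-1]
        acc = 0
        for c, d in zip(taps, delay):
            acc += c * d             # one multiply-accumulate per cycle
            cycles += 1
        out.append(acc)
    return out, cycles, 1            # a single shared multiplier


if __name__ == "__main__":
    samples, taps = [1, 2, 3, 4], [1, 2, 1]
    print(fir_parallel(samples, taps))
    print(fir_serial(samples, taps))
```

Both versions produce the same output sequence. Only the cycle count and the number of multipliers differ, which is the resource-versus-throughput choice described above.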
For design planning, pipelining, b-processing, and channelizing are all useful concepts. For the overall design, keep in mind that a module can be evaluated on its own only if its interface, and the properties of that interface, are well defined. For this reason, unique interfaces between modules should be limited to cases where they are absolutely needed for performance.
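As a loose illustration of the interface point, here is a small Python sketch (again a behavioral model, not HDL). Every stage exposes the same hypothetical `step` interface, where `None` stands for "no valid sample this cycle". The `Stage`, `Gain`, `Delay`, and `run` names are made up for this example. Because the interface is shared, each stage can be checked alone or chained with others without special glue.

```python
from typing import List, Optional, Protocol


class Stage(Protocol):
    """Hypothetical common interface: one call per clock cycle.
    A value of None means no valid sample on that cycle."""

    def step(self, sample: Optional[int]) -> Optional[int]: ...


class Gain:
    """Combinational stage: multiplies valid samples by a constant."""

    def __init__(self, k: int) -> None:
        self.k = k

    def step(self, sample: Optional[int]) -> Optional[int]:
        return None if sample is None else sample * self.k


class Delay:
    """One-cycle pipeline register: outputs what it saw last cycle."""

    def __init__(self) -> None:
        self.reg: Optional[int] = None

    def step(self, sample: Optional[int]) -> Optional[int]:
        out, self.reg = self.reg, sample
        return out


def run(stages: List[Stage], samples: List[Optional[int]]) -> List[int]:
    """Push one input per cycle through the chain and collect valid outputs."""
    out = []
    for s in samples:
        for stage in stages:
            s = stage.step(s)
        if s is not None:
            out.append(s)
    return out


if __name__ == "__main__":
    # A stage evaluated by itself, then the same stage inside a chain.
    print(run([Gain(2)], [1, None, 3]))
    print(run([Gain(2), Delay(), Gain(3)], [1, 2, 3, None]))
```

The trailing `None` in the second call is an idle cycle that flushes the last sample out of the register.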