I think that in the pipelined form, once all the registers are full (which may take some clock cycles), there is one output on every clock cycle. Am I right?
My question is: are the inputs fed into the multipliers simultaneously, or with a delay, one by one as a stream, using the BCIN-BCOUT cascade in the DSP48 block (in Xilinx Virtex-4)?
And if so (one by one), how does using, for example, 64 multipliers in a 64-tap FIR give better performance? Through pipelining, or something else?
I want to know how using a large number of multiplier blocks can increase performance. If the inputs are fed one by one with a delay, I see no difference from a single MACC-based form.
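To make the comparison I have in mind concrete, here is a rough Python sketch. It is my own simplified model, not taken from the Xilinx documentation: it assumes the sample is shifted through a register chain (my guess at what BCIN-BCOUT does) while every tap multiplies in the same clock, and it ignores pipeline latency. The function names, coefficients and samples are made up for illustration.

```python
def single_macc(samples, coeffs):
    """One multiplier shared by all taps: one multiply-accumulate per clock."""
    outputs, clocks = [], 0
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            x = samples[n - k] if n - k >= 0 else 0
            acc += c * x
            clocks += 1          # each tap costs one clock cycle
        outputs.append(acc)
    return outputs, clocks


def one_multiplier_per_tap(samples, coeffs):
    """One multiplier per tap: the sample moves down a register chain
    (assumed to behave like BCIN -> BCOUT) and all taps multiply at once."""
    delay_line = [0] * len(coeffs)
    outputs, clocks = [], 0
    for x in samples:
        delay_line = [x] + delay_line[:-1]   # shift the new sample in
        outputs.append(sum(c * d for c, d in zip(coeffs, delay_line)))
        clocks += 1              # one new output every clock cycle
    return outputs, clocks


coeffs = [1, 2, 3, 4]            # hypothetical 4-tap filter
samples = list(range(1, 9))      # hypothetical 8 input samples

serial_out, serial_clocks = single_macc(samples, coeffs)
parallel_out, parallel_clocks = one_multiplier_per_tap(samples, coeffs)

print(serial_out == parallel_out)       # same filter result
print(serial_clocks, parallel_clocks)   # 32 clocks vs 8 clocks
```

In this model the single-MACC form needs one clock per tap for every output, while the per-tap form produces an output every clock. What I am unsure about is whether the real DSP48 cascade behaves like the second function.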
Thanks a million.