
# Pipeline - Multiplication problem

Status
Not open for further replies.

#### HyperText

##### Junior Member level 2
Hi

I have to implement a pipeline that evaluates this expression: (A+B)^2 - C[D - (E/2)]
A, B, C, D and E are all signed 16-bit values.

I can split the operation into three (or four) stages:
Stage 1: x = A+B and y = D-(E/2) (the division delay is negligible because it's a simple right shift)
Stage 2: t = x^2 and w = y*C
Stage 3: t-w
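To make the stage split concrete, here is a small behavioural model in Python (not HDL). It is only a sketch: the names `stage1`, `stage2`, `stage3` and `run_pipeline` are made up for illustration, and E/2 is modelled as an arithmetic right shift, which rounds toward negative infinity for odd negative E.

```python
def stage1(a, b, c, d, e):
    """Stage 1: the two additions. C is forwarded because stage 2 needs it."""
    x = a + b          # sum of two signed 16-bit values fits in 17 signed bits
    y = d - (e >> 1)   # >> on Python ints is an arithmetic shift (floor of e/2)
    return x, y, c

def stage2(x, y, c):
    """Stage 2: the two multiplications (the slow part in hardware)."""
    t = x * x
    w = y * c
    return t, w

def stage3(t, w):
    """Stage 3: the final subtraction."""
    return t - w

def run_pipeline(inputs):
    """Clock the three stages with registers r1 and r2 between them."""
    r1 = r2 = None                 # pipeline registers, None means "bubble"
    out = []
    # two extra ticks with no input flush the last operands through
    for item in list(inputs) + [None, None]:
        # evaluate from the back so each register is read before it is overwritten
        if r2 is not None:
            out.append(stage3(*r2))
        r2 = stage2(*r1) if r1 is not None else None
        r1 = stage1(*item) if item is not None else None
    return out

if __name__ == "__main__":
    # (A, B, C, D, E) tuples; one new tuple enters per clock tick
    print(run_pipeline([(3, 4, 5, 10, 6), (-2, 1, -3, 4, -5)]))
```

Once the pipeline is full, one result comes out per tick, so the clock period is set by the slowest stage, which here is the multiplication.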

I have to synthesize it for my Digilent Nexys 2 1200K board (Spartan-3E FPGA).

For the addition/subtraction operations I use a Carry Look Ahead AddSub (or a simple Ripple Carry Adder) that I designed myself; it can run at over 120 MHz.
The problem is the multiplication (second stage). I implemented a MAC multiplier, but for 16 bits it's VERY SLOW (under 20 MHz). The alternative is a Robertson/Booth multiplier, but that is an FSM. I can't use the multiplier built into my FPGA (I know that it's VERY fast, but it would be too easy and I simply CAN'T use it :<).

So, I have some questions:
- Am I right in saying that an FSM in the second stage would be meaningless? (The Booth algorithm needs more than 10 steps to produce the result.)
- Are there other, MORE efficient solutions for the multiplication (and for my FPGA)?

Thanks and Happy new year!

I don't see a problem with implementing a pipeline of more than 10 stages if you prefer the corresponding multiplier solution, so I don't understand why you think it's meaningless. It's all a matter of intended clock speed and word width (the latter hasn't been mentioned yet).

Whether a sequential multiplier ("FSM") is appropriate at all depends on the intended throughput. Of course, the same structure can also be implemented in a pipelined manner.
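As a rough illustration of "the same structure in a pipelined manner", the sketch below unrolls a shift-and-add multiplier so that each stage handles one multiplier bit and a new operand pair can enter on every tick. It is a behavioural Python model, not HDL, and everything in it is an assumption for illustration: the 17-bit operand width (enough for A+B), the names `mult_stage` and `pipelined_multiply`, and the choice of plain shift-and-add rather than Booth recoding.

```python
WIDTH = 17  # assumed operand width: sum of two signed 16-bit values

def mult_stage(i, a, b, acc):
    """Stage i adds the partial product for bit i of the multiplier b."""
    bit = (b >> i) & 1            # two's-complement bit i, also for negative b
    if bit:
        if i == WIDTH - 1:
            acc -= a << i         # the sign bit has weight -2^(WIDTH-1)
        else:
            acc += a << i
    return a, b, acc              # operands travel along with the accumulator

def pipelined_multiply(pairs):
    """Model WIDTH pipeline stages; regs[i] is the register after stage i."""
    regs = [None] * WIDTH
    results = []
    # WIDTH extra ticks with no input flush the pipeline
    for item in list(pairs) + [None] * WIDTH:
        if regs[-1] is not None:
            results.append(regs[-1][2])
        # update last stage first so every stage sees the previous tick's value
        for i in range(WIDTH - 1, 0, -1):
            prev = regs[i - 1]
            regs[i] = mult_stage(i, *prev) if prev is not None else None
        regs[0] = mult_stage(0, item[0], item[1], 0) if item is not None else None
    return results

if __name__ == "__main__":
    print(pipelined_multiply([(7, 7), (5, -3)]))
```

The latency is WIDTH ticks, but the throughput is still one product per tick, whereas a sequential FSM multiplier in the second stage would stall the other stages while it iterates.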

I have another question: is it correct to put my sequential multiplier into the second stage?
Or should I implement my pipeline INSIDE the FSM of the sequential multiplier?

I don't know if I explained myself properly ... sorry for my English ^^

Keep in mind that this is an artificial problem. Anyone else would choose the built-in multipliers and inferred adders. These would be smaller, faster, and lower power than almost any more elaborate solution. (E.g., small enough multipliers could be implemented in BRAM/LUTs.)
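To show what a table-based multiplier could look like, here is a minimal Python sketch that builds an unsigned 8x8 product from four lookups in a 4x4 product table, the kind of small table that could be stored in LUTs or a block RAM. The sizes, the unsigned-only handling and the names `TABLE_4X4` and `mul8_from_tables` are assumptions for illustration, not a statement about what fits best on a Spartan-3E.

```python
# 4x4 unsigned product table: 256 entries, each at most 8 bits wide.
# In hardware this would be ROM contents, computed once at build time.
TABLE_4X4 = [[a * b for b in range(16)] for a in range(16)]

def mul8_from_tables(a, b):
    """Unsigned 8x8 multiply using four table lookups, shifts and adds."""
    a_hi, a_lo = a >> 4, a & 0xF
    b_hi, b_lo = b >> 4, b & 0xF
    high = TABLE_4X4[a_hi][b_hi] << 8                             # weight 2^8
    mid = (TABLE_4X4[a_hi][b_lo] + TABLE_4X4[a_lo][b_hi]) << 4    # weight 2^4
    low = TABLE_4X4[a_lo][b_lo]                                   # weight 2^0
    return high + mid + low

if __name__ == "__main__":
    print(mul8_from_tables(200, 123))
```

The same splitting idea extends to wider operands, at the cost of more lookups and a larger adder tree.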

(For fun, compare your RCA/CLA against just x+y. The CLA quickly loses out because it can't use the specialized logic or dedicated routing.)
