Altera embedded multiplier performance

vidivici.world · Dec 13, 2010

Hi, all.
I implemented a 64-bit multiplier using the Altera Quartus MegaWizard Plug-In Manager. The device is from the Cyclone III series. But the classic timing analysis shows that it cannot even meet timing at tpd = 20 ns. I wonder why it is so slow when the device is made on a 65 nm process. Is there something wrong? Could someone please explain? Thank you!

The screenshots are below.

TrickyDicky · Dec 13, 2010

Have you got plenty of registers surrounding the multiplier, and are you using the multiplier's internal registers?

FvM · Dec 13, 2010

I wonder why it can be so slow when the device is made with 65nm process

The basic 18x18-bit multiplier block is said to achieve 340 MHz, I assume with the fastest speed grade. You have a 64x64 multiplier, which obviously needs to combine several multiplier blocks and also involves glue logic. So 43 MHz without any pipelining in the design seems acceptable. 50 MHz should be easy with a single pipeline register level.
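To make the "combine multiplier blocks plus glue logic" point concrete, here is a minimal Python sketch of how a 64x64 unsigned product can be assembled from small products and additions. It assumes the operands are split into four 16-bit limbs (each small enough for an 18x18 block); the names `split_limbs` and `mult64` are made up for illustration, and the actual mapping chosen by the MegaWizard may well differ.

```python
# Sketch only: assumes a 4 x 16-bit limb split, which is not necessarily
# how Quartus maps a 64x64 multiplier onto the 18x18 embedded blocks.
LIMB_BITS = 16
LIMBS = 4
MASK = (1 << LIMB_BITS) - 1


def split_limbs(x):
    """Return the four 16-bit limbs of a 64-bit value, least significant first."""
    return [(x >> (LIMB_BITS * i)) & MASK for i in range(LIMBS)]


def mult64(a, b):
    """Multiply two 64-bit unsigned values using only 16x16 products and adds.

    Returns the 128-bit product and the number of small products used.
    """
    partials = []
    for i, ai in enumerate(split_limbs(a)):
        for j, bj in enumerate(split_limbs(b)):
            # One block-sized product, shifted to its weight in the result.
            partials.append((ai * bj) << (LIMB_BITS * (i + j)))
    # The "glue logic": in hardware this sum is a tree of wide adders built
    # from LUTs and carry chains, and it sits in the same combinational path.
    return sum(partials), len(partials)


product, blocks = mult64(0xFFFFFFFFFFFFFFFF, 0x123456789ABCDEF0)
print(hex(product), blocks)
```

With this split, 16 small products have to be summed, so the unregistered path runs through a multiplier block and then several levels of wide addition plus the routing between them.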

vidivici.world · Dec 13, 2010

Thanks.
The glue logic needed to form a 64x64 multiplier is just a few additions of partial products. Will that degrade the performance so much?
Besides, pipelining will increase the clock cycles to two or more, which will limit the throughput.

TrickyDicky said:
Have you got plenty of registers surrounding the multiplier, and are you using the multiplier's internal registers?

No other logic. Just a 64-bit multiplier.
I didn't use any internal logic; I just used the MegaWizard Plug-In Manager to create a 64-bit multiplier module and compiled it.

A 64-bit multiplier in an ASIC UMC 0.18 um process takes about 4~5 ns. Cyclone III uses a 65 nm process, so it should be faster, but it turns out to be not so good.

FvM · Dec 13, 2010

The glue logic needed to form a 64x64 multiplier is just a few additions of partial products. Will that degrade the performance so much?

Depends on how many cascaded LUTs are involved. You may want to view the RTL netlist schematic; I see three adder levels in operation.

pipelining will increase the clock cycles to two or more, which will limit the throughput.

Depends on the overall design topology. The basic idea of pipelining is to allow parallel operation; unfortunately, it's not possible with every design.

TrickyDicky · Dec 13, 2010

vidivici.world said:
Thanks.
The glue logic needed to form a 64x64 multiplier is just a few additions of partial products. Will that degrade the performance so much?
Besides, pipelining will increase the clock cycles to two or more, which will limit the throughput.

Usually, pipelining increases the throughput because it allows you to increase the clock speed, often by quite a lot. Also, with more logic between registers, the routing delays increase, and these can become quite large in heavily populated chips.
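The latency versus throughput distinction can be put into numbers with a small Python sketch. It uses only the two clock figures mentioned in this thread (43 MHz unpipelined, 50 MHz with one pipeline register level) and assumes a fully pipelined unit that accepts a new pair of operands on every clock; `pipeline_stats` is a hypothetical helper name.

```python
def pipeline_stats(clock_mhz, stages):
    """Latency (ns) and throughput (million results/s) of a pipelined unit.

    Assumes one new input is accepted every clock, so once the pipeline is
    full, one result comes out per clock regardless of the stage count.
    """
    period_ns = 1000.0 / clock_mhz
    latency_ns = stages * period_ns
    throughput_mops = clock_mhz
    return latency_ns, throughput_mops


# Figures taken from the thread; the stage counts are the assumption here.
for label, mhz, stages in [("unpipelined", 43, 1), ("one pipeline level", 50, 2)]:
    latency, rate = pipeline_stats(mhz, stages)
    print(f"{label}: latency {latency:.1f} ns, {rate} M results/s")
```

The extra register level makes each individual result arrive later, but results still leave at one per clock, so the result rate follows the clock frequency rather than the number of stages.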

A 64-bit multiplier in an ASIC UMC 0.18 um process takes about 4~5 ns. Cyclone III uses a 65 nm process, so it should be faster, but it turns out to be not so good.

An FPGA is not an ASIC, and is not really comparable to an ASIC.

vidivici.world · Dec 13, 2010

Thank you both. Now I get it, pipelining looks like a promising solution.

vaisram · Dec 14, 2010

You could also declare it as a multicycle path in the timing constraints. That would solve the problem too.

vidivici.world · Dec 14, 2010

vaisram said:
You could also declare it as a multicycle path in the timing constraints. That would solve the problem too.

A multicycle path in the timing constraints? If that doesn't change the critical path, the problem will still be there.
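A rough Python sketch of what a multicycle setting would mean here, under simplifying assumptions: the combinational delay is taken as the period of the 43 MHz figure quoted earlier in the thread (about 23.3 ns), and setup time, clock skew, and hold requirements are ignored. `min_multicycle` is a hypothetical helper, not a Quartus function.

```python
import math


def min_multicycle(path_delay_ns, clock_mhz):
    """Smallest whole number of clock periods that covers the path delay."""
    period_ns = 1000.0 / clock_mhz
    return math.ceil(path_delay_ns / period_ns)


path_delay_ns = 1000.0 / 43  # about 23.3 ns, derived from the 43 MHz figure

for clock_mhz in (50, 100):
    n = min_multicycle(path_delay_ns, clock_mhz)
    # The path itself is no faster: a new result is only valid every n clocks.
    print(f"{clock_mhz} MHz clock: multicycle {n}, "
          f"{clock_mhz / n:.1f} M results/s")
```

Under these assumptions the constraint lets the rest of the design run at a faster clock, but the multiplier path is as slow as before, so the multiplier's own result rate does not improve.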
