Quartus / Prime floor planning / atoms to obtain best fMax

Wiljan · Sep 21, 2018

Hi

When you compile an FPGA project in Quartus or Quartus Prime, place and route puts the logic wherever it finds "best", depending partly on the fitter settings. This gives a fair fMax and works for many applications.

I am working on a project where one core has about 52 pipeline stages, and the FPGA holds many copies of that core.

If I manually assign a LogicLock region to a core in the Chip Planner, size it to fit tightly around the resources it needs, and then compile again, I get a better fMax.

If I then do the same for each pipeline stage, I get an even better fMax, because the logic is placed close together in the order it connects. It is like a PCB layout: you would not place all your logic chips randomly on the board.

So now for the question:
It gets very complex to define those LogicLock regions manually for all pipeline stages in the Chip Planner, so I was looking for a way to define the first one manually, copy it to the rest of the pipeline stages in the core, and then copy the whole core.

I managed to do a bit of this in a Tcl script, but I would expect there to be a more high-level way to do it.

Could you please point me in the right direction?

I can also see that it is possible to create atoms manually, but I guess that is too low level.

Thx

TrickyDicky · Sep 21, 2018

My first question is: why? Do you have a target clock speed you're trying to achieve, or is this just an academic exercise to see what fMax you can actually achieve?
What clock speed requirement have you set? Often, if you over-constrain the clock, it makes the fitter work harder. But if it's too far over-constrained, the fitter can give up early. So, do you have a target clock you're not achieving?
If you do achieve a higher fMax, how does this benefit you? Can you actually increase the clock speed? Why not just try with this higher clock speed in the first place? Again, if you under-constrain a design, the fitter will stop once it achieves the desired goal.
So it is usually best to set your desired fMax in the first place, and only put in extra effort if you fail to meet timing. Surely you have bandwidth or clock requirements somewhere on the I/O? A higher internal clock speed probably won't be much benefit if the I/O cannot take the extra data.

Using LogicLock regions should really only be a last resort to meet your timing requirements. If you lock everything down from the start, then when you add new logic functions you make routing harder, because you are potentially limiting the fitter's options for the new logic.

As for your actual problem, the only way would be Tcl scripts and sensible path naming. With decent naming, you can simply lock everything in a few loops. But this might be hard, because it is unlikely you will get nice names/locations for the regions.
The Chip Planner does allow you to draw and move the regions in the GUI (although I haven't done this for a few years).
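A minimal sketch of the "lock everything in some loops" idea, written in Python so it just generates the assignments as text: you draw the first stage's region by hand, and the script copies it to every other stage and core by shifting the rectangle. The hierarchy pattern (top|core_N|stage_M) is hypothetical, and the assignment format (PLACE_REGION with two corner coordinates) is an assumption based on how newer Quartus Prime versions store Logic Lock regions; older versions use LL_* assignments instead, so copy the exact line the Chip Planner writes for one hand-drawn region and adjust the template to match.

```python
from dataclasses import dataclass

# HYPOTHETICAL hierarchy pattern: replace with the real instance paths
# from your own design.
PATH_TEMPLATE = "top|core_{core}|stage_{stage}"

# ASSUMED assignment format: copy the exact line your Quartus version writes
# into the .qsf for one hand-drawn region and adjust this template to match.
QSF_TEMPLATE = ('set_instance_assignment -name PLACE_REGION '
                '"{x1} {y1} {x2} {y2}" -to "{path}"')


@dataclass(frozen=True)
class Rect:
    """Rectangle given by two corner coordinates (X1, Y1) and (X2, Y2)."""
    x1: int
    y1: int
    x2: int
    y2: int

    def shifted(self, dx: int, dy: int) -> "Rect":
        """Return a copy moved dx columns and dy rows."""
        return Rect(self.x1 + dx, self.y1 + dy, self.x2 + dx, self.y2 + dy)


def tile_regions(first: Rect, stages: int, cores: int, stage_dy: int, core_dx: int):
    """Yield (core, stage, rect): stages stacked in pipeline order along Y,
    cores side by side along X, all copied from the first stage's region."""
    for core in range(cores):
        for stage in range(stages):
            yield core, stage, first.shifted(core * core_dx, stage * stage_dy)


def qsf_lines(first: Rect, stages: int, cores: int, stage_dy: int, core_dx: int):
    """Format one assignment line per (core, stage) region."""
    return [
        QSF_TEMPLATE.format(x1=r.x1, y1=r.y1, x2=r.x2, y2=r.y2,
                            path=PATH_TEMPLATE.format(core=c, stage=s))
        for c, s, r in tile_regions(first, stages, cores, stage_dy, core_dx)
    ]


if __name__ == "__main__":
    # Illustrative numbers only: first stage drawn at X10..X13, Y5..Y6,
    # 3 stages per core, 2 cores, next core 6 columns to the right.
    for line in qsf_lines(Rect(10, 5, 13, 6), stages=3, cores=2,
                          stage_dy=2, core_dx=6):
        print(line)
```

A pure shift assumes every copy lands on the same mix of LAB and M10K columns. On a real device the M10K and DSP columns sit at fixed X positions, so the offsets have to be chosen, and checked in the Chip Planner, so that each copied region still contains the memory blocks its stage needs. The generated lines can then be appended to the project's .qsf.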

shaiko · Sep 21, 2018

Did you try, and fail, to meet timing using only SDC constraints?

If yes, are you sure all clocks have been defined correctly?
Did you constrain all I/O?
Did you set false paths (where possible) to avoid over-constraining the design?

As TrickyDicky noted, this should really, really be your last resort after you've done absolutely everything you can with constraints and RTL optimizations.

Wiljan · Sep 21, 2018

Thank you for the inspiration.

It is a proof of concept for an algorithm. I send data in and out over RS232 at 115 kbaud, and I also have an RGB HV output to drive an LCD screen with some data.
All this I/O is driven by clk_25.

Then all the internal algorithm logic runs on clk_200, fully pipelined.

I am using a Cyclone V E A9.
I have a 50 MHz ext_clk input to the PLL:
output 0 at 200 MHz (clk_200)
output 1 at 25 MHz (clk_25)

So the external I/O is not a bottleneck, since it all runs at 25 MHz.

In the algorithm, each round has a number of 8x8 lookup tables, which I have placed in M10K blocks as dual-port ROM (to save LUTs), with registered inputs and outputs. The M10K on this device has a restricted fMax of 275 MHz.

The problem is that if I leave the target frequency at 200 MHz, the fitter ends up at around 196 MHz, and yes, it does work at 200 MHz. BUT I need a lot of cores, and therefore more FPGAs, so if I can push fMax higher, say to 275 MHz, I will get more results per unit time or need fewer FPGAs.
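A rough sketch of that trade-off, assuming each fully pipelined core accepts one new input every clock so that throughput per FPGA is simply cores × fMax. The 10 cores per FPGA and the 1e11 results/s target are hypothetical numbers chosen only to show the effect:

```python
import math


def results_per_second(cores_per_fpga: int, fmax_hz: float) -> float:
    """Assumes each fully pipelined core accepts one new input every clock."""
    return cores_per_fpga * fmax_hz


def fpgas_needed(target_rate: float, cores_per_fpga: int, fmax_hz: float) -> int:
    """Smallest number of FPGAs whose combined rate reaches target_rate."""
    return math.ceil(target_rate / results_per_second(cores_per_fpga, fmax_hz))


if __name__ == "__main__":
    print(f"speed-up 196 -> 275 MHz: {275e6 / 196e6:.2f}x")  # 1.40x
    # Hypothetical workload: 10 cores per FPGA, 1e11 results/s required.
    print(fpgas_needed(1e11, 10, 196e6))  # 52
    print(fpgas_needed(1e11, 10, 275e6))  # 37
```

Under this assumption the board count scales inversely with fMax, so the roughly 40% gain from 196 to 275 MHz translates directly into about 30% fewer FPGAs.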

You might be right that I should force a higher desired frequency so the fitter does a better job.

But it's pretty clear that when I target 300 MHz and place the M10Ks and LUTs with very tight floor planning, I can reach the 275 MHz fMax limit of the M10K.
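To put numbers on the 196, 200, 275 and 300 MHz figures in this thread, a small sketch of the period arithmetic: an fMax of f MHz corresponds to a worst-case register-to-register budget of 1000/f ns.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


if __name__ == "__main__":
    for f in (196, 200, 275, 300):
        print(f"{f} MHz -> {period_ns(f):.3f} ns")
    # 196 MHz -> 5.102 ns, 200 -> 5.000, 275 -> 3.636, 300 -> 3.333

    # Critical-path reduction needed to go from 196 MHz to 275 MHz:
    print(f"{period_ns(196) - period_ns(275):.3f} ns")  # 1.466 ns

    # Extra margin a 300 MHz target asks for beyond 275 MHz:
    print(f"{period_ns(275) - period_ns(300):.3f} ns")  # 0.303 ns
```

So reaching the M10K limit means shaving about 1.47 ns (roughly 29%) off the worst path seen at 196 MHz, while the 300 MHz target only asks the fitter for about 0.3 ns beyond that.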

If I were doing a PCB with TTL chips, I would never let an autorouter do all of it; I would place the chips strategically so the wiring would be most logical.

TrickyDicky · Sep 21, 2018

Usually, if there is a problem, the easiest place to fix it is in the design. Timing into and out of RAMs and DSPs is usually the biggest issue, so the answer is to put extra register stages at the inputs and outputs of RAMs and DSPs, which lets the router minimise the routing into and out of the DSP. If you only have a single register, it can often cause a battle in the fitter, with that last register being pulled between some logic and the DSP, so the extra register can only help by preventing this conflict.
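A rough sketch of what the extra registers cost, assuming the 52 stages form a feed-forward pipeline with no feedback loops, so throughput stays at one result per clock. The register levels per stage (2 now, 4 with one extra register before and after each M10K) are hypothetical:

```python
def pipeline_latency(stages: int, regs_per_stage: int, fmax_mhz: float):
    """Return (cycles, nanoseconds) from pipeline input to output.

    Assumes a feed-forward pipeline with no feedback, so adding registers
    changes latency but not the rate of one result per clock.
    """
    cycles = stages * regs_per_stage
    return cycles, cycles * 1000.0 / fmax_mhz


if __name__ == "__main__":
    # Hypothetical: 2 register levels per stage at 200 MHz ...
    print(pipeline_latency(52, 2, 200.0))  # (104, 520.0)
    # ... versus 4 levels per stage at 275 MHz.
    cycles, ns = pipeline_latency(52, 4, 275.0)
    print(cycles, f"{ns:.1f} ns")  # 208 756.4 ns
```

Under that assumption the extra registers only add latency, which matters little when the goal is results per second. If a stage contains a feedback loop, extra registers inside the loop would change the algorithm's timing, and this simple model does not apply.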

The problem with LogicLock regions is that they are not portable, so if you need this design on another chip, it is much better to have well-designed code than well-placed logic. With good code, you won't have a problem in any chip.

300 MHz in a Cyclone is always going to be adventurous.

shaiko · Sep 21, 2018

TrickyDicky wrote: 300 MHz in a Cyclone is always going to be adventurous.

I remember an FAE at an Altera conference showcasing a Cyclone V design that used 70% of the device and ran at 205 MHz. The point was to show how a new low-cost Altera FPGA can handle high speeds.

300 MHz isn't trivial even on the latest Stratix 10, so I wouldn't expect it to work for a large design on a Cyclone V...

Wiljan · Sep 22, 2018

I will try adding even more registers between the M10K stages.
I will also try tightening the constraints a bit more.
And I will see if I can write a Tcl script to make the floor planning a bit more automatic.

Thank you for the feedback.

Welcome to EDAboard.com
