[SOLVED] FPGA design - optimized implementation method

UltraGreen · Sep 15, 2016

Hello All,

I am looking for an optimized way to synthesize and implement multiple runs in parallel, if possible.
The design is huge, with 90% utilization, so it takes 5 hours to synthesize and 11 hours to implement.

What are the ways to reduce these run times? (I cannot modify the RTL code; I know there are combinational loops.)
Also, how can I use the time efficiently while the design is being implemented? Is there anything I can do in parallel to get results from multiple optimization techniques simultaneously, or at least as soon as possible?

Thanks

KlausST · Sep 15, 2016

Hi,

Without knowing anything about your software and hardware, it is hard to help.

* use a faster processor
* use an operating system that uses the features of the processor (hyperthreading, multiple cores)
* use more RAM in your PC
* use an SSD instead of an HDD
* keep the temporary files local rather than on the network

Klaus

dpaul · Sep 15, 2016

What are the ways to reduce these run times? (I cannot modify the RTL code; I know there are combinational loops.)

Already answered in #1

Also, how can I use the time efficiently while the design is being implemented? Is there anything I can do in parallel to get results from multiple optimization techniques simultaneously, or at least as soon as possible?

Not on the same machine, as that will definitely slow down the synthesis/implementation process.
1> Normally engineers write a script and run it before signing off from work. In the morning you can come back and check the log file (pass/fail results).
2> Ask your IT dept. to give you remote access so that you can monitor the runs from home.
3> Do some documentation work (it is always useful, but it is time-consuming).
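The morning check in 1> can itself be scripted. A minimal sketch in Python, assuming each run writes its own log file and that error and critical-warning lines start with `ERROR:` and `CRITICAL WARNING:` (as Vivado messages usually do; verify against your own logs). The one-log-per-run directory layout is hypothetical.

```python
from pathlib import Path

# Assumed message prefixes; check them against your tool's log format.
ERROR_PREFIX = "ERROR:"
CRITICAL_PREFIX = "CRITICAL WARNING:"


def summarize_log(path):
    """Return (status, n_errors, n_critical) for one run log.

    status is "FAIL" if any line starts with the error prefix, else "PASS".
    """
    n_errors = 0
    n_critical = 0
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            stripped = line.lstrip()
            if stripped.startswith(ERROR_PREFIX):
                n_errors += 1
            elif stripped.startswith(CRITICAL_PREFIX):
                n_critical += 1
    return ("FAIL" if n_errors else "PASS", n_errors, n_critical)


def summarize_dir(log_dir, pattern="*.log"):
    """Summarize every log in log_dir (hypothetical layout: one log per run)."""
    return {p.name: summarize_log(p) for p in sorted(Path(log_dir).glob(pattern))}
```

Calling `summarize_dir("runs")` in the morning gives one PASS/FAIL line per run. A run that was killed part-way can still read PASS, so it is worth also checking that each log reached its final step.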

UltraGreen · Sep 15, 2016

My apologies for the incomplete information.
I am already using a machine with 16 GB RAM and an i7 processor; the hard drive is not an SSD, but changing that is not an option.
The device is an UltraScale Virtex, the biggest in the industry.
I am trying to partition the device and make modules out of context. Will that help reduce run times? If so, by how much?

@KlausST can you please elaborate on this:

run the temporary files locally rather than on the network

KlausST · Sep 15, 2016

Hi,

Still no information about your software.

So we have to guess.
Maybe your source files and working directory are on a remote file server.
If so, move your complete project and working directory to your local hard disk. This may make the process run much faster.

Klaus

dpaul · Sep 15, 2016

I am already using a machine with 16 GB RAM and an i7 processor; the hard drive is not an SSD, but changing that is not an option.

This is good enough for most FPGA oriented tasks.

In my previous company, when I was working with ASICs, there were dedicated high-performance Solaris/Linux machines for running resource-intensive tasks. We used to submit jobs to those machines and continue with other work on the normal login machines.

I now work with FPGAs in a smaller company and have a work PC with a configuration similar to yours. I no longer have the resource luxuries of my previous company, so for resource-intensive tasks I do what is mentioned in #3.

I don't know what other synthesis software you have at your disposal. Synopsys Synplify Pro for FPGA might be faster than the Vivado synthesis engine (not sure, correct me if I am wrong). But for place and route/implementation you have to use the Vivado engine.

TrickyDicky · Sep 15, 2016

Which device are you using? This table says that the larger devices typically use 16 GB of RAM, with peak usage of up to 48 GB for the largest device! That is per compilation, so multiple compiles on the same machine will require N times this RAM. Our multiple-job machines are 12-core Xeons with 128 GB of RAM (running CentOS 6).

A 5-hour synthesis seems extreme. We have a full Stratix IV that takes 30 minutes to synthesize and then 3 hours to fit (I know it's not an UltraScale, but that is still a long time).
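The "N times this RAM" rule is simple arithmetic. A minimal sketch, assuming each job's peak memory is the limiting factor and that some headroom (the hypothetical `reserve_gb`) is kept for the OS:

```python
def max_parallel_jobs(total_ram_gb, peak_per_job_gb, reserve_gb=0):
    """How many compilations can run at once without exceeding physical RAM.

    Worst case: assumes every job may hit its peak at the same time.
    reserve_gb is headroom left for the OS and other programs.
    """
    if peak_per_job_gb <= 0:
        raise ValueError("peak_per_job_gb must be positive")
    usable = total_ram_gb - reserve_gb
    return max(0, int(usable // peak_per_job_gb))
```

With the figures quoted here, a 128 GB machine fits 2 jobs at the 48 GB peak (`max_parallel_jobs(128, 48)`) or 8 at 16 GB each, while a 16 GB PC fits 1 typical job and none once any headroom is reserved.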

FvM · Sep 15, 2016

The information given so far suggests that you are running Vivado on your local computer, so we can expect that your working directory is also local; otherwise there could be an additional performance problem.

I'm not familiar with Xilinx tools, but presume they use multiple processor cores to some extent. It's surely documented, and the CPU utilization should also be listed in the report files. You have the information at your fingertips to decide whether multiple parallel compilations on multiple tool instances could speed anything up. I'm not sure, however, whether your tool license allows it.

Nothing has yet been said about the nature of the design. I guess combinational loops are not the dominant problem.

TrickyDicky · Sep 15, 2016

FvM said:
Nothing has yet been said about the nature of the design. I guess combinational loops are not the dominant problem.

One thing I do know that can balloon synthesis times is creating large memories from logic, often accidentally. This could also account for the high logic utilisation.

UltraGreen: Are you inferring RAMs? Do they follow the correct templates and infer the RAMs correctly?

vGoodtimes · Sep 16, 2016

UltraGreen said:
I am already using a machine with 16 GB RAM and an i7 processor; the hard drive is not an SSD, but changing that is not an option.
The device is an UltraScale Virtex, the biggest in the industry.

How did you get to this point without a build server or build farm? Something with 128 GB+ of RAM and 16-32 cores? Or multiple servers like this, or better.

From there, you can use Vivado's built-in remote build tools to queue up as many run strategies as you want. I routinely did this for V7 designs.
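Inside a Vivado project this queuing is normally done with the run infrastructure (for example `launch_runs` with its `-jobs` option). For a non-project flow, a sketch of the same idea from outside the tool: start one batch-mode Vivado per strategy, with at most `max_jobs` running at a time. The Tcl script `run_impl.tcl`, the way it reads the strategy name through `-tclargs`, and the strategy names are assumptions; choose `max_jobs` so that the jobs fit in RAM.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_commands(strategies, tcl_script="run_impl.tcl"):
    """Build one Vivado batch command per strategy.

    Hypothetical: tcl_script is your own script, assumed to read the strategy
    name from -tclargs and to write its results to a per-strategy directory.
    """
    return [
        ["vivado", "-mode", "batch", "-source", tcl_script,
         "-log", f"{name}.log", "-tclargs", name]
        for name in strategies
    ]


def run_all(commands, max_jobs):
    """Run the commands with at most max_jobs running concurrently.

    Returns the exit codes in the same order as the commands.
    """
    def run_one(cmd):
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(run_one, commands))
```

For example, `run_all(build_commands(["Performance_Explore", "Congestion_SpreadLogic_high"]), max_jobs=2)`; check the strategy names against the list your Vivado version offers. Each Vivado job is itself multithreaded, so it is usually better to run fewer parallel jobs than there are cores.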

UltraGreen · Sep 16, 2016

TrickyDicky said:
One thing I do know that can balloon synth times is creating large memories from logic, often accidentally. This could also account for the large logic utilization.

UltraGreen: Are you inferring RAMs? Do they follow the correct templates and infer the RAMs correctly?

Yes Tricky, I am inferring RAM without resets, so that BRAM gets instantiated.
I am using Vivado 2015.4. I have now moved my design to a server: CentOS 7, 48 GB RAM and a Xeon processor. Synthesis still takes 3 hours. Implementation is still running, and it has been 5 hours.
The routing congestion is very high, so I can understand the extra time in implementation. Actually, I just wanted to be sure whether my run times are normal or whether something needs to be fixed.

TrickyDicky · Sep 16, 2016

UltraGreen said:
Yes Tricky, I am inferring RAM without resets, so that BRAM gets instantiated.

Do they actually get instantiated as BRAMs, though, and not inferred as logic? Have you checked?
Accidentally inferring logic-based RAM because you didn't follow the correct pattern to infer a BRAM leads to massive synthesis times.
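One way to check is to compare the memories declared in the RTL with the block RAM count in the utilization report. An UltraScale RAMB36 holds 36 Kb (36,864 bits, parity included), so a depth × width memory needs at least ceil(depth × width / 36864) RAMB36 tiles; real mappings often need more because of aspect ratios. A minimal sketch, assuming you list by hand the memories you expect in block RAM (a hypothetical input) and count a RAMB18 as half a tile:

```python
RAMB36_BITS = 36 * 1024  # 36 Kb per RAMB36, parity bits included


def min_ramb36_tiles(depth, width):
    """Lower bound on RAMB36 tiles for one depth x width memory (integer ceiling)."""
    return (depth * width + RAMB36_BITS - 1) // RAMB36_BITS


def suspect_logic_ram(memories, reported_tiles):
    """True if the report shows fewer block RAM tiles than the RTL memories need.

    memories: list of (depth, width) pairs you expect to map to block RAM.
    reported_tiles: block RAM tile count from the utilization report
    (a RAMB18 counts as 0.5).
    """
    needed = sum(min_ramb36_tiles(d, w) for d, w in memories)
    return reported_tiles < needed
```

If this flags a shortfall, some of those memories have probably ended up in LUTs or registers, which would match both the long synthesis time and the high logic utilization.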

Welcome to EDAboard.com