Welcome to EDAboard.com

EDAboard.com is an international Electronics Discussion Forum focused on EDA software, circuits, schematics, books, theory, papers, asic, pld, 8051, DSP, Network, RF, Analog Design, PCB, Service Manuals... and a whole lot more!

[SOLVED] FPGA design - optimized implementation method

Status
Not open for further replies.

UltraGreen

Hello All,

I am looking for an efficient way to synthesize and implement multiple runs in parallel, if possible.
The design is huge, with 90% utilization, so it takes 5 hours to synthesize and 11 hours to implement.

How can I reduce these run times? (I cannot modify the RTL code, and I know there are combinational loops.)
Also, how can I make efficient use of the time while the design is being implemented? Is there anything I can do in parallel to get the results of multiple optimization techniques simultaneously, or at least as soon as possible?


Thanks
 

Hi,

Without knowing anything about your software and your hardware, it is hard to help.

* use a faster processor
* use an operating system that uses the features of the processor (hyperthreading, multiple cores)
* use more RAM in your PC
* use an SSD instead of a HDD
* run the temporary files locally rather than on the network

Klaus
 

How can I reduce these run times? (I cannot modify the RTL code, and I know there are combinational loops.)
Already answered in #1.

Also, how can I make efficient use of the time while the design is being implemented? Is there anything I can do in parallel to get the results of multiple optimization techniques simultaneously, or at least as soon as possible?
Not on the same machine, as that will definitely slow down the synthesis/implementation process.
1> Normally engineers write a script and run it before signing off from work. In the morning you can come back and check the log file (pass/fail results).
2> Ask your IT department to give you remote access so that you can monitor the runs from home.
3> Do some documentation work (it is always useful, but time-consuming). :)
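A minimal sketch of point 1>, assuming a Linux machine with `vivado` on the PATH and a Tcl build script of your own (`build.tcl` and `build.log` are hypothetical names). It runs Vivado in batch mode and then counts lines beginning with `ERROR:` or `CRITICAL WARNING:`, which is how Vivado normally prefixes those messages. Treating "no ERROR lines and a zero exit code" as a pass is a simplification.

```python
import subprocess
from pathlib import Path


def summarize_log(text: str) -> dict:
    """Count Vivado-style ERROR / CRITICAL WARNING lines and derive pass/fail."""
    errors = 0
    critical = 0
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("ERROR:"):
            errors += 1
        elif stripped.startswith("CRITICAL WARNING:"):
            critical += 1
    # Simplification: a log "passes" if it contains no ERROR lines.
    return {"errors": errors, "critical_warnings": critical, "passed": errors == 0}


def run_overnight(tcl_script: str, log_path: str) -> dict:
    """Run Vivado in batch mode on a (hypothetical) user Tcl script, then summarize its log."""
    cmd = ["vivado", "-mode", "batch", "-source", tcl_script, "-log", log_path]
    # Do not raise on a non-zero exit code; the summary reports it instead.
    result = subprocess.run(cmd, check=False)
    log_file = Path(log_path)
    text = log_file.read_text(errors="replace") if log_file.exists() else ""
    summary = summarize_log(text)
    summary["exit_code"] = result.returncode
    # A missing or empty log, or a failing exit code, also counts as a failure.
    summary["passed"] = summary["passed"] and result.returncode == 0 and bool(text)
    return summary
```

For example, `run_overnight("build.tcl", "build.log")` can be started before leaving, and the returned counts checked in the morning.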
 

My apologies for the incomplete information.
I am already using a machine with 16 GB of RAM and an i7 processor. The hard drive is not an SSD, and changing that is not an option.
The device is a Virtex UltraScale, the biggest in the industry.
I am trying to partition the device and build modules out of context. Will that help reduce run times? If so, by how much?

@KlausST, can you please elaborate on "run the temporary files locally rather than on the network"?
 

Hi,

Still no information about your software.

So we have to guess.
Maybe your source files and your working directory are on a remote file server.
If so, move your complete project and your working directory to your local hard disk. This may make the process run much faster.

Klaus
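To check whether a project directory actually sits on a network share, the sketch below parses Linux's `/proc/mounts` and looks up the filesystem type of the mount point that contains the directory. It assumes Linux (on Windows, the equivalent thing to look for is a mapped network drive), and the set of "network" filesystem types is a hand-picked assumption.

```python
from __future__ import annotations

import os

# Filesystem types treated as network storage; this set is an assumption, extend as needed.
NETWORK_FS = {"nfs", "nfs4", "cifs", "smb3", "smbfs", "fuse.sshfs", "afs"}


def parse_mounts(text: str) -> list[tuple[str, str]]:
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # /proc/mounts encodes a space inside a path as \040.
            mounts.append((fields[1].replace("\\040", " "), fields[2]))
    return mounts


def fs_type_for(path: str, mounts: list[tuple[str, str]]) -> str:
    """Filesystem type of the longest mount point that contains path."""
    path = os.path.abspath(path)
    best_len, best_type = -1, "unknown"
    for mount_point, fs_type in mounts:
        prefix = mount_point if mount_point.endswith("/") else mount_point + "/"
        inside = path == mount_point or path.startswith(prefix)
        # ">=" lets a later entry for the same mount point win (it is mounted on top).
        if inside and len(mount_point) >= best_len:
            best_len, best_type = len(mount_point), fs_type
    return best_type


def is_on_network(path: str, mounts: list[tuple[str, str]]) -> bool:
    return fs_type_for(path, mounts) in NETWORK_FS
```

Usage: `is_on_network(os.getcwd(), parse_mounts(open("/proc/mounts").read()))`. A `True` result means the working directory is on NFS/SMB, and moving the project to a local disk is worth trying.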
 
I am already using a machine with 16 GB of RAM and an i7 processor. The hard drive is not an SSD, and changing that is not an option.
This is good enough for most FPGA-oriented tasks.

In my previous company, when I was working with ASICs, there were dedicated high-performance Solaris/Linux machines for running resource-hungry tasks. We used to submit jobs to those machines and continue with other work on the normal login machines.

I now work with FPGAs in a smaller company and have a work PC with a configuration similar to yours. I no longer have the resource luxuries of my previous company, so for resource-hungry tasks I do what is described in #3.


I don't know what other synthesis software you have at your disposal. Synopsys Synplify Pro for FPGA might be faster than the Vivado synthesis engine (not sure, correct me if I am wrong). But for place and route/implementation you have to use the Vivado engine.
 
Which device are you using? This table says that the larger devices typically use 16 GB of RAM, with peak usage of up to 48 GB for the largest device! This is per compilation, so N compilations on the same machine need N times this RAM. Our multiple-job machines are 12-core Xeons with 128 GB of RAM (running CentOS 6).

A 5-hour synthesis seems extreme. We have a full Stratix IV design that takes 30 minutes to synthesize and then 3 hours to fit (I know it's not an UltraScale, but that is still a long time).
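The RAM figures above translate into a quick upper bound on how many runs can share one machine. The sketch below takes the limit from both RAM and cores. `threads_per_run` (how many cores one run keeps busy) and `reserve_ram_gb` (headroom for the OS) are assumptions you would fill in from your own logs.

```python
def max_parallel_runs(total_ram_gb: float, peak_ram_per_run_gb: float,
                      cores: int, threads_per_run: int,
                      reserve_ram_gb: float = 0.0) -> int:
    """Upper bound on simultaneous runs, limited by both RAM and CPU cores."""
    if peak_ram_per_run_gb <= 0 or threads_per_run <= 0:
        raise ValueError("per-run RAM and threads must be positive")
    # Runs may hit their peaks at the same time, so budget N x peak RAM.
    by_ram = int((total_ram_gb - reserve_ram_gb) // peak_ram_per_run_gb)
    # Oversubscribing cores mostly slows every run down.
    by_cpu = cores // threads_per_run
    return max(0, min(by_ram, by_cpu))
```

With the numbers in this post, a 128 GB, 12-core machine and a 48 GB peak per run gives `max_parallel_runs(128, 48, 12, 4) == 2`, while a 16 GB workstation cannot hold even one run at that peak without swapping.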
 

The information given so far suggests that you are running Vivado on your local computer. So we can expect that your working directory is also local; otherwise there could be an additional performance problem.

I'm not familiar with Xilinx tools, but I presume they use multiple processor cores to some extent. It's surely documented, and the CPU utilization should also be listed in the report files. So you have the information at your fingertips to decide whether multiple parallel compilations on multiple tool instances can speed anything up. I'm not sure, however, whether your tool license allows it.

Nothing has been said yet about the nature of the design; I guess combinational loops are not the dominant problem.
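One way to get that utilization figure from a log: Vivado usually prints timing summaries of the form `Time (s): cpu = hh:mm:ss ; elapsed = hh:mm:ss`, and the ratio cpu/elapsed approximates how many cores a step kept busy. The regular expression below assumes exactly that format, so check it against your own log first. A ratio close to 1.0 for the long steps suggests the tool mostly used a single core, in which case several independent runs side by side would use the machine better (subject to RAM and licensing).

```python
from __future__ import annotations

import re

# Assumed log format (check against your own logs), e.g.
# "route_design: Time (s): cpu = 02:00:00 ; elapsed = 00:30:00 . Memory (MB): ..."
TIME_RE = re.compile(
    r"cpu = (\d+):(\d{2}):(\d{2}) ; elapsed = (\d+):(\d{2}):(\d{2})"
)


def _to_seconds(h: str, m: str, s: str) -> int:
    return int(h) * 3600 + int(m) * 60 + int(s)


def cpu_ratios(log_text: str) -> list[float]:
    """cpu/elapsed ratio for every timing summary found, in log order."""
    ratios = []
    for match in TIME_RE.finditer(log_text):
        groups = match.groups()
        cpu = _to_seconds(*groups[:3])
        elapsed = _to_seconds(*groups[3:])
        if elapsed > 0:  # skip sub-second steps to avoid dividing by zero
            ratios.append(cpu / elapsed)
    return ratios
```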
 

One thing I do know that can balloon synthesis times is creating large memories from logic, often accidentally. This could also account for the high logic utilisation.

UltraGreen: Are you inferring RAMs? Do they follow the correct templates, and are the RAMs inferred correctly?
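Some rough arithmetic shows why a RAM built from logic is so expensive. A memory built from registers needs one flip-flop per stored bit (plus LUTs for the read multiplexer), whereas one 36 Kb block RAM stores up to 36,864 bits. The helper below is only a back-of-the-envelope estimate; its block RAM count is a lower bound that ignores width/depth aspect-ratio constraints.

```python
import math

RAMB36_BITS = 36 * 1024  # capacity of one 36 Kb block RAM, parity bits included


def ram_cost(depth: int, width: int) -> dict:
    """Back-of-the-envelope cost of a depth x width memory: registers vs block RAM."""
    bits = depth * width
    return {
        "bits": bits,
        # Built from logic: one flip-flop per stored bit, before any read-mux LUTs.
        "flip_flops_if_logic": bits,
        # Lower bound only: the real mapping also depends on the aspect ratio.
        "ramb36_lower_bound": math.ceil(bits / RAMB36_BITS),
    }
```

For example, `ram_cost(4096, 32)` gives 131,072 flip-flops if the memory ends up in logic, versus at least 4 RAMB36 blocks if it is inferred correctly.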
 
I am already using a machine with 16 GB of RAM and an i7 processor. The hard drive is not an SSD, and changing that is not an option.
The device is a Virtex UltraScale, the biggest in the industry.

How did you get to this point without a build server or build farm? Something with 128 GB+ of RAM and 16 to 32 cores, or several servers like that or better?

From there, you can use Vivado's built-in remote build tools to queue up as many run strategies as you want. I would routinely do this for V7 designs.
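Vivado's project-mode run manager (`launch_runs` with its `-jobs` option) is the built-in route. As a tool-agnostic alternative, the sketch below queues one Vivado batch run per strategy with a cap on how many run at once. It assumes a hypothetical Tcl script that reads the strategy name from `-tclargs` and writes into its own output directory, so parallel runs do not overwrite each other.

```python
from __future__ import annotations

import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(tcl_script: str, strategy: str, log_path: str) -> list[str]:
    """Vivado batch command for one run; tcl_script is a hypothetical user script
    assumed to read the strategy name from -tclargs."""
    return ["vivado", "-mode", "batch", "-source", tcl_script,
            "-log", log_path, "-tclargs", strategy]


def run_all(commands: list[list[str]], max_parallel: int) -> list[int]:
    """Run commands with at most max_parallel at a time; return exit codes in input order."""
    def run_one(cmd: list[str]) -> int:
        return subprocess.run(cmd, check=False).returncode

    # Threads are enough here: each worker only waits on an external process.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, commands))
```

Usage: `run_all([build_command("impl.tcl", s, f"{s}.log") for s in ["strategy_a", "strategy_b", "strategy_c"]], max_parallel=2)`. The strategy names are placeholders for whichever implementation strategies you want to compare, and `max_parallel` should respect the RAM limit discussed earlier in the thread.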
 
Yes Tricky, I am inferring RAM without resets, so that BRAM gets instantiated.
I am using Vivado 2015.4. I have now moved my design to a server running CentOS 7 with 48 GB of RAM and a Xeon processor. Synthesis still takes 3 hours, and implementation has been running for 5 hours so far.
The routing congestion is very high, so I can understand the extra implementation time. I just wanted to be sure whether my run times are normal or whether something needs to be fixed.
 

Do they actually get instantiated as BRAMs, though, and not inferred as logic? Have you checked?
Accidentally inferring a logic-based RAM because you didn't follow the correct pattern for BRAM inference leads to massive synthesis times.
 
