
reduce the ISE implementation time

Status
Not open for further replies.

shield

Hi, I've run into a problem: the ISE implementation flow takes about 5 hours.
This makes our efficiency very low for ASIC prototyping in our project.
Does anyone have experience with this problem? Maybe you can give me some advice or
point me to some materials. Thank you!
 

Use a powerful computer with lots of memory.
Use Linux, not Windows.
 

You either have a very large design or tight place-and-route constraints, if not both. The larger your design, and/or the tighter your place-and-route and timing budget, the longer it's going to take for the tool to finish.

 

Several years ago, I had a similar problem with a big Virtex-II design. ISE dumped all the logic into the middle of the chip and tried to route the scrambled mess. The placer and router struggled for two or three hours trying to achieve timing closure. It usually failed.

I greatly improved the situation by applying LOC area constraints to each of my HDL modules. I also arranged the areas so the interconnecting signals were mostly short. A few signals still needed to jump across the chip, so I inserted a pipeline register into them to split the long propagation delay. ISE built the project in about 45 minutes.
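In case it helps to see what such area constraints look like: here is a minimal sketch in ISE's UCF format, assigning two module instances to area groups pinned to slice ranges. The instance names and slice coordinates are made-up examples; the actual ranges depend on your device and floorplan.

```
# Put each module instance into an area group, then pin the group
# to a rectangular region of slices (names/ranges are illustrative).
INST "u_moduleA" AREA_GROUP = "AG_A" ;
AREA_GROUP "AG_A" RANGE = SLICE_X0Y0:SLICE_X31Y31 ;

INST "u_moduleB" AREA_GROUP = "AG_B" ;
AREA_GROUP "AG_B" RANGE = SLICE_X32Y0:SLICE_X63Y31 ;
```

You can also draw these regions graphically in PACE or Floorplanner instead of writing the UCF by hand.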

Place and route is memory intensive. Be sure your computer has fast RAM and sufficient RAM to avoid swapping to disk.
 

echo47 said:
I greatly improved the situation by applying LOC area constraints to each of my HDL modules. I also arranged the areas so the interconnecting signals were mostly short.

Does this mean that the pin-outs of the FPGA should be close to the vicinity of the output registers/buffers? I'm not able to understand. Suppose the pin-out we have already routed on a PCB for one application is on the far opposite side from the nearest pin for one architecture; how can we solve that case?

echo47 said:
A few signals still needed to jump across the chip, so I inserted a pipeline register into them to split the long propagation delay.
Please explain simply what a pipeline register is and how to insert one? I couldn't find any relevant doc on Google. Thanks
 

Here's an example that may help to clarify my description.

Let's say the FPGA does some sort of data processing using four HDL modules named A, B, C, and D. Data flows from the input pins through module A, then through module B, then through C, then through D, and finally out the output pins. If I don't apply any area constraints, ISE will dump all four modules into the middle of the chip and try to route them. Sometimes that works fine, but other times the congestion is too severe. To avoid the snarl, I constrain module A into the upper-left corner of the chip, module B into the upper-right corner, module C into the lower-right corner, and module D into the lower-left corner. I also try to place the input pins near module A, and the output pins near module D, to avoid long routes across the chip to the I/O pads. Now, when ISE routes the chip, there's much less routing congestion, and cleaner shorter routes between modules.

A pipeline register is an ordinary D-flop placed in the middle of combinatorial logic, or in the middle of a long route. It basically divides the propagation delay in half. The shorter delays allow you to increase the clock frequency. However, the register introduces an additional clock cycle of latency, so you must modify your system timing to accommodate it.

It takes several nanoseconds for a route to cross the entire width of a large FPGA. That may be too slow for the desired clock rate. By putting a pipeline register into the middle of the route, the delay is cut in half.
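A pipeline register in HDL is nothing exotic; it's just a clocked register dropped into the path. A minimal VHDL sketch (entity and signal names are made up for illustration):

```vhdl
-- Sketch: one pipeline register splitting a long cross-chip route.
-- The flop lands roughly mid-route, halving each flop-to-flop delay,
-- at the cost of one extra clock cycle of latency.
library ieee;
use ieee.std_logic_1164.all;

entity pipe_reg is
  port (
    clk   : in  std_logic;
    d_in  : in  std_logic_vector(7 downto 0);  -- from the far side of the chip
    d_out : out std_logic_vector(7 downto 0)   -- to the near side of the chip
  );
end entity;

architecture rtl of pipe_reg is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      d_out <= d_in;  -- the pipeline register; adds one cycle of latency
    end if;
  end process;
end architecture;
```

The rest of your system must be designed to tolerate that one extra cycle of delay on this path.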

This is somewhat advanced stuff. It helps to be experienced with FPGA design and the software tools.

There may be other situations that cause long place & route times, but the one I've described here is the problem I see most often.
 

I know people who run ISE for more than a day, so 5 hours is great!
 

echo47 said:
Here's an example that may help to clarify my description.

Let's say the FPGA does some sort of data processing using four HDL modules named A, B, C, and D. Data flows from the input pins through module A, then through module B, then through C, then through D, and finally out the output pins. If I don't apply any area constraints, ISE will dump all four modules into the middle of the chip and try to route them. Sometimes that works fine, but other times the congestion is too severe. To avoid the snarl, I constrain module A into the upper-left corner of the chip, module B into the upper-right corner, module C into the lower-right corner, and module D into the lower-left corner. I also try to place the input pins near module A, and the output pins near module D, to avoid long routes across the chip to the I/O pads. Now, when ISE routes the chip, there's much less routing congestion, and cleaner shorter routes between modules.
---Excellent, echo47. You really deserve all the credit. Anyway, I'm not up to that level of excellence in FPGA design yet; I haven't been through manual floorplanning. Beginners like us need such helping hands from experts like you. Please continue your service. And one more thing:

A pipeline register is an ordinary D-flop placed in the middle of combinatorial logic, or in the middle of a long route. It basically divides the propagation delay in half.
--Here, what is so advantageous about adding a D flip-flop? For example, a delay is a delay, which can't be cut down by any means once it has occurred. So if we insert a D-FF, the overhead is (total delay = propagation delay + D-FF delay). How can it help cut the delay time in half? It just makes the issue bigger, doesn't it? So I think the designer must be careful about using it at design time. We have to account for the additional D-FF delay introduced between modules A and C (in your example) and design our interface accordingly, right? Thanks

Added after 4 minutes:

For me, synthesis takes rather long. I use a lot of arrays. That was before I started using BRAM; after switching to BRAM I saw some improvement in synthesis, but the time taken is still longer than normal. What does that mean? My code has many if-else statements handling 25 flags. Does that indicate code inefficiency?
 

Remember that the goal of synchronous logic timing is to reduce all the flop-to-flop path delays until they are shorter than the clock period. A typical modern fast FPGA has sub-nanosecond gate and flop speed, but a route traveling across the chip can easily take five nanoseconds. That 5ns route would limit your clock rate to below 200 MHz, even if the flops and gates were infinitely fast. By inserting a D-flop into the middle of the route, the flop-to-flop delay becomes 2.5ns (plus a small flop delay), and 200 MHz becomes easy. Shorter routes allow even higher clock rates. Of course, don't put too much combinatorial logic between the flops, because that also limits your clock rate. When I need a really fast clock rate, I put only one level of combinatorial logic between flops. That seems to make synthesis run smoother and faster too.

One disadvantage of inserting an extra flop is that you must now account for the extra clock cycle of latency in your system timing design. That's what pipelined design is all about, and sometimes it's not easy.

If your long ISE run time occurs only during HDL compiling, then my area constraint and pipelining suggestions won't help you. My projects usually compile pretty fast, and then take a long time during place and route.

I haven't noticed any particular HDL syntax that significantly helps or hurts compile time (except for some weird XST issue involving loops in a Verilog 'initial' statement). The important goal is to design efficient hardware, and then describe it accurately with HDL. It usually doesn't matter which HDL syntax you use, as long as it doesn't distort your original efficient design. If you are using Xilinx XST, you can see many small examples in the "XST HDL Coding Techniques" chapter of the XST User Guide.

A large distributed RAM takes a long time to synthesize and route because it is implemented as a large number of interconnected LUTs and flops. A Block RAM is one simple object that synthesizes and routes much faster.
 

No, but an FAE badge could be handy. FAEs probably have access to special development tools, source code, product pre-release info, and better support people. Old low-volume customers like me get WebCase.
 

Thanks, echo47. Also, I've come to realize that adding Block RAM to a model doesn't always help in reducing gate-level (distributed RAM) resources. I've learnt from experience that using BRAM helps to some extent, but not always; perhaps it mainly helps cut down the synthesis and implementation time. Moreover, writing/reading a BRAM (single port, for example) takes one clock more than doing the same with distributed RAM, where the action is instantaneous. What kind of advantage do you think BRAM gives us when timing is so critical/tight?
 

It sounds like something went wrong during your Block RAM implementation. Check your synthesis report to see whether Block RAM or Distributed RAM was actually synthesized.

A Block RAM can run from about 200 MHz to 500 MHz, depending on your FPGA type. It requires only one clock cycle, unless you enable the optional output register that's provided on some FPGAs. It shouldn't consume any FPGA resources except for the Block RAM itself.

One reliable method of putting a Block RAM into your design is to instantiate a RAMB16_* library primitive into your HDL. However, I prefer to use an HDL register array that infers the Block RAM, but that requires careful coding because it's easy to write the HDL a little bit wrong, causing XST to infer Distributed RAM instead of Block RAM. The XST User Guide describes appropriate syntax in the chapter "HDL Coding Techniques". Or try using an HDL code template provided in Project Navigator (I haven't tried them). Also, newer versions of XST are smarter about inferring Block RAM than older versions.
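To illustrate the inference approach: below is a minimal VHDL sketch of a register array written so that XST should infer a Block RAM (entity name, widths, and depth are made-up examples). The key detail is the synchronous, registered read inside the clocked process; an asynchronous read would steer XST toward Distributed RAM instead.

```vhdl
-- Sketch: single-port RAM coded to infer a Block RAM in XST.
-- The registered read (dout assigned inside the clocked process)
-- is what makes Block RAM inference possible.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity bram_infer is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  std_logic_vector(9 downto 0);   -- 1024 words
    din  : in  std_logic_vector(15 downto 0);
    dout : out std_logic_vector(15 downto 0)
  );
end entity;

architecture rtl of bram_infer is
  type ram_t is array (0 to 1023) of std_logic_vector(15 downto 0);
  signal ram : ram_t;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr))) <= din;
      end if;
      dout <= ram(to_integer(unsigned(addr)));  -- synchronous read
    end if;
  end process;
end architecture;
```

After synthesis, check the XST report's macro statistics to confirm a RAMB primitive was actually inferred rather than LUT-based Distributed RAM.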

If your design requires numerous very small RAMs (such as 16 or 32 words), then Distributed RAMs may be a better choice than Block RAMs.
 

Hello friend, I don't know how you guys describe BRAM, but I learnt it from the CORE Generator IP tool in Xilinx. I configure the RAM as single or dual port, choose read-only or read-write mode, set the data width and depth, etc. It generates structural VHDL code that I add as a component of my main module. So in order to write data to it, I first set wr/rd = '0', and then on the next clock I present the data and address. If I want to read, I set wr/rd = '1' and present the address; only on the next rising clock edge does the data come out of the BRAM. So it takes considerable time. That's what I meant; I hope you understand my case.
 

Yes, Core Generator is another method.

Block RAM is synchronous, so it requires a clock pulse to load your address and read/write request, but that clock can be very fast. If you need asynchronous RAM with no read clock, then you must use Distributed RAM. However, a large Distributed RAM has significant propagation delay, usually resulting in slower overall performance than Block RAM. Small Distributed RAMs are reasonably fast, though. It's a system design trade-off decision.
 

echo47 said:
One disadvantage of inserting an extra flop is that you must now account for the extra clock cycle of latency in your system timing design. That's what pipelined design is all about, and sometimes it's not easy.

If your long ISE run time occurs only during HDL compiling, then my area constraint and pipelining suggestions won't help you. My projects usually compile pretty fast, and then take a long time during place and route.
 

Quite the opposite, in fact, for me! My synthesis takes much longer than one could patiently wait, but my placing and routing speeds along like a turbo. I don't know whether it's due to the style of my code and looping, which may take much longer to infer the required resources.
 
