Help, How to solve the timing issue?

Status
Not open for further replies.

cherjier

Hi,

Is it difficult to make an FPGA run at 200 MHz? I am having difficulty getting my design up to speed.

Below is the timing report:
Slack: -7.451ns (requirement - (data path - clock path skew + uncertainty))
Source: core/lcd/lcd_read/datcnt[2] (FF)
Destination: core/lcd/lcd_fifo/shft_buff2[623] (FF)
Requirement: 3.906ns
Data Path Delay: 11.240ns (Levels of Logic = 4)
Clock Path Skew: 0.000ns
Source Clock: lclk rising at 0.000ns
Destination Clock: lclk rising at 3.906ns
Clock Uncertainty: 0.117ns

Data Path: core/lcd/lcd_read/datcnt[2] to core/lcd/lcd_fifo/shft_buff2[623]

Location           | Delay type        | Delay(ns) | Physical Resource
                   |                   |           |   Logical Resource(s)
-------------------+-------------------+-----------+--------------------------------------------------
SLICE_X104Y283.YQ  | Tcko              |     0.360 | core/lcd/lcd_read/datcnt[5]
                   |                   |           |   core/lcd/lcd_read/datcnt[2]
SLICE_X104Y282.G1  | net (fanout=4)    |     0.573 | core/lcd/lcd_read/datcnt[2]
SLICE_X104Y282.Y   | Tilo              |     0.195 | TP40_c
                   |                   |           |   core/lcd/lcd_read/un7_enab_shft_bufflto3
SLICE_X104Y282.F4  | net (fanout=3)    |     0.164 | core/lcd/N_341
SLICE_X104Y282.X   | Tilo              |     0.195 | TP40_c
                   |                   |           |   core/lcd/lcd_read/enab_shft_buff
SLICE_X101Y183.G4  | net (fanout=1025) |     3.489 | core/lcd/enab_shft_buff
SLICE_X101Y183.Y   | Tilo              |     0.194 | core/lcd/lcd_fifo/shft_buff2[1081]
                   |                   |           |   core/lcd/lcd_fifo/svbl_244.shft_buff2_5_sn_m1
SLICE_X123Y259.F1  | net (fanout=1088) |     5.835 | core/lcd/lcd_fifo/shft_buff2_5_sn_N_2
SLICE_X123Y259.CLK | Tfck              |     0.235 | core/lcd/lcd_fifo/shft_buff2[623]
                   |                   |           |   core/lcd/lcd_fifo/svbl_244.shft_buff2_5_0_1[623]
                   |                   |           |   core/lcd/lcd_fifo/shft_buff2[623]
-------------------+-------------------+-----------+--------------------------------------------------
Total              |                   |  11.240ns | (1.179ns logic, 10.061ns route)
                   |                   |           | (10.5% logic, 89.5% route)

From the report I can see that most of the delay comes from the routing resources. Can anyone suggest a way to improve the timing? Do I have to do it using FPGA Editor? :cry:
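As a quick cross-check of the report above, here is a minimal Python sketch that recomputes the slack from the formula printed on the "Slack:" line and the share of the path delay spent in routing. All numbers are copied from the report and converted to integer picoseconds; the function and variable names are only illustrative.

```python
# Values copied from the timing report, converted to integer picoseconds
# so the arithmetic is exact.
REQUIREMENT_PS = 3906
DATA_PATH_PS = 11240
CLOCK_SKEW_PS = 0
UNCERTAINTY_PS = 117

LOGIC_PS = 1179    # sum of the Tcko, Tilo and Tfck entries
ROUTE_PS = 10061   # sum of the "net" entries


def slack_ps(requirement, data_path, skew, uncertainty):
    """Slack = requirement - (data path - clock path skew + uncertainty)."""
    return requirement - (data_path - skew + uncertainty)


slack = slack_ps(REQUIREMENT_PS, DATA_PATH_PS, CLOCK_SKEW_PS, UNCERTAINTY_PS)
route_share = 100.0 * ROUTE_PS / (LOGIC_PS + ROUTE_PS)
implied_mhz = 1e6 / REQUIREMENT_PS  # clock frequency implied by the requirement

print(f"slack       = {slack / 1000:.3f} ns")
print(f"route share = {route_share:.1f} %")
print(f"requirement = {implied_mhz:.1f} MHz")
```

This reproduces the reported -7.451 ns slack and the 89.5% routing share. It also shows that a 3.906 ns requirement corresponds to roughly 256 MHz rather than 200 MHz.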
 

Four levels of logic at 200 MHz could be easy, or it could be difficult. What type of FPGA is this?

fanout=1025, fanout=1088 ---- Wow, why so many? Can you break up those nets?

In general, the most effective way to speed up a design is to add pipelining.

Typical LCD devices don't need 200 MHz.
 

I'm using a Virtex-4.

The high fanout is due to the register file for a FIFO -> reg [1023:0] shft_buff;

The "enab_shft_buff" net is the control signal that shifts "shft_buff2".
I think this is why it has such a high fanout. Am I correct?

Typically, what value should the fanout not exceed? Is a few hundred considered OK?

Also, the name LCD here does not refer to an LCD display; it just happens to use the same abbreviation. :)

The reason the FPGA needs to run at 200 MHz is that the DDR memory controller needs to run at 133 MHz.

Can you show a simple example of how to add pipelining? Thank you.
 

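Unpipelined:

result <= a + b + c + d;

Pipelined:

process (clk)
begin
  if rising_edge(clk) then
    r1 <= a + b;        -- stage 1: first partial sum
    r2 <= c + d;        -- stage 1: second partial sum
    result <= r1 + r2;  -- stage 2: adds the r1/r2 values registered on the previous edge
  end if;
end process;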
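To see what the pipelined version does cycle by cycle, here is a small behavioural model in Python. It is only a sketch: it assumes the three registers start at zero and that new inputs arrive on every clock edge, and the function name is made up for illustration.

```python
def run_pipeline(samples):
    """Clock the two-stage adder once per input tuple (a, b, c, d)."""
    r1 = r2 = result = 0  # register contents before the first edge (assumed zero)
    outputs = []
    for a, b, c, d in samples:
        # All three registers update on the same clock edge, so every
        # right-hand side uses the values held *before* that edge.
        r1, r2, result = a + b, c + d, r1 + r2
        outputs.append(result)
    return outputs


samples = [(1, 2, 3, 4), (10, 20, 30, 40), (0, 0, 0, 0), (0, 0, 0, 0)]
print(run_pipeline(samples))
```

Each sum shows up on `result` one clock edge after the edge that sampled its inputs. That extra cycle of latency is the price of splitting the adder chain into two shorter register-to-register paths.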
 

A Virtex-4 is pretty fast, but a high-fanout signal can slow down a route and can cause timing trouble. The faster the clock, the less fanout you want. Your design may have combinatorial logic feeding the high-fanout signal. If that's true, try modifying your design so the high-fanout signal comes from a clocked register instead of from combinatorial logic. That will probably require adding other registers and redesigning your timing diagram, but that's what pipelining is all about, as Iouri demonstrated. A register driving a high-fanout signal has a better chance of meeting timing constraints. A register can also be easily replicated (hopefully automatically by the place-and-route software) to reduce the fanout load on each segment.
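As a rough illustration of register replication, the Python sketch below splits the loads of one high-fanout net among several identical copies of the driving register. The per-copy limit of 64 loads and the load names are assumptions made up for this example, not recommended values, and real tools group loads by placement rather than by simple interleaving.

```python
import math


def replicate_driver(loads, max_fanout):
    """Split the loads into groups, one group per copy of the driving register."""
    copies = math.ceil(len(loads) / max_fanout)
    return [loads[i::copies] for i in range(copies)]


# 1025 loads, matching the fanout reported for enab_shft_buff (names are hypothetical).
loads = [f"load_{i}" for i in range(1025)]
groups = replicate_driver(loads, max_fanout=64)  # 64 is an assumed limit

print("register copies:", len(groups))
print("largest fanout per copy:", max(len(g) for g in groups))
```

With these assumed numbers the single 1025-load net becomes 17 nets of at most 61 loads each, which gives the router much shorter segments to work with.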

Sometimes I create small test projects that check the performance of a small chunk of logic, without having to wait for the software to compile my big project. If that sounds helpful, try creating a small test project that checks only the fanout problem. When you find a good solution, you can integrate it into your big project.
 

First, I need to thank Iouri and echo47.

Yes, indeed, breaking up the fanout did help improve the timing. Thank you very much.

Now the report gives me this:
Slack: -1.366ns (requirement - (data path - clock path skew + uncertainty))
Source: core/pln_b/pln_shift/pln_shift_ctl/vsync_m_d1 (FF)
Destination: core/pln_b/pln_shift/pln_shift_ctl/rd_ptr1_15[4] (FF)
Requirement: 7.575ns
Data Path Delay: 8.821ns (Levels of Logic = 7)
Clock Path Skew: 0.000ns
Source Clock: TP17_c rising at 0.000ns
Destination Clock: TP17_c rising at 7.575ns
Clock Uncertainty: 0.120ns

Data Path: core/pln_b/pln_shift/pln_shift_ctl/vsync_m_d1 to core/pln_b/pln_shift/pln_shift_ctl/rd_ptr1_15[4]
Location Delay type Delay(ns) Physical Resource
Logical Resource(s)
------------------------------------------------- -------------------
SLICE_X52Y168.YQ Tcko 0.360 core/vsync_m_d2
core/pln_b/pln_shift/pln_shift_ctl/vsync_m_d1
SLICE_X58Y260.G3 net (fanout=16) 1.737 core/vsync_m_d1
SLICE_X58Y260.Y Tilo 0.195 core/pln_b/pln_shift/pln_shift_ctl/N_81_2
core/pln_b/pln_shift/pln_shift_ctl/rd_ptr_clr_cond_i_o2_1
SLICE_X60Y279.G3 net (fanout=8) 0.875 core/N_81_0_1
SLICE_X60Y279.Y Tilo 0.195 core/pln_b/pln_shift/pln_shift_ctl/rd_ptr119_3
core/pln_b/pln_shift/pln_shift_ctl/rd_ptr119_3_2
SLICE_X60Y279.F3 net (fanout=1) 0.213 core/pln_b/pln_shift/pln_shift_ctl/rd_ptr119_3_2
SLICE_X60Y279.X Tilo 0.195 core/pln_b/pln_shift/pln_shift_ctl/rd_ptr119_3
core/pln_b/pln_shift/pln_shift_ctl/rd_ptr119_3
SLICE_X62Y303.F3 net (fanout=2) 0.863 core/pln_b/pln_shift/pln_shift_ctl/rd_ptr119_3
SLICE_X62Y303.X Tilo 0.195 core/pln_b/pln_shift/pln_shift_ctl/N_81_1_i
core/pln_b/pln_shift/pln_shift_ctl/N_81_1_i
SLICE_X64Y303.F2 net (fanout=21) 0.844 core/pln_b/pln_shift/pln_shift_ctl/N_81_1_i
SLICE_X64Y303.COUT Topcyf 0.576 core/pln_b/pln_shift/pln_shift_ctl/rd_ptr1_cry_s1[3]/O
core/pln_b/pln_shift/pln_shift_ctl/rd_ptr1_qxu_s1[2]
core/pln_b/pln_shift/pln_shift_ctl/rd_ptr1_cry_s1[2]
core/pln_b/pln_shift/pln_shift_ctl/rd_ptr1_cry_s1[3]
SLICE_X64Y304.CIN net (fanout=1) 0.000 core/pln_b/pln_shift/pln_shift_ctl/rd_ptr1_cry_s1[3]/O
SLICE_X64Y304.XMUX Tcinx 0.435 core/pln_b/pln_shift/rd_ptr1_10[4]
core/pln_b/pln_shift/pln_shift_ctl/rd_ptr1_s_s1[4]
SLICE_X63Y306.G4 net (fanout=11) 1.115 core/pln_b/pln_shift/pln_shift_ctl/rd_ptr1_s_s1[4]
SLICE_X63Y306.Y Tilo 0.194 core/pln_b/pln_shift/rd_ptr1_11[4]
core/pln_b/pln_shift/pln_shift_ctl/rd_ptr1_s_m[4]
SLICE_X63Y311.BY net (fanout=7) 0.537 core/pln_b/pln_shift/pln_shift_ctl/rd_ptr1_s_m[4]/O
SLICE_X63Y311.CLK Tdick 0.292 core/pln_b/pln_shift/rd_ptr1_15[4]
core/pln_b/pln_shift/pln_shift_ctl/rd_ptr1_15[4]
------------------------------------------------- ---------------------------
Total 8.821ns (2.637ns logic, 6.184ns route)
(29.9% logic, 70.1% route)

I consider this the critical path, and it has 7 levels of logic.
Could I set the constraints like this:
NET "core/pln_b/pln_shift/pln_shift_ctl/vsync_m_d1" TNM_NET = "CP_1";
NET "core/pln_b/pln_shift/pln_shift_ctl/rd_ptr*" TNM_NET = "CP_2";
TIMESPEC "TS01" = FROM:"CP_1"TO:"CP_2":7.575;

Will the lines above guide the PAR tools to route this path so that it meets timing?
 

PAR already knows that the requirement is 7.575ns, so it probably won't help to constrain it again. PAR is apparently having trouble getting through all that logic in only 7.575ns. However, it's getting pretty close, 8.821ns, so maybe you can help it achieve timing closure by enabling "timing driven mapping" or by enabling extra place and route effort. Those are options in ISE Project Navigator (which I don't use) and the equivalent command-line tools.

If you can break up that long combinatorial path with a pipeline register, that could solve the problem. At 7.575ns, a few levels of logic should be fine, but seven may be too many unless all the route segments are very short. When I need to go really fast, I put only one level of logic between pipeline registers.

Also, try examining the layout in FPGA Editor to see if the logic is arranged haphazardly. If you haven't done any floorplanning or placement constraints, then that could be the problem. The placer isn't very smart, it basically dumps all your logic into one big pile, moves small stuff around to improve timing, and then tries to route everything. This usually generates some long routes with too much propagation delay. Sometimes I can make significant performance gains by confining critical HDL modules to small rectangular regions, and then place those regions near each other so the data flows between them with relatively short routes. If you haven't explored these techniques, expect to spend some time experimenting and learning.
 

Oh, I see. Yes, I know I can use timing-driven mapping and multiple PAR iterations, but all of these settings make ISE take a long time to finish. So I would like to dig into the details, find the cause of the problem, and gain some experience with setting constraints.

Actually, I did try looking at the design in FPGA Editor, but I have not explored it properly yet. And yes, I have not applied any area constraints so far, so that might be the cause (I don't know how to set area constraints).

I have some questions about floorplanning and placement.
Do you edit the floorplan in FPGA Editor or in Floorplanner?
Or is setting area constraints enough?

Also, "create small test projects that check the performance of a small chunk of logic" is a good idea, but I don't know how you do it. By writing HDL code? How do you test the performance of a chunk of logic?

Thanks for the help and for sharing your experience, echo47... :D
 

Yes, those extra-effort options do slow down the place-and-route process, but it's worth it if it helps you achieve timing closure. The Xilinx tools unfortunately do not yet take advantage of multi-processor computers, so use the fastest computer you can find. Be sure it has enough RAM, because running out of memory will cause horrible slowdown. Here are some memory recommendations:
https://www.xilinx.com/ise/products/memory.htm

Many people use floorplanner, but I've never tried it. I use FPGA Editor to estimate approximately where I want to place my critical module. I note the X-Y coordinates of that region, and then apply LOC constraints (in my HDL or UCF file) to confine the module's SLICEs, RAMB16s and MULT18X18s (or whatever the particular FPGA uses) to that rectangular region.

My test projects are just a few lines of HDL code that focus on one particular problem. For example, I might write some code that simply creates a long net with a huge fanout, and then I try various constraints, place-and-route options, and design changes, until I find a good way to achieve the required performance. Sometimes I also examine the resulting layout in FPGA Editor. It's much easier and faster to try experiments on a small test project than with my big project.

The Xilinx tools sometimes aren't super-smart, so you will occasionally need to use constraints or clever design changes to help the tools generate a layout with acceptable performance. The good news is the tools usually get a little smarter with each new version.
 

I recently upgraded my memory to 2 GB. I think that is sufficient for now.

I think I need to spend some time exploring Floorplanner and FPGA Editor. Is there any tutorial on the Xilinx website?

After enabling timing-driven mapping, it seems the timing has been met:
------------------------------------------------------------------------------------------------------
Constraint | Requested | Actual | Logic Levels | Absolute Slack | Number of errors
------------------------------------------------------------------------------------------------------
TS_pll_sdram_sys_clk_dcm_0_CLK0_BUF = PERIOD TIMEGRP "pll_sdram_sys_clk_dcm_0_CLK0_BUF" TS_pll_mclk_dcm_0_CLK2X_BUF HIGH 50% | 7.575ns | 7.573ns | 8 | 0.002ns | 0
------------------------------------------------------------------------------------------------------
TS01 = MAXDELAY FROM TIMEGRP "CP_1" TO TIMEGRP "CP_2" 7.575 ns | 7.575ns | 7.570ns | 8 | 0.005ns | 0
------------------------------------------------------------------------------------------------------

But the positive slack is really tight, only 0.002 ns. Is this considered OK?
 

I haven't seen any tutorials on those topics, but keep searching. They may exist, perhaps as video tutorials.

Looks like the timing-driven mapping worked fine. It is common to see timing met by a slim margin. Once the router achieves timing, it stops trying. No worries, it considers the worst case voltage, temperature, and process variations, so your device should operate reliably.

By the way, if your 7.575ns clock comes from a jittery source such as a DCM or noisy external oscillator, then you may need to specify tighter timing constraints that include the worst-case period jitter. The ISE tools automatically consider some internal factors, but not external sources.
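One way to read the advice about jitter is to shorten the constrained period by the worst-case period jitter of the clock source. The Python sketch below does that arithmetic in integer picoseconds. The 150 ps jitter figure is purely hypothetical; the real number has to come from the data sheet of whatever generates the clock.

```python
def tightened_period_ps(nominal_period_ps, period_jitter_ps):
    """Period to constrain to, so the design still works on the shortest clock cycle."""
    return nominal_period_ps - period_jitter_ps


NOMINAL_PERIOD_PS = 7575     # the 7.575 ns requirement from the report
ASSUMED_JITTER_PS = 150      # hypothetical worst-case period jitter

nominal_mhz = 1e6 / NOMINAL_PERIOD_PS
constraint_ps = tightened_period_ps(NOMINAL_PERIOD_PS, ASSUMED_JITTER_PS)

print(f"nominal clock   = {nominal_mhz:.2f} MHz")
print(f"constrain PAR to {constraint_ps / 1000:.3f} ns")
```

With these assumed numbers, the 7.575 ns period (about 132 MHz) would be constrained as 7.425 ns, so the 0.002 ns of slack reported earlier would no longer be enough.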
 

Hmm... how do I measure the clock jitter at the FPGA clock input pin? Is any special equipment needed?

Also, I have a question about the clock-to-pad and pad-to-clock settings.
In the UCF, does the user need to set the OFFSET for each pin, or for each flip-flop (FFS)? Sorry, I'm a bit confused about it.
 

It's usually easiest to read the jitter value from the data sheet of whatever device is generating your clock. If it's a conventional crystal oscillator, then it probably has very low jitter, and you can ignore it.

The smaller the jitter, the more difficult it is to measure. For a quick estimate, if you have a sufficiently fast storage scope, you can simply trigger on the clock and measure the width of the fuzzy band at the following clock edge. Various manufacturers have application notes about "jitter measurement". Try a Google search for those two keywords, and you'll find some right away.

If your FPGA application requires timing margins on the input and output pins, then you should apply offset constraints to those signals. In your Constraints Guide, see the chapter "Timing Constraint Strategies", and the description of the OFFSET constraint. More info in Xilinx White Paper 237, "What are OFFSET Constraints?"
https://www.xilinx.com/support/documentation/white_papers/wp237.pdf
 
