Debugging help needed for Xilinx FPGA

layowblue · Feb 22, 2015

Hello all

I have designed a block as part of our project. The block has a sync FIFO and a controller inside. However, the FIFO controller's read pointer misbehaves when the synthesized bit file is loaded to the FPGA with a 126 MHz clock constraint, but it works perfectly with a 40 MHz clock constraint. Note that both FPGA bit files pass timing.
ASIC simulation shows that the block's logic is correct, but I have no idea why it misbehaves with 126MHz clock in FPGA.

Probing the pointer makes the FPGA build fail, as the project itself is very big and this block is the most difficult one for P&R.

Could anyone shed some light on directions as to how to continue the debugging?

Is there any Conformal-like equivalence result/log/report in the Xilinx FPGA flow to confirm that the synthesized netlist is equivalent to the RTL logic?

Thanks a lot!

vGoodtimes · Feb 22, 2015

I'd start with the basics first -- make sure SYSMON/XADC don't report temperature/voltage errors. Likewise, if you do have high temperatures, you should set up the build for the higher maximum temperature. If you interact with external IO, you may also want to examine it.

Next, I'd look over the timing report settings. Vivado can sometimes be difficult to understand. Make sure you don't have any accidental ignores (e.g., false paths) on paths within the FIFO. Sometimes wildcards are used to call out signals and then pick up other signals by mistake. IIRC, ISE also has some odd issues with case-sensitive names in the UCF.
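To illustrate how a wildcard can over-match, here is a small sketch using Python's glob-style `fnmatch` as a stand-in for constraint pattern matching. The net names are hypothetical, and Vivado XDC / ISE UCF matching rules differ in detail (hierarchy separators, `-hierarchical`, case handling), so treat this only as an analogy.

```python
from fnmatch import fnmatchcase

# Hypothetical register names; in a real flow these come from the netlist.
nets = [
    "ctrl_fifo/rd_ptr_reg[0]",
    "ctrl_fifo/rd_ptr_reg[1]",
    "ctrl_fifo/wr_ptr_reg[0]",
    "dbg_fifo/rd_ptr_reg[0]",
    "ctrl_fifo_sync/rd_ptr_reg[0]",
]

def matched(pattern, names):
    """Return every name a case-sensitive glob pattern would pick up."""
    return [n for n in names if fnmatchcase(n, pattern)]

# Meant to cover only the debug FIFO's pointer, but the leading '*'
# also catches the functional FIFO's read pointer bits.
too_broad = matched("*fifo/rd_ptr_reg*", nets)
# Anchoring the pattern to the intended instance avoids that.
narrow = matched("dbg_fifo/rd_ptr_reg*", nets)

print(too_broad)
print(narrow)
```

If a false-path or multicycle constraint used a pattern like `too_broad`, the functional read pointer would silently drop out of timing analysis while the report still showed the design as passing.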

If you have clock switching, make sure the clock is stable before, during, and after reset. Likewise, ensure that 126 MHz isn't 130 MHz+. An unstable clock can actually do things that sound impossible.
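A quick arithmetic check shows why a clock running faster than its constraint can break a design that "passes timing". The 130 MHz actual frequency and the 7.8 ns worst-case path (standing in for data delay plus setup) are hypothetical numbers, and skew and jitter are ignored.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz

constrained = period_ns(126.0)  # period STA was told about
actual = period_ns(130.0)       # hypothetical real clock on the board

# Hypothetical worst-case register-to-register requirement from STA.
worst_path_ns = 7.8

print(f"constrained period {constrained:.3f} ns, slack {constrained - worst_path_ns:+.3f} ns")
print(f"actual period      {actual:.3f} ns, slack {actual - worst_path_ns:+.3f} ns")
```

The same path shows positive slack against the 126 MHz constraint but negative slack at 130 MHz, which is why measuring the real clock frequency is worth doing.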

I'll assume you use BRAM for a custom fifo. If so, make sure to read the user guide for the BRAM read-write conflicts. Different generations of FPGA have slightly different requirements. IIRC, Virtex6 was more difficult.

If you use async resets, make sure you have them deassert synchronously.
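A common way to get asynchronous assertion with synchronous deassertion is a two-flop reset synchronizer. The sketch below is a cycle-level Python model of that circuit (not HDL, and the class name is hypothetical): reset asserts immediately, but its release is shifted through two flops so it lands on a clock edge.

```python
class ResetSynchronizer:
    """Cycle-level model of a 2-flop reset synchronizer (active-high reset):
    asserts asynchronously, deasserts only on a clock edge."""

    def __init__(self):
        self.ff1 = 1
        self.ff2 = 1  # synchronized reset output

    def async_reset(self, arst):
        # Asynchronous assertion: both flops are forced into reset at once,
        # without waiting for a clock edge.
        if arst:
            self.ff1 = 1
            self.ff2 = 1

    def clock_edge(self, arst):
        # Rising clock edge: while arst is held, stay in reset; once it is
        # released, shift a '0' through the chain so release is clock-aligned.
        if arst:
            self.ff1, self.ff2 = 1, 1
        else:
            self.ff1, self.ff2 = 0, self.ff1
        return self.ff2

sync = ResetSynchronizer()
arst_per_edge = [1, 1, 0, 0, 0, 0]  # external reset sampled at each edge
trace = [sync.clock_edge(a) for a in arst_per_edge]
print(trace)
```

The output stays high for one extra edge after the external reset drops, so every flop driven by it leaves reset in the same clock cycle instead of at an arbitrary point relative to the clock.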

If these fast checks don't give results, I'd look into removing parts of the design in order to get something like chipscope or any custom debug logic into the design. Sometimes this means adding a fake test source or fake test sink for data.

layowblue · Feb 22, 2015

Thanks vGoodtimes for the kind reply!
For the FIFO, I'm not using BRAM because it caused congestion issues and then placement errors. No, all FIFO bits are flops.
Also, this is a sync design, single clock.
We added probes to the input of the block (the write side of the FIFO), and everything shows correct behavior. No external IO issues.

I'll look at possible SYSMON/XADC reports. So you mean that if there are any temperature/voltage errors, they won't be reflected in the timing report?
Wildcard is a great point, I'll double check it.

Thanks again

vGoodtimes · Feb 23, 2015

The timing report assumes you don't have out of spec operating conditions. If you have a weak power supply or low cooling you may not meet timing on the actual device. Likewise, if you have any generated clocks, you should ensure they are not too fast. (eg, assuming a 66/64 design doesn't have a gearbox and run at 5/4 instead...)

This fifo should be very small -- 8-16 elements. Is this the case?

Perhaps you'll have to remove logic on one side of the fifo to get the debug info you need.

pbernardi · Feb 23, 2015

If place & route does not show errors, it should behave OK at 126 MHz.

Also, you should add jitter constraints to your design. 126 MHz without any jitter is one thing (not possible in practice). When you add jitter, your achievable clock speed goes down. Also check how stable your clock source is.
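The effect of jitter on the timing budget can be shown with simple arithmetic. STA tools fold jitter into clock uncertainty using their own formulas; this sketch pessimistically subtracts the whole jitter figure from the period, and the jitter values are hypothetical.

```python
def timing_budget_ns(freq_mhz, jitter_ps):
    """Period left for logic after subtracting clock jitter.
    Simplified: ignores skew, setup time and other uncertainty terms."""
    period = 1000.0 / freq_mhz
    return period - jitter_ps / 1000.0

for jitter in (0, 150, 300):  # hypothetical jitter values in ps
    budget = timing_budget_ns(126.0, jitter)
    print(f"{jitter:>3} ps jitter -> {budget:.3f} ns budget "
          f"(like an ideal {1000.0 / budget:.1f} MHz clock)")
```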

Also, you can try a different place&route and check if your design improves.

Nasser750gx · Feb 23, 2015

I have faced problems like these. They are sometimes extremely difficult to trace, so I came up with one simple rule:

Always latch at one edge, read on the other. For example latch at rising edge, read at falling edge. This will make life easier, and you won't be needing to go after timing analysis (provided that your simulation works fine and you're certain of the work-flow).

I personally think you have a jitter problem here...

ads-ee · Feb 23, 2015

Nasser750gx said:
Always latch at one edge, read on the other. For example latch at rising edge, read at falling edge. This will make life easier, and you won't be needing to go after timing analysis (provided that your simulation works fine and you're certain of the work-flow).

I think this is bad advice. Doing this will require timing constraints and timing analysis, regardless of what Nasser thinks. Besides that the timing will be even tighter as now the relevant timing is between opposite edges of the clock, reducing your timing budget between registers by half.
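The "reducing your timing budget by half" point is easy to quantify. This sketch assumes a 50% duty cycle and ignores jitter, skew and setup/hold times.

```python
def reg_to_reg_budget_ns(freq_mhz, opposite_edges):
    """Time between the launching and capturing edge.
    Same edge: one full period. Rising-to-falling: half a period
    (assuming a 50% duty cycle)."""
    period = 1000.0 / freq_mhz
    return period / 2 if opposite_edges else period

full = reg_to_reg_budget_ns(126.0, opposite_edges=False)
half = reg_to_reg_budget_ns(126.0, opposite_edges=True)
print(f"same edge: {full:.3f} ns, opposite edges: {half:.3f} ns")
```

At 126 MHz, a rising-to-falling path has to close in roughly 4 ns instead of roughly 8 ns, so the constraint gets tighter, not looser.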

Layowblue, I would like to know specifically if you are using ISE as it is not stipulated in any of your posts.
If you are using ISE, do not enable logic optimization in MAP. On a full design with tight timing, I've seen designs fail to operate correctly with no explanation. I eventually pinpointed it to the use of logic optimization in MAP, which turns out to have a bug that the factory knows about but never fixed.

Nasser750gx · Feb 23, 2015

@ads-ee: This pattern has solved many problems of mine. Especially when complex state-machines were involved (One example was a microcontroller job migrated to Spartan 6). Of course, I just wanted to share the experience.

FvM · Feb 23, 2015

Nasser750gx said:
Always latch at one edge, read on the other. For example latch at rising edge, read at falling edge. This will make life easier.

As ads-ee mentioned, this is bad advice for FPGA design. You find the scheme in external interfaces where the delay skew can be larger than the setup and hold times; synchronous serial buses often use it. But inside an FPGA, clock networks have pretty low skew, and you typically get sufficient hold-time margin without constraining the circuit, just from the propagation delay of the FFs.

ads-ee · Feb 23, 2015

Nasser750gx, in my experience I've only seen a few setup/hold timing violations in designs that passed STA, all of them in designs I was called in to debug (I've never seen one in my own designs):

1. A design in a 3000-series Xilinx part that used gated clocks (I was called in to fix the issue).
2. A design that was missing CDC circuitry.
3. A design that had a race condition in an asynchronous reset that was used to clear a condition at the end of a packet.

All three of these were primarily a problem with inadequate/incorrect constraints, poor design, or both.
#1 was both a very poor design and had missing constraints. I had to hand-place all the gated-clock LUTs so the design would pass temperature testing.
#2 was found during temperature testing, and I was called in to fix the issue. Oh, and yes, the timing constraints all passed; I looked at the reports. The problem was a lack of CDC in one critical place in the design. Two synchronized FSMs were using a common signal from a different clock domain that was NOT synchronized to the clock used by the FSMs. The problem only showed up at low temperatures due to a hold-time race condition.
#3: A reset was used to end a transfer, and the timing constraints didn't cover the asynchronous path through the reset; timing was explicitly ignored (false path) due to problems with recovery/removal errors in STA.
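Case #2 can be illustrated with a timing-only model: when two FSMs each sample the same unsynchronized signal through different routing delays, a transition that falls between their arrival times makes them disagree for a cycle. The delays and times below are hypothetical, setup/hold and metastability are ignored, and a real synchronizer uses two flops in series (omitted here) to let metastability settle.

```python
def captured(t_source_change, route_delay, t_edge):
    """Value a flop captures at t_edge for a 0->1 source transition that
    reaches its D pin after route_delay (setup/hold ignored)."""
    return 1 if t_source_change + route_delay <= t_edge else 0

T_EDGE = 10.0      # hypothetical capturing clock edge, ns
DELAY_FSM_A = 1.2  # hypothetical route to FSM A's input flop, ns
DELAY_FSM_B = 2.9  # hypothetical route to FSM B's input flop, ns
DELAY_SYNC = 1.5   # hypothetical route to a single synchronizer flop, ns

def unsynchronized(t_change):
    # Each FSM samples the asynchronous signal through its own path.
    return (captured(t_change, DELAY_FSM_A, T_EDGE),
            captured(t_change, DELAY_FSM_B, T_EDGE))

def synchronized(t_change):
    # One synchronizer flop samples the signal and both FSMs read its
    # output, so they always see the same value in the same cycle.
    v = captured(t_change, DELAY_SYNC, T_EDGE)
    return v, v

for t in (5.0, 7.5, 9.5):
    print(t, unsynchronized(t), synchronized(t))
```

Because routing delays shift with temperature, the window in which the two FSMs disagree also moves, which is consistent with the failure only showing up at one temperature extreme.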

In my designs I've always included jitter and some 2-5% extra timing margin, so when STA says a design passes timing, I've never seen a hidden timing problem show up in temperature testing (and a lot of my designs have been temperature tested at -55C to +125C or -40C to +100C; most of what I've worked on is industrial products).
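One way to apply a margin like this, assumed here rather than taken from the post, is to over-constrain the clock: tell STA the period is a few percent shorter than the real one. The sketch computes the frequency you would then put in the constraint for a 126 MHz clock.

```python
def constraint_mhz(target_mhz, margin_pct):
    """Frequency to constrain to so that STA checks a period that is
    margin_pct shorter than the real clock period."""
    real_period = 1000.0 / target_mhz
    tightened_period = real_period * (1.0 - margin_pct / 100.0)
    return 1000.0 / tightened_period

for margin in (2, 5):
    print(f"{margin}% margin: constrain 126 MHz as {constraint_mhz(126.0, margin):.1f} MHz")
```

Jitter would still be constrained separately; the margin only covers effects STA does not model.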

layowblue · Feb 23, 2015

Thank you all!
1) I understand Nasser750gx's suggestion; I've tried it before and it worked. The idea is not to boldly change the sampling edge from posedge to negedge, but to use a clock whose edge you can control externally, meaning you can set a register to invert the clock so that sampling moves to the negedge. Thus the timing constraints are still those of a single-edge design.
However, in my current design this is hard to do, because it would involve too many design changes, which could easily go wrong (I'm not in full control of all the related blocks).
2) For ads-ee: We are not using ISE for synthesis; we use Synplify Premier instead. Do you know of a similar problem with this tool? Also, you mentioned jitter, but I believe the tool should have default settings for jitter. I will double-check with the FPGA team, but I believe they have set it correctly, as this part is usually a Tcl script used by the whole company.

Thanks a lot

ads-ee · Feb 23, 2015

layowblue said:
2) For ads-ee: We are not using ISE for synthesis; we use Synplify Premier instead. Do you know of a similar problem with this tool? Also, you mentioned jitter, but I believe the tool should have default settings for jitter. I will double-check with the FPGA team, but I believe they have set it correctly, as this part is usually a Tcl script used by the whole company.

It's not a synthesis switch; it's the logic optimization switch in MAP.exe of ISE. If you've ever run SmartXplorer on the design and it selected the two opt strategies, then you would have that switch enabled and would run the risk of that bug showing up. If you've never enabled the advanced MAP options and are using the MAP defaults, then you probably aren't running MAP optimization on the design. The only sure way is to check the MAP options used to implement the design, as listed in the reports.

There is no such thing as a default setting for jitter. If you don't add a jitter constraint to your clock, there is no default, and you'll be using an ideal clock.

- - - Updated - - -

layowblue said:
For the FIFO, I'm not using BRAM because it caused congestion issues and then placement errors. No, all FIFO bits are flops.

This statement doesn't make a lot of sense. How is BRAM causing congestion issues? A BRAM is a single IP block on the die, and a FIFO controller, if it's designed correctly (e.g., using the FIFO Generator), should use only a small amount of logic near the BRAM.

Using flip-flops, on the other hand, is almost guaranteed to cause severe congestion, as all the flop bits have to be multiplexed out to emulate a FIFO memory (unless you've made some horridly complex shift-register FIFO with a huge demultiplexer on the write side).
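The size of that read-side multiplexer can be estimated roughly. The sketch assumes each 4:1 mux maps to one 6-input LUT (4 data + 2 select inputs) and ignores the dedicated slice muxes and any sharing synthesis may find, so the numbers are only an order-of-magnitude guide; the FIFO depths and widths are hypothetical.

```python
def read_mux_estimate(depth, width):
    """Rough cost of the read-side mux of a flop-based FIFO.

    Assumption: one 6-input LUT per 4:1 mux, no dedicated slice muxes.
    Returns (pointer_bits, mux_levels, total_luts, first_level_select_fanout).
    """
    pointer_bits = (depth - 1).bit_length()  # width of the read pointer
    luts_per_bit = 0
    levels = 0
    inputs = depth
    while inputs > 1:
        stage = -(-inputs // 4)  # ceil(inputs / 4) LUTs in this mux level
        luts_per_bit += stage
        levels += 1
        inputs = stage
    # The lowest read-pointer bits select in every first-level LUT of every bit.
    first_level_fanout = (-(-depth // 4)) * width if depth > 1 else 0
    return pointer_bits, levels, luts_per_bit * width, first_level_fanout

for depth, width in ((16, 32), (256, 64)):
    print(depth, width, read_mux_estimate(depth, width))
```

For a deep, wide flop FIFO the low read-pointer bits end up with fanouts in the thousands, and every level of the mux tree adds LUT and routing delay between the pointer register and the output, which is consistent with this block being hard to place and route.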

Why didn't you just use LUT RAM to implement the FIFO if there is some issue with BRAM resources?

Nasser750gx · Feb 23, 2015

@ads-ee, @FvM:

Hi Guys,

Well, I need to say that I experienced this issue with Xilinx (interestingly not Altera; Lattice was also OK). What I'm saying goes against the theory, but it must be coming from a deficiency in Xilinx ISE. The design worked perfectly fine on Altera (Cyclone 4, with proper constraints), but not on Spartan 6 (with proper timing constraints). I still don't understand the reason behind it, but it's a pattern I use now, and so far I've had no more problems. The behavior was odd: some registers failed to clock in the data (later discovered in debug), while other registers were OK. My personal guess is a path delay that ISE fails to calculate correctly for some reason (or in specific situations).

Anyway, if you ever come across a complex state-machine design and see abnormal behavior even though ISE confirmed the MHz and STA was OK, you may want to try this pattern.

All the best,
Nasser

ads-ee · Feb 23, 2015

Nasser750gx said:
Well, I need to say that I experienced this issue with Xilinx (interestingly not Altera; Lattice was also OK). What I'm saying goes against the theory, but it must be coming from a deficiency in Xilinx ISE. The design worked perfectly fine on Altera (Cyclone 4, with proper constraints), but not on Spartan 6 (with proper timing constraints). I still don't understand the reason behind it, but it's a pattern I use now, and so far I've had no more problems. The behavior was odd: some registers failed to clock in the data (later discovered in debug), while other registers were OK. My personal guess is a path delay that ISE fails to calculate correctly for some reason (or in specific situations).

Anyway, if you ever come across a complex state-machine design and see abnormal behavior even though ISE confirmed the MHz and STA was OK, you may want to try this pattern.

I would have to see a test case of this behavior. I suspect it may have more to do with incorrect timing constraints (Altera (SDC) vs. Xilinx ISE (UCF) constraints are apples and oranges; they might both be fruit, but they sure don't look the same) or perhaps with coding that doesn't follow the general coding templates for inferring the correct logic.

layowblue · Feb 23, 2015

The design by nature excludes the possibility of using BRAM. It is not a simple FIFO; the physical buffer is divided into multiple logical FIFOs under different scenarios. I can only say so much.
I'll follow up on the MAP optimization options.
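The partitioning scheme isn't described in the thread, so the following is only a hypothetical behavioral model: one physical buffer split into fixed regions, each a logical FIFO with its own read/write pointers that wrap inside its region. The real design changes its partitioning "under different scenarios", which this static sketch does not model. A model like this could serve as a software scoreboard, fed with the write-side data already captured by the probes and compared against whatever read-side values can be observed.

```python
class PartitionedBuffer:
    """Hypothetical model: one physical buffer split into logical FIFOs,
    each owning a fixed [base, base + size) region with its own pointers."""

    def __init__(self, depth, sizes):
        if sum(sizes) > depth:
            raise ValueError("logical FIFOs do not fit in the physical buffer")
        self.mem = [None] * depth
        self.size = list(sizes)
        self.base = []
        offset = 0
        for s in sizes:
            self.base.append(offset)
            offset += s
        self.rd = [0] * len(sizes)     # read offset within each region
        self.wr = [0] * len(sizes)     # write offset within each region
        self.count = [0] * len(sizes)  # occupancy of each logical FIFO

    def push(self, q, value):
        if self.count[q] == self.size[q]:
            raise OverflowError(f"logical FIFO {q} is full")
        self.mem[self.base[q] + self.wr[q]] = value
        self.wr[q] = (self.wr[q] + 1) % self.size[q]  # wrap inside the region
        self.count[q] += 1

    def pop(self, q):
        if self.count[q] == 0:
            raise IndexError(f"logical FIFO {q} is empty")
        value = self.mem[self.base[q] + self.rd[q]]
        self.rd[q] = (self.rd[q] + 1) % self.size[q]  # wrap inside the region
        self.count[q] -= 1
        return value

buf = PartitionedBuffer(depth=8, sizes=[3, 5])
for v in ("a", "b", "c"):
    buf.push(0, v)
buf.push(1, "x")
print(buf.pop(0), buf.pop(0))  # first in, first out within FIFO 0
buf.push(0, "d")               # FIFO 0's write pointer has wrapped to its base
print(buf.pop(0), buf.pop(0), buf.pop(1))
```

The pointer wrap at each region boundary is exactly the kind of logic worth checking against the hardware, since a wrong wrap value would corrupt a neighboring logical FIFO rather than cause an obvious error.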

- - - Updated - - -

For the optimization part, the settings we're using are as follows:
- Synthesis
  - Pipelining = 1
  - Fanout_limit = 1000
  - Resource sharing = 0
  - Fixgatedclocks = 3
  - Retiming = 0
  - Use FSM explorer = 0
- Implementation (directives)
  - Placer = SSI_ExtraTimingOpt
  - Physical Optimizations = Explore
  - Router = Explore

I'm an ASIC designer, so I'm not fully up to date on the latest FPGA tools; I'll study the related docs at the same time.

- - - Updated - - -

The only warning that looks fishy to me is:
"Starting Logic Optimization Task
Phase 1 Retarget
INFO: [Opt 31-138] Pushed 1 inverter(s) to 4 load pin(s).
WARNING: [Opt 31-143] Automatic BUFG insertion was skipped because there are already at least 12 clock buffers (BUFG and BUFHCE) using global resources.
Resolution: Manually insert a BUFG to drive the high fanout net. However, make sure to first analyze clock buffer utilization to determine if the insertion is safe to perform.
"

ads-ee · Feb 23, 2015

Once again it is NOT a synthesis option:

It is a process property of Implement Design

The person responsible for running implementation of the entire design needs to tell you what options they are using. If they use -logic_opt (the 5th option) you could be seeing the exact same issue I ran into.

- - - Updated - - -

You could also resort to generating a simulation gate level netlist and SDF. Then run a full timing simulation on the entire FPGA, perhaps that would show some issue with the register FIFO.

vGoodtimes · Feb 24, 2015

When the OP said no BRAM, I had assumed it was a tiny fifo, maybe 8 elements. But it sounds like this is not the case.

There may still be legitimate reasons why BRAM can't be used, e.g., multiple writes, or possibly so many simultaneous reads that it would use too many BRAMs.

layowblue · Feb 24, 2015

That's exactly the case. Great deduction!

vGoodtimes said:
When the OP said no BRAM, I had assumed it was a tiny fifo, maybe 8 elements. But it sounds like this is not the case.

There may still be legitimate reasons why BRAM can't be used, e.g., multiple writes, or possibly so many simultaneous reads that it would use too many BRAMs.

- - - Updated - - -

Vivado has migrated ISE settings into new settings. In this case, logic_opt is included in opt_design, which is used in our current build.
http://www.xilinx.com/support/documentation/sw_manuals/xilinx2013_1/ug904-vivado-implementation.pdf

ads-ee said:
Once again it is NOT a synthesis option:

It is a process property of Implement Design

The person responsible for running implementation of the entire design needs to tell you what options they are using. If they use -logic_opt (the 5th option) you could be seeing the exact same issue I ran into.

- - - Updated - - -

You could also resort to generating a simulation gate level netlist and SDF. Then run a full timing simulation on the entire FPGA, perhaps that would show some issue with the register FIFO.

ads-ee · Feb 24, 2015

So you are using Vivado!? Why didn't you mention that after post #7, when I asked what tool you were using?

Why should I waste my time trying to help, when you won't even tell us what tool chain you're using?

Besides learning how to debug something, you also need to learn how to ask a question with useful information that allows someone to help.

layowblue · Feb 24, 2015

If you are offended by my inadequate information, I apologize. I thought Vivado was just an evolved version of ISE.
I thank everyone sincerely for trying to help.

ads-ee said:
So you are using Vivado!? Why didn't you mention that after post #7, when I asked what tool you were using?

Why should I waste my time trying to help, when you won't even tell us what tool chain you're using?

Besides learning how to debug something, you also need to learn how to ask a question with useful information that allows someone to help.

Welcome to EDAboard.com
