[SOLVED] Vivado hold (WHS) timing failure. Is my RTL code flawed or am I lacking constraints?

Status
Not open for further replies.

wtr

Hello all,

Unfortunately the work is company IP/classified, so I can't post it all; however, let me briefly describe the design.

I have a pretty elaborate clocking scheme.
20 MHz into MMCM_1, which generates 100 MHz, 50 MHz, 20 MHz & clock_lock (global reset) out.
100 MHz & unique_timed_reset into MMCM_2, which generates 300 MHz & clock_lock1.
100 MHz & unique_timed_reset2 into MMCM_3, which generates 64 MHz & clock_lock2.
64 MHz & unique_timed_reset3 into MMCM_4, which generates 40.9 MHz & clock_lock_3.

This results in a sequenced start up. The syntax would look something like

Code (VHDL):
clk_in_ibuf : IBUF port map (I => clk_in,  o => clk_in1);
 
mmcm_1 : mmcme2_adv
 generic map(... blah..)
 port map(... blah... clkin => clk_in1, clkout0 => clk100mhz, clkout1 => clk50mhz, clkout2 => clk20mhz);
 
mmcm_2 : mmcme2_adv
 generic map(... blah..)
 port map(... blah... clkin => clk100mhz, clkout0 => clk300mhz);
--etc



Constraints used
create_clock -period 50.000 -name clk_in_to_sysclk [get_ports clk_in]
..some jitter constraint.

This is the only timing constraint I specify. The MMCM is instantiated directly in RTL, so Vivado does not have the XDC files that would come with IP generated as .xci.

Previously I used set_false_path, & my firmware worked on 3/4 boards, & on 4/4 when rerouted... yet it fails thermal testing.
Internally I deal with CDC, as can be seen by s_drive_we being retimed.

Below is an example of the one hold (WHS) error I'm getting. I have a block which uses the 20 MHz clock domain & the 100 MHz clock domain.


Code (VHDL):
some_entity : blah port map (clk100mhz => clk100mhz, clk20mhz => clk20mhz ..blah);

-- within there is a section where I try to detect the falling edge of clk20mhz
process (clk100mhz) is
begin
  if rising_edge(clk100mhz) then
    if clk20mhz_old = '1' and clk20mhz_reg = '0' and drive_we = '1' then
      data_to_drive_we <= '1';
    else
      data_to_drive_we <= '0';
    end if;
    clk20mhz_reg <= clk20mhz;     -- samples the clock net as data: this is the failing hold path
    clk20mhz_old <= clk20mhz_reg;
    drive_we     <= s_drive_we;   -- s_drive_we is assigned in the 20 MHz domain
  end if;
end process;



I know it's bad practice to use a clock-tree signal as data. However, I don't know how to assign clk20mhz_reg without sampling clk20mhz.

Anyway, long story short, after all this I get a hold warning, where

from ../MMCM_1/clkout2(20mhz) to ../../some_entity/clk20mhz_reg

results in -0.620 slack
 

I was fortunate because my design has a 20 MHz clock which is synchronous to the 100 MHz clock.

Therefore by generating a count based on the 100mhz I was able to detect a falling edge (ratio 3:2) and pulse data to drive for 1 (100mhz) clk cycle. <- THIS FIXED my hold error
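
For readers wondering what such a counter-based fix might look like, here is a minimal sketch, not the poster's actual code. It assumes a 5:1 frequency ratio (100 MHz / 20 MHz) and that the counter's reset is released with a fixed, known relationship to the 20 MHz phase, which the thread does not show. The entity name, the port names other than clk100mhz and drive_we, and the FALL_PHASE value are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity fall_strobe_gen is
  generic (
    RATIO      : positive := 5;  -- 100 MHz / 20 MHz
    FALL_PHASE : natural  := 2   -- hypothetical: count at which the 20 MHz clock falls (must be < RATIO)
  );
  port (
    clk100mhz   : in  std_logic;
    rst         : in  std_logic;  -- synchronous, active high (assumed aligned to the 20 MHz phase)
    drive_we    : in  std_logic;
    fall_strobe : out std_logic   -- high for one 100 MHz cycle per 20 MHz period
  );
end entity fall_strobe_gen;

architecture rtl of fall_strobe_gen is
  signal phase_cnt : natural range 0 to RATIO - 1 := 0;
begin
  process (clk100mhz) is
  begin
    if rising_edge(clk100mhz) then
      if rst = '1' then
        phase_cnt   <= 0;
        fall_strobe <= '0';
      else
        -- Free-running modulo-RATIO phase counter in the fast domain.
        if phase_cnt = RATIO - 1 then
          phase_cnt <= 0;
        else
          phase_cnt <= phase_cnt + 1;
        end if;
        -- One-cycle strobe at the phase where the slow clock is expected to fall.
        if phase_cnt = FALL_PHASE and drive_we = '1' then
          fall_strobe <= '1';
        else
          fall_strobe <= '0';
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

The point of the design is that no register ever has the 20 MHz clock net on its D input, so the clock-to-data hold path disappears. The correct FALL_PHASE has to be confirmed in simulation and timing analysis for the real design.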

However I'd be keen to know how I can take a clock tree & generate a data signal that mirrors it.
 

wtr said:
> Unfortunately the work is company IP/classified, so I can't post it all; however, let me briefly describe the design.
>
> I have a pretty elaborate clocking scheme.
> 20 MHz into MMCM_1, which generates 100 MHz, 50 MHz, 20 MHz & clock_lock (global reset) out.
> 100 MHz & unique_timed_reset into MMCM_2, which generates 300 MHz & clock_lock1.
> 100 MHz & unique_timed_reset2 into MMCM_3, which generates 64 MHz & clock_lock2.
> 64 MHz & unique_timed_reset3 into MMCM_4, which generates 40.9 MHz & clock_lock_3.

Based on just this clocking scheme alone, I would probably avoid buying this IP core.
Xilinx does not recommend feeding an MMCM with the output of another MMCM, and daisy-chaining the clocks into a 3rd MMCM is definitely not recommended.

You'd better enable maximum input clock jitter filtering; depending on the targeted part, the jitter could easily be 200+ ps for the output clocks, and the 64 MHz output clock jitter will likely be around 300 ps. Virtex is somewhat better, but you'd better make sure that you generate each MMCM using the jitter of the previous MMCM's outputs as the input reference clock jitter. I can envision violating hold/setup in some instances over a large sampling of devices if you don't account for the effects of the added jitter on the input clock of each daisy-chained MMCM. I'd recommend finding a better starting frequency that has a nice multiplication factor that will allow the clock dividers to all work in the same MMCM to produce all the output clocks.


wtr said:
> However I'd be keen to know how I can take a clock tree & generate a data signal that mirrors it.

In Verilog (I'm too lazy to write it in VHDL ;-))

Code (Verilog):
always @ (posedge clk) t1 <= ~t1;
always @ (negedge clk) t2 <= t1;
assign clk_mirror = t1 ^ t2;


In case you have a problem seeing the waveforms (top to bottom: clk, t1, t2, clk_mirror):
Code:
  __    __    __    __    __ 
_|  |__|  |__|  |__|  |__|  |
  _____       _____       __ 
_|     |_____|     |_____|  
     _____       _____       __ 
____|     |_____|     |_____|  
  __    __    __    __    __ 
_|  |__|  |__|  |__|  |__|  |
 
Thank you.

For other users who only speak VHDL, the translation is

Code (VHDL):
process(rst, clk) is
begin
  if rst='1' then t1<='0';
  elsif rising_edge(clk) then t1<=not t1;
  end if;
end process;
process(rst, clk) is
begin
  if rst='1' then t2<='0';
  elsif falling_edge(clk) then t2 <= t1;
  end if;
end process;
clock_mirror <= t1 XOR t2;
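
To convince yourself in simulation that the mirror really follows the clock, a small self-checking testbench along these lines may help. It is a sketch only: the testbench name, the 10 ns period, and the reset timing are arbitrary assumptions, and RTL simulation says nothing about the real skew between clk and clock_mirror in the fabric.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity clock_mirror_tb is
end entity clock_mirror_tb;

architecture sim of clock_mirror_tb is
  signal clk, rst     : std_logic := '0';
  signal t1, t2       : std_logic := '0';
  signal clock_mirror : std_logic;
begin
  -- Toggle-flop mirror, same logic as the two processes above.
  process (rst, clk) is
  begin
    if rst = '1' then t1 <= '0';
    elsif rising_edge(clk) then t1 <= not t1;
    end if;
  end process;

  process (rst, clk) is
  begin
    if rst = '1' then t2 <= '0';
    elsif falling_edge(clk) then t2 <= t1;
    end if;
  end process;

  clock_mirror <= t1 xor t2;

  stimulus : process is
  begin
    rst <= '1';
    wait for 12 ns;   -- release reset while clk is low
    rst <= '0';
    for i in 1 to 20 loop
      clk <= not clk;
      wait for 1 ns;  -- let t1/t2 update (delta cycles only in RTL simulation)
      assert clock_mirror = clk
        report "clock_mirror does not follow clk" severity failure;
      wait for 4 ns;
    end loop;
    report "clock_mirror tracked clk for 10 cycles" severity note;
    wait;
  end process;
end architecture sim;
```

Because clock_mirror is driven by flip-flop outputs through a LUT rather than by the clock net itself, the tools can place and time it like any other data signal.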



From now on I will always use a method like this instead of sampling a clock tree signal as if it were a data signal.

With regard to jitter, all I can see in the MMCM instantiation is a generic REF_JITTER1. I currently have it set (by default) to 0.010. UG472 says "This attribute is for simulation purposes only". Please can you expand on what you mean? I was hoping the internal inputs into mmcm_2, 3, etc. are relatively clean re. jitter.
 

wtr said:
> With regard to jitter, all I can see in the MMCM instantiation is a generic REF_JITTER1. I currently have it set (by default) to 0.010. UG472 says "This attribute is for simulation purposes only". Please can you expand on what you mean? I was hoping the internal inputs into mmcm_2, 3, etc. are relatively clean re. jitter.

Interesting, I just took a look at the Table 3-7 MMCM Attributes and it does say simulation only, which doesn't explain why the tools produce a different UCF for the MMCM with different jitter numbers for the clock depending on what is entered for the REF_JITTER1 or REF_JITTER2 attributes.

The outputs of the first MMCM and of the daisy-chained second MMCM are going to have output jitter, or haven't you looked at the wizard's summary of the output clocks?

From UG472, pg. 80:
"cascading uses some of the limited resources available in the CMT backbone to connect clocking resources directly in adjacent regions. A phase offset between the cascaded elements within the same column will also result."

Also check out UG472 pg92-93 for the best possible cascade configuration for least amount of jitter. The "uncompensated delay" in the bottom figure is not a good thing for STA after PAR.
 
I don't understand what the simple hold time violation problem in post #1 has to do with cascaded MMCM blocks (PLLs).

Using clock as data is known to bring up timing issues. Whether the motivation for the construct is somehow given by the cascaded PLL topology or not, synchronizing a 20 MHz clock to a 100 MHz domain is a rather basic problem and can be handled by synchronizers with accompanying false path (or similar) timing constraints.
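
As an illustration of the kind of synchronizer meant here, a minimal two-flop sketch with falling-edge detection follows. It is an assumption-laden example, not code from the thread: the entity and signal names are hypothetical, and sig_in is assumed to be a register output from the 20 MHz domain (for example a toggle or flag), not the clock net itself. ASYNC_REG is the Xilinx attribute that keeps the synchronizer flops together; the accompanying constraint on the crossing (for example set_false_path or set_max_delay -datapath_only) would live in the XDC and is not shown.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity sync_fall_detect is
  port (
    clk_dst    : in  std_logic;  -- destination clock, e.g. the 100 MHz domain
    sig_in     : in  std_logic;  -- registered signal from the other domain (assumed)
    fall_pulse : out std_logic   -- one clk_dst cycle wide on each falling edge of sig_in
  );
end entity sync_fall_detect;

architecture rtl of sync_fall_detect is
  signal sync_ff : std_logic_vector(1 downto 0) := (others => '0');
  signal sig_old : std_logic := '0';

  -- Xilinx attribute: treat these flops as a synchronizer chain.
  attribute ASYNC_REG : string;
  attribute ASYNC_REG of sync_ff : signal is "TRUE";
begin
  process (clk_dst) is
  begin
    if rising_edge(clk_dst) then
      sync_ff <= sync_ff(0) & sig_in;  -- two-stage synchronizer
      sig_old <= sync_ff(1);           -- previous synchronized value
      if sig_old = '1' and sync_ff(1) = '0' then
        fall_pulse <= '1';
      else
        fall_pulse <= '0';
      end if;
    end if;
  end process;
end architecture rtl;
```

Note that a synchronizer like this adds latency, and the detected edge can land one 100 MHz cycle earlier or later from one event to the next.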
 

Wasn't implying that the problem was due to the cascaded clocks, just that unless all the jitter is accounted for you may see other problems crop up in the future after building 1000s of units, when just the right combination of process, voltage, and lack of timing margin hits you square in the face. If the OP is considering the outputs of the cascaded MMCMs to not be synchronous to the 1st MMCM's outputs and they used proper synchronization techniques between clock domains, then there isn't going to be a problem.

The original problem was likely directly due to the sampled 20MHz clock from the output of the MMCM and I already gave a suggested method of mirroring a signal that behaves like the clock but can be sampled (or at least PAR can adjust placement to meet timing).

FvM, perhaps you've had the luxury of not having to work on really, really bad designs done by less than competent people, but I've practically made a career out of fixing bad designs :-(. Things like no constraints, bad constraints, missing constraints, ignored violated constraints, no synchronization between clock domains, multiple synchronizers on a signal that are used in different places, no synchronizer on a signal used in two places, gated clocks (in an FPGA!), fabric clocks, asynchronous logic with latches and edge-triggered one-shots,..., and the list goes on...
 

The cascading mmcm's was just an avenue of curiosity.

I do use proper synchronization techniques between the 40.96Mhz (UTC) from 3rd mmcm & the 100MHz or 20MHz from 1st mmcm.

The hold violation was caused by


Code (VHDL):
if rising_edge(clk100) then
  clk_mirror <= clk20;
  clk_mirror_old <= clk_mirror;
end if;



Using the code in #4 mitigated this problem.

Previously I used false paths & this hid the problem... until one unit did not function the way expected: it could communicate on the VME bus (which operates at 20 MHz), but could not capture data on the VME bus. This is because the write enable (active for one 100 MHz clock cycle) was somehow corrupted.

I need you on speed dial ads-ee. I think my vhdl is ok...but my use of tools and constraints needs to improve ..(only been firmware engineer for a year and a bit)
 

wtr said:
> Previously I used false paths & this hid the problem... until one unit did not function the way expected: it could communicate on the VME bus (which operates at 20 MHz), but could not capture data on the VME bus. This is because the write enable (active for one 100 MHz clock cycle) was somehow corrupted.

Setting a false path (= treating the 20 MHz clock as an unrelated clock) alone doesn't solve the problem, because the logic doesn't implement safe synchronisation. Even with correct synchronisation you may get the unwanted effect that the 20 MHz pulse jumps by one 100 MHz period. In that respect, fixing the timing violation and keeping the path under timing analysis control is probably the better way.
 
