Metastability in flops: a question not too many engineers can answer.


rakko

When engineers are asked about metastability in interviews, they all recite what they learned, or think they learned, from a published paper. There have been many papers on this subject over the last five decades, but none of them push deep enough to answer a fundamental question: how does a double synchronizer recover from a metastable condition? Just because you put two flops back to back does not mean the data cannot get corrupted once a metastable condition is encountered. Here is an example:
The 1st flop of a double synchronizer goes metastable for the reasons we all understand. If this flop does not recover in time (typically half a clock period), there are three possibilities: 1) the flop recovers to the correct state; 2) the flop recovers but ends up in the wrong state; 3) the flop still has not recovered from metastability before the next clock. In the 1st case, we were lucky. In the 2nd case, we corrupted the data. In the 3rd case, we pass the metastable state to the 2nd stage and potentially to all the other flops downstream. All published literature says the purpose of the 2nd flop is to maximize MTBF, and assumes that the metastable output of the 1st stage has resolved (as in case 1 above) by the time it reaches the 2nd synchronizer flop. This is a condition that cannot simply be assumed, and it is not explained in any literature. Can you explain it?
 

We basically reduce the probability of metastability getting through by using two flip-flops back to back. If you look at the MTBF equation for a synchronizer (roughly the inverse of the failure probability per unit time; not exactly, but good enough for answering your question), you can easily verify for yourself that the chance of a metastable state propagating out of a two-flop synchronizer is much lower than that of a single flop whose data comes from the source clock domain while its clock comes from the destination clock domain. Also, the amount of time a flop output stays in a metastable state has a lot to do with quantum mechanics, and I am not an expert in that. In addition, metastable-hardened flip-flops are used (typically in arbiters) to reduce the chance of such an occurrence to around 10^-21 to 10^-18.
I heard the above from my college professor.
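For reference, the synchronizer MTBF model that most published papers and vendor notes use looks roughly like this (a sketch; τ and T0 are characterization parameters of the particular flop and technology, not universal constants):

\[ \mathrm{MTBF} = \frac{e^{\,t_{met}/\tau}}{T_0 \, f_{clk} \, f_{data}} \]

where t_met is the resolution time available before the next stage samples, τ is the flop's regeneration time constant, T0 its metastability window, f_clk the destination clock frequency and f_data the average toggle rate of the asynchronous input. Because t_met sits in the exponent, each extra synchronizer stage adds roughly one clock period of resolution time and multiplies the MTBF by about e^(T_clk/τ).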
 

Basically, look at it this way:
there is a very narrow range of conditions that will cause an issue at the first FF. Of these, most will still cause the output of the first FF to settle during the next cycle. The result doesn't need to be perfect -- it only needs to move far enough for the second FF to make a clean transition. Thus the range of inputs that causes issues for the chain of FFs is much more restrictive than the range of inputs that causes problems for a single device.

Add a third device to the chain, and the range of inputs that still causes issues after the 3 FFs is even smaller. Same with 4, 5, etc. The problem isn't actually solved -- it's just made extraordinarily unlikely. If you get an estimate of 1 error per billion years, then the design has a very high probability of working for 1 year without any issues.

2 FFs are common because they usually give suitable reliability, cost a bit less in area/power, and have lower latency than 3+ FFs, etc.
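A minimal back-of-the-envelope sketch of that multiplicative argument (all numbers below are invented for illustration, not characterized values):

```python
# Toy model of the argument above: each extra synchronizer stage gives a
# metastable event one more full clock period to resolve, so the chance
# that it survives the whole chain shrinks multiplicatively.
# p_enter and p_fail_per_cycle are invented illustrative numbers.

def unresolved_probability(p_enter, p_fail_per_cycle, n_stages):
    """Chance that an async edge lands in the critical window AND is still
    unresolved after the n clock periods an n-stage chain allows it."""
    return p_enter * (p_fail_per_cycle ** n_stages)

p_enter = 1e-3           # chance a given edge lands in the critical window
p_fail_per_cycle = 1e-6  # chance the flop is still metastable one cycle later

for n in (1, 2, 3, 4):
    print(f"{n} FF(s): {unresolved_probability(p_enter, p_fail_per_cycle, n):.1e}")
```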
 

Thanks to urmish and permute for the replies. I understand what you are saying, but the fact remains that under worst-case conditions it is possible for the 1st flop to go metastable and not recover before the next clock edge. If this happens, the input to the 2nd flop is metastable and we end up corrupting the data. I know the probability is slim, but it can still happen, and happen at any time. You cannot design reliably if you are going to take chances like this. I want to re-make a point from my original post: if the 1st flop does not recover during the positive half of the clock, you have at least a 75% chance of not recovering at all, no matter how many flops you chain together. I say this because if the 1st flop resolves to the wrong data, you have a corrupt bit, which makes your design worthless. I don't know about you, but I trust the power of probability about as far as I can throw it. I don't design on probability, only on certainty. Probability just says the chances are slim, not that it cannot happen on the next clock.
I would love to hear from analog designers, especially those who have designed flops or written flop models for libraries.
 

Your consideration shows that you understand neither the nature of metastability nor the problems that are (and are not) involved with it.
The first wrong assumption is that a FF could possibly go to a wrong state. You are sampling a transient signal. Depending on the sampling moment, you'll get either the previous or the new state, both of which are correct. But if you hit the edge exactly, there is a finite chance of capturing an intermediate value right at the logic threshold, causing a metastable state.

Why are both states correct? The signal is sampled periodically; if you don't get the new state with the present sample, you'll get it with the next. As permute already explained, by resampling the first synchronizer's output in a second one, you dramatically reduce the chance that a metastability which actually lasts a full clock cycle will be present in the final signal. As with the first synchronizer, both possible output states would be correct in the case of an input transition.

I believe your doubts are related to a misunderstanding of the basic metastability problem, so I don't need to argue about the probability of metastability-induced design faults. I guess you would be satisfied with 10^-21 or similar.

A final remark, since you mentioned a lack of knowledge about metastability: most design faults that metastability is blamed for are actually simple cases of providing no synchronization for asynchronous signals at all.
 

FvM, it seems like you are trying to simplify things just to make a point, or you totally missed the point. First of all, we are not sampling a transient signal but a signal that is ramping as the clock is ramping. The metastability is due to the flop's internal sample/hold comparator seeing its inputs become equal at some point during this transition. When this happens, depending on which input becomes more positive first (clock or data), the output of the comparator can go either way to the positive or negative rail, or it can oscillate forever or stay unknown, assuming the comparator input is no longer driven by the D input of the flop, which is the case during the low period of the clock. So we can have an unknown. I don't get the transient point you are trying to make either; it is transient only to a digital engineer. In reality, the output of the 1st flop can be unknown for a clock period or more, and if that happens, no amount of flops can help... I guess there are no analog engineers on this forum. I ignored the rest of your reply because you are getting too emotional, or it had nothing to do with the problem I posted...
 

I actually don't get what the point of your doubt is. If you don't have double or triple flopping, and the flop is immediately followed by combinational logic that fans out to many flops, a metastable condition would spread out into the logic and a lot of flops down those paths could end up in an unknown state, which is exactly what you want to prevent. Having two or three flops back to back with no logic in between gives the 2nd or 3rd flop the best chance of recovering from metastability within a cycle if the 1st flop went metastable, and reduces the probability that the metastable condition is passed on to the subsequent logic. You mentioned that you'd like to design for certainty, but 10^-21 is quite certain, isn't it?
You can't make everything work with certainty. A plane could crash due to a system malfunction, but having a redundant system reduces the probability of system failure. The steering of your car might break while you are driving at 100 mph. It's all about probability, and as long as the probability is low enough, the system should be considered reliable.
 

Well, no, it's not impossible for the synchronizer to fail. The addition of the 2nd, 3rd, etc. FF dramatically reduces the failure rate, e.g. from 2^-21 to 2^-42 or 2^-63 per event. The MTBF can be determined from the probability of failure per event and the estimated number of events that could fail per second. For async clocks, this is a fraction of the transitions per second, and also depends on the transition probability.

2^-42 would mean around 1 failure per 4 trillion events. For a given design, it might be common for potential issues to occur on 1% of clock cycles, and if the transition density is 50%, then you would have around 1 failure every 800 trillion cycles. At 1 GHz, this is one failure per 800k seconds, or one failure per 9.25 days. Adding a third FF would improve this to roughly one failure per 53k years, or one failure per century in a design that uses 512 such synchronizers.
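A quick sanity check of that arithmetic (the 2^-42, 1% and 50% figures are just the illustrative values from the post, not silicon data; small differences from the quoted days/years come from rounding):

```python
# Reproduce the rough numbers above.

p_fail_per_event   = 2.0 ** -42   # per-event failure probability, 2-FF chain
issue_fraction     = 0.01         # cycles on which a potential issue can occur
transition_density = 0.5          # fraction of those with an input transition
f_clk              = 1e9          # 1 GHz destination clock

events_per_second   = f_clk * issue_fraction * transition_density
failures_per_second = events_per_second * p_fail_per_event
mtbf_seconds        = 1.0 / failures_per_second

print(f"2-FF MTBF ~ {mtbf_seconds / 86400:.1f} days")                    # ~10 days
print(f"3-FF MTBF ~ {mtbf_seconds * 2**21 / (86400 * 365):.0f} years")   # ~59k years
```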
 

Point well taken, permute, but my question is more fundamental. Let's say the 1st flop goes metastable and produces an unstable output that persists to the next clock edge, when the 2nd flop samples its input. Sampling an unstable input will cause the 2nd one to go unstable too. Now, the MTBF that everyone so affectionately quotes says that the chance of the two back-to-back synchronizer flops going metastable on a good input is slim, but on a bad input at either flop, as in my example, it is a certain failure. This is not a function of the double synchronizer but a function of the internal design of the flip-flops. This brings me to my point: for certain cases (my case), double synchronizers don't do any good. I suspect the answer to my question is one of two things: 1) I am correct in my assumption and synchronizers are worthless in this case, or 2) some flip-flop designer can prove that this is not possible inside the flops.
 

Sampling an unstable input will cause the 2nd one to go unstable too.
This statement is wrong. As I said, you didn't understand the nature of metastability.

The correct statement would be: if the first synchronizer is in a metastable state, then the second can go to a metastable state, but the probability is very low.

Mathematically speaking, it's the difference between a necessary and a sufficient condition.

P.S.: I am trying to understand what kind of flip-flop design you imagine that would have the bad property of "copying" an unknown state from the input signal. The usual FF circuits don't have it. If you analyze them as analog circuits, the gain of the positive feedback loop is so high that it's nearly impossible to force an intermediate output state by feeding in a particular input voltage.

 

Thanks to urmish and permute for the replies. I understand what you are saying, but the fact remains that under worst-case conditions it is possible for the 1st flop to go metastable and not recover before the next clock edge. If this happens, the input to the 2nd flop is metastable and we end up corrupting the data. I know the probability is slim, but it can still happen, and happen at any time. You cannot design reliably if you are going to take chances like this. I want to re-make a point from my original post: if the 1st flop does not recover during the positive half of the clock, you have at least a 75% chance of not recovering at all, no matter how many flops you chain together. I say this because if the 1st flop resolves to the wrong data, you have a corrupt bit, which makes your design worthless. I don't know about you, but I trust the power of probability about as far as I can throw it. I don't design on probability, only on certainty. Probability just says the chances are slim, not that it cannot happen on the next clock.
I would love to hear from analog designers, especially those who have designed flops or written flop models for libraries.

It would be naive not to consider that the chances of a chip failing due to metastability are so low that it is more worthwhile to look into other aspects of design and manufacturing. Nothing in a chip design is perfect. Even in DFT we learn that 100% coverage is an NP-complete problem, or maybe not even solvable, since we don't have a fault model that covers every kind of defect that occurs. Consider reliability; I am sure you have heard of the bathtub curve. There is a very, very high chance that your chip will fail for reasons other than metastability (when the MTBF for the clock-domain crossings is in millions of years). Engineering is approximation. Give a problem the time and effort that it requires, from a practical point of view.
 

While you guys are on the topic of metastability: can you please help me understand why, in some cases of metastability, the output keeps oscillating? I understand that it can settle to zero or one, but why does it sometimes keep oscillating?
 

The storage element of a flop is effectively connected as an oscillator with a switch at its input. To see this, look at the SR flop schematic above and ignore the SET connection. As mentioned before, if the flop goes metastable, the output of this loop can swing toward the positive and negative rails a few times before it settles down. It only settles down if the control system consisting of the feedback, the two NAND gates, and the switch is over-damped. If, however, for any reason the system is designed to be, or becomes, critically damped, then one swing of the output is enough to put the system into indefinite oscillation. This happens because the feedback goes one way when the output goes the other way, and vice versa, so they endlessly toggle each other. Most flops are designed to be well damped, so this happens very rarely, and if it does happen the circuit quickly settles down, but it is theoretically possible.
 

The "logic loops" of a flip-flop (there are two, one in the master and one in the slave section) have a positive feedback factor, but a gain >> 1 down to frequency 0, so the oscillation condition is generally not met. I haven't seen a flip-flop design that would be able to produce maintained oscillations, and obviously, no one would design it this way. The standard design uses two inverting nand and a transmission gate, which can be represented by two poles of almost equal time constant, and an additional pole of smaller time constant for the transmission gate. You'll have difficulties to produce even a damped oscillation with this hardware block.

I don't want to claim, that it's impossible to get oscillations of longer duration in metastable state, at least it's possible with an unsuitable design. But the waveforms shown in literature suggest, that you ususally won't see it.
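As a side note, the usual way to model that regenerative loop is a single-pole linearization around the metastable point; the sketch below uses invented values for τ and the initial imbalance purely to illustrate the exponential divergence toward a rail:

```python
import math

# Near the balance point the cross-coupled loop behaves roughly like
# dv/dt = v / tau, so any tiny imbalance grows exponentially until the
# node saturates at a rail (it regenerates rather than oscillates).
# tau, v0 and v_rail are invented illustrative values.

tau    = 50e-12   # assumed regeneration time constant, 50 ps
v0     = 1e-6     # assumed initial imbalance, 1 uV
v_rail = 0.5      # assumed swing needed to reach a valid logic level, 0.5 V

t_resolve = tau * math.log(v_rail / v0)
print(f"time to regenerate to a valid level ~ {t_resolve * 1e9:.2f} ns")

# Every extra nanosecond of settling time multiplies the 'still metastable'
# odds by exp(-1 ns / tau), which is where a synchronizer stage's benefit comes from.
print(f"improvement per extra ns of slack ~ e^{1e-9 / tau:.0f}")
```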
 

Point well taken, permute, but my question is more fundamental. Let's say the 1st flop goes metastable and produces an unstable output that persists to the next clock edge, when the 2nd flop samples its input. Sampling an unstable input will cause the 2nd one to go unstable too. Now, the MTBF that everyone so affectionately quotes says that the chance of the two back-to-back synchronizer flops going metastable on a good input is slim, but on a bad input at either flop, as in my example, it is a certain failure. This is not a function of the double synchronizer but a function of the internal design of the flip-flops. This brings me to my point: for certain cases (my case), double synchronizers don't do any good.
There is a misunderstanding here. You are correct that the output of the second FF can be "random" if the first FF stays metastable for too long. Of course, this "random" value will propagate to the output no matter how many synchronizers there are. But as FvM explained, a "random" value is not a problem. Both values are correct, since the setup time was violated. If the input is stable until the next clock cycle, the next value will not be random. This happens all the time with an asynchronous input, and it only causes a random delay of the edges. No data is corrupted.

The "real" problem is if the metastable condition itself propagates. The probability of that is reduced by each synchronizer stage, even if the data value is random through all the synchronizers.

A random value is '0' or '1', but a metastable condition is a completely different thing.
 
I want to add another point.
In my opinion, it is better to understand the internal behaviour of the circuit that causes metastability. With an input changing right at the clock edge, the FF can become metastable: its output may hover and toggle around the mid-level range, but it finally settles to a stable high or low level (recovery). You can see this in detail with an analog-level simulation of the FF.

The point is the recovery time: how long does it last? I think of it as a process of charging and discharging in the circuit, so the recovery time varies with transistor structure, wiring, capacitance... in other words, with the design technology. Just as an example, suppose the recovery time is usually 10 ns for a 0.18 µm process. If the clock period in the design is 100 ns, the output becomes stable well before the next cycle, because one cycle is much longer than the recovery time, and a two-FF synchronizer is fine. What about a faster clock (20 ns, 15 ns, or 10 ns period)? Then it is theoretically possible that the unstable output from the first FF results in an unstable state in the second FF (metastability propagation), so maybe you need a third synchronizer stage or more.

In my opinion, the point is estimating the recovery time of the circuit in your technology, not simply implementing a second synchronizer.
 
In my opinion, the point is estimating the recovery time of the circuit in your technology, not simply implementing a second synchronizer.
The two-synchronizer suggestion is a practical one, valid for a certain technology and clock speed range. Metastability MTBF is a statistical quantity, but it can be determined from the properties you mention. Some programmable logic tools, e.g. Altera Quartus, offer a metastability analysis option that calculates the expected MTBF for a specified circuit.
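As a sketch of how such a calculation looks, here is the standard MTBF = e^(t_met/τ)/(T0·f_clk·f_data) model in code; all parameter values below are invented placeholders, not data for any real process or library flop:

```python
import math

# Hedged sketch of the classic synchronizer MTBF estimate.  Real values of
# tau and T0 come from characterization of the library flop in the target
# technology; the numbers below are placeholders.

tau    = 50e-12   # regeneration time constant (assumed)
T0     = 1e-9     # metastability window parameter (assumed)
f_clk  = 1e9      # destination clock, 1 GHz
f_data = 10e6     # average toggle rate of the asynchronous input

def mtbf(n_stages):
    # Roughly (n_stages - 1) clock periods of resolution time; setup and
    # routing overheads are ignored in this sketch.
    t_met = (n_stages - 1) / f_clk
    return math.exp(t_met / tau) / (T0 * f_clk * f_data)

for n in (1, 2, 3):
    print(f"{n} stage(s): MTBF ~ {mtbf(n):.2e} s")
```

With these made-up numbers, two stages at 1 GHz would give an MTBF of only tens of seconds while a third stage pushes it to centuries; at 100 MHz the same flop would be fine with two stages, which is exactly the technology and clock-speed dependence being discussed.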
 

FvM, you keep missing the point. The whole idea of the post was a detailed discussion of this issue. All you are doing here is quoting a textbook without discussing the issue any further or providing proof or supporting documents to help us understand the relevance of your point of view.

Having said this, I think the second-to-last post understood what my beef with the double synchronizer was when it said
"If the input is stable until the next clock cycle, the next value will not be random."

I agree; exactly my point. You just can't drop in a double synchronizer and wash your hands of it. What happens if the incoming data changes on every clock cycle? In that case, the output of the 1st receiving synchronizer flop is at best random, and you cannot count on the same input still being present in the sending domain during the next clock cycle. In other words, we are corrupting the data. It seems the only way to avoid this is some type of oversampling by design.
 

I think that with a multi-flop synchronizer, if metastability is not resolved in the first flop, it will be resolved in the 2nd or 3rd flop, and only the new or the previous value is passed to the output of the synchronizer.

I have one query: how can we decide, based on the technology, whether we need a 2-flop or a multi-flop synchronizer to resolve metastability issues?
 

I agree; exactly my point. You just can't drop in a double synchronizer and wash your hands of it. What happens if the incoming data changes on every clock cycle? In that case, the output of the 1st receiving synchronizer flop is at best random, and you cannot count on the same input still being present in the sending domain during the next clock cycle. In other words, we are corrupting the data. It seems the only way to avoid this is some type of oversampling by design.
That's nothing new. Data crossing an async clock boundary must always be oversampled, by its asynchronous nature. I mean, when you send data to another clock domain, how else can you let the receiving block capture it without oversampling? However, there are some cases where oversampling is not used; one of the exceptions is domain crossing of the pointers in a FIFO.
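For that FIFO-pointer exception, the usual trick is to pass the pointers in Gray code, so that only one bit changes per increment and a synchronizer can only ever capture the old or the new value, never a corrupt mixture. A small sketch of the property (function names are mine, just for illustration):

```python
def bin_to_gray(b: int) -> int:
    """Binary to Gray code: adjacent values differ in exactly one bit."""
    return b ^ (b >> 1)

def gray_to_bin(g: int) -> int:
    """Inverse conversion (iterative XOR fold)."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

# Because consecutive Gray codes differ in a single bit, a 2-FF synchronizer
# that samples the pointer mid-change sees either the old or the new value,
# both of which are safe (the FIFO merely looks one entry fuller or emptier).
for i in range(8):
    g = bin_to_gray(i)
    assert gray_to_bin(g) == i
    print(f"{i}: {g:03b}")
```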
 
