Synchronous FIFO - flag generation

K-J · Jul 18, 2014

shaiko said:
ads-ee,

I followed your suggestion and started learning the design in this article (the link is no longer available).

I see on page 18 that the empty flag is asserted/deasserted combinatorially, outside of a clocked "always" block.

Code:

assign rbinnext = rbin + (rinc & ~rempty);

My question:
If it's good for an asynchronous FIFO, why isn't it good for a synchronous FIFO?
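For readers without the paper at hand, here is a rough Python model of the quoted line. It is a sketch, not code from the paper or the thread. Note that this `assign` produces the *next read pointer*, not the empty flag. The pointer advances only when a read is requested and the FIFO is not empty. `ADDRSIZE`, `PTR_MASK`, `read_next` and `bin_to_gray` are illustrative names. The `(b >> 1) ^ b` Gray conversion is assumed to match how the paper forms `rgraynext`.

```python
from typing import Tuple

ADDRSIZE = 4                          # assumed address width
PTR_MASK = (1 << (ADDRSIZE + 1)) - 1  # pointers carry one extra wrap bit


def bin_to_gray(b: int) -> int:
    """Binary to reflected Gray code."""
    return (b >> 1) ^ b


def read_next(rbin: int, rinc: bool, rempty: bool) -> Tuple[int, int]:
    """Model of: assign rbinnext = rbin + (rinc & ~rempty);

    The read pointer advances only when a read is requested AND the FIFO
    is not empty. Returns (rbinnext, rgraynext).
    """
    rbinnext = (rbin + int(rinc and not rempty)) & PTR_MASK
    return rbinnext, bin_to_gray(rbinnext)


print(read_next(5, rinc=True, rempty=False))  # (6, 5): pointer advances
print(read_next(5, rinc=True, rempty=True))   # (5, 7): held while empty
```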

I think you're taking this a bit out of context. The sole reason for producing synchronous output flags is because those flags will get used with other logic in whatever application is using the fifo. Everything else being equal (and I realize that is never really the case), if the critical timing path is in that logic that uses the flags, then the max clock frequency of a design that generates the flags with logic will be lower than one where the flags come directly out of a flip flop.

Fifo flags can end up in that type of situation because they will typically get used to directly generate flow control logic. One view of something being 'Full' is that it is no longer ready for input; similarly, 'Empty' means that there is no output ready. A data processing algorithm typically consists of several modules strung together in some fashion into a processing pipeline. If one module is not ready for input, that will then tend to cause upstream modules to have to stop sending data which will in turn cause other upstream modules to pause as well.

That's the long winded explanation.

There are many situations where there is a critical timing path and it simply has to come out of logic and not a flip flop, so you could ask 'What is different about that case?' The answer there is that you would always like the status to come out of a flip flop (to improve Fmax again) as long as that doesn't break functionality or degrade performance for other reasons. The specific case of the fifo is an example of something where it is straightforward to produce clocked output status at little or no cost in terms of logic resources used.

So, given that you can produce a clocked status at little or no cost, and that producing a clocked output will improve Fmax, the question you should ask is 'Why would you not design something that has better Fmax at little or no logic resource cost?'

As for why the Cummings paper does not produce such an output, I don't know. It would be an improvement if it did; maybe Cummings didn't see a way to produce the flags that way, or simply didn't consider that it would be an improvement in the first place.

To bring it all back to what you are attempting though, consider the following:

Single clock fifo that you proposed: Try synthesizing your design as a standalone entity as you wrote it, comparing the pointers directly for equality, and check the clock-to-output delay of the 'Full' and 'Empty' flags. Also take note of the logic resources used. Then implement it as I suggested, checking whether the pointers are one away from the full and empty conditions with the clocked logic that I showed. Again check the clock-to-output delay of 'Full' and 'Empty', as well as the logic resources used. [1]
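The two flag schemes being compared can be sketched as cycle-level Python models. This is only an illustration under assumed conventions: a power-of-two `DEPTH` and pointers with one extra wrap bit. `CombFlagFifo` and `RegFlagFifo` are hypothetical names, and no data storage is modelled. In the first model, the flags are decoded from the pointers after the flip-flops. In the second, the flags are flip-flops loaded when the pointers are one away from full or empty. The functional behaviour should match, so the difference that matters here, clock-to-output delay, only shows up in synthesis results.

```python
DEPTH = 8                 # assumed power-of-two depth
PTR_MASK = 2 * DEPTH - 1  # pointers carry one extra wrap bit


class CombFlagFifo:
    """'Full'/'Empty' decoded from the pointers with logic after the flops."""

    def __init__(self):
        self.wr = self.rd = 0

    @property
    def empty(self):
        return self.wr == self.rd

    @property
    def full(self):
        # Same address bits, opposite wrap bit.
        return self.wr == (self.rd ^ DEPTH)

    def clock(self, wr_en, rd_en):
        do_w, do_r = wr_en and not self.full, rd_en and not self.empty
        self.wr = (self.wr + int(do_w)) & PTR_MASK
        self.rd = (self.rd + int(do_r)) & PTR_MASK


class RegFlagFifo:
    """'Full'/'Empty' are flip-flops, loaded when the pointers are one
    away from the full/empty condition."""

    def __init__(self):
        self.wr = self.rd = 0
        self.full, self.empty = False, True   # reset values

    def clock(self, wr_en, rd_en):
        do_w, do_r = wr_en and not self.full, rd_en and not self.empty
        if do_w and not do_r:
            # One entry short of full and writing -> full after this edge.
            self.full = ((self.wr + 1) & PTR_MASK) == (self.rd ^ DEPTH)
            self.empty = False
        elif do_r and not do_w:
            # One entry left and reading -> empty after this edge.
            self.empty = ((self.rd + 1) & PTR_MASK) == self.wr
            self.full = False
        # Simultaneous read and write (or neither): occupancy unchanged.
        self.wr = (self.wr + int(do_w)) & PTR_MASK
        self.rd = (self.rd + int(do_r)) & PTR_MASK


fifo = RegFlagFifo()
for _ in range(DEPTH):
    fifo.clock(wr_en=True, rd_en=False)
print(fifo.full, fifo.empty)  # True False
```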

Feel free to try to hand place/route or whatever other tricks you might want to employ to both designs in order to improve performance. I'll ask you now to post those results here in this forum simply because there is a lot of talking going on and really nothing specific to compare. Actual synthesis results tend to dispel gum flapping (which I accept is what I'm doing since I haven't posted any similar results...but that's because you're the one with the deep interest in fifos, not me. I'm speaking from 20+ years of past design experience that includes fifos as well as all kinds of other stuff).

Dual clock fifo: You strongly implied (or maybe outright stated...or maybe I just interpreted it that way) that the dual clock fifo would compare the read and write pointers directly. If you really were intending that only pointers within the same clock domain would be compared, then you can ignore the rest of this paragraph. Comparing the raw pointers as generated from within their respective clock domains would be completely useless anywhere except in a simulation environment. The reason is that those flags would not be suitable for use by external logic, no matter what clock domain they were in. Even worse, there would be nothing that could be done external to the fifo: synchronizing the flags externally would not help, because you could never guarantee any timing margin.

Another resource to Google for is fifo-related work by Peter Alfke. Peter (deceased) was pretty much the original grand guru of fifo design and (I believe) the creator of the original pointer-based dual clock fifo a couple of decades back.

Kevin Jennings

[1] Also, very important, make sure that those two actually implement the exact same fifo functionality by simulating and checking that at every clock tick the two fifo designs are producing the exact same output. It seems trivial, but if they are not producing the exact same outputs, then it may not be a fair comparison since the two designs are not doing the same thing.
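The check in footnote [1] can be automated. Below is a minimal Python sketch under assumed conventions: a first-word-fall-through FIFO (read data is visible whenever the FIFO is not empty), and a software reference model standing in for one of the two designs. Both models receive identical random stimulus, and their flags and output data are compared on every tick. `RefFifo`, `RegFlagFifo` and `lockstep` are hypothetical names. In a real flow, the same idea would be an HDL testbench instantiating both RTL designs.

```python
import random
from collections import deque

DEPTH = 8                 # assumed power-of-two depth
PTR_MASK = 2 * DEPTH - 1  # pointers carry one extra wrap bit


class RefFifo:
    """Golden reference: a deque; flags derived directly from its length."""

    def __init__(self):
        self.q = deque()

    def outputs(self):
        n = len(self.q)
        return (n == DEPTH, n == 0, self.q[0] if n else None)

    def clock(self, wr_en, din, rd_en):
        full, empty, _ = self.outputs()   # pre-edge flags gate both ports
        if rd_en and not empty:
            self.q.popleft()
        if wr_en and not full:
            self.q.append(din)


class RegFlagFifo:
    """Pointer/RAM FIFO whose flags are registers loaded one entry early."""

    def __init__(self):
        self.mem = [None] * DEPTH
        self.wr = self.rd = 0
        self.full, self.empty = False, True

    def outputs(self):
        dout = None if self.empty else self.mem[self.rd % DEPTH]
        return (self.full, self.empty, dout)

    def clock(self, wr_en, din, rd_en):
        do_w, do_r = wr_en and not self.full, rd_en and not self.empty
        if do_w:
            self.mem[self.wr % DEPTH] = din
        if do_w and not do_r:
            self.full = ((self.wr + 1) & PTR_MASK) == (self.rd ^ DEPTH)
            self.empty = False
        elif do_r and not do_w:
            self.empty = ((self.rd + 1) & PTR_MASK) == self.wr
            self.full = False
        self.wr = (self.wr + int(do_w)) & PTR_MASK
        self.rd = (self.rd + int(do_r)) & PTR_MASK


def lockstep(cycles=20_000, seed=1):
    """Drive both models with identical stimulus; compare every tick."""
    rng = random.Random(seed)
    ref, dut = RefFifo(), RegFlagFifo()
    for cycle in range(cycles):
        assert ref.outputs() == dut.outputs(), f"mismatch at cycle {cycle}"
        # Alternate write-heavy and read-heavy phases so both flags toggle.
        p_write = 0.8 if (cycle // 200) % 2 == 0 else 0.2
        wr_en, rd_en = rng.random() < p_write, rng.random() < 1 - p_write
        din = rng.randrange(256)
        ref.clock(wr_en, din, rd_en)
        dut.clock(wr_en, din, rd_en)
    return True


print(lockstep())  # True if no mismatch was found
```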

shaiko · Jul 18, 2014

K-J,
Thank you for such a comprehensive response.
I'll post the synthesis results as soon as I finish all the designs and categorize them by different flag generation schemes.

You strongly implied (or maybe outright stated...or maybe I just interpreted it that way) that the dual clock fifo would compare the read and write pointers directly.

Yes I did.
What I meant by "directly" is: without direct dependency on read / write requests - avoiding the need to synchronize the requests themselves in case of an asynchronous FIFO.

K-J · Jul 18, 2014

shaiko said:
What I meant by "directly" is: without direct dependency on read / write requests - avoiding the need to synchronize the requests themselves in case of an asynchronous FIFO.

Then consider the following code for 'Full'. Only the write request is used which will already be in the correct clock domain. There would be similar code for 'Empty' which only uses the read request. The pointer that is used from the 'opposite' clock domain (i.e. the 'read_pointer' below) is assumed to be the read pointer resynchronized into the write clock domain so that it can then be used here. You wouldn't use the read pointer from the read clock domain directly here.

Note that while the code below could also be used for a single clock fifo, there would be an additional clock cycle of latency on clearing 'Full' and 'Empty' compared to the code that I posted earlier specifically for a single clock fifo.

Code:

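-- Write-clock-domain 'Full' register. 'read_pointer' is the read pointer
-- already resynchronized into the write clock domain, and
-- (read_pointer - write_pointer) is the amount of available space.
process (write_clock)
begin
   if rising_edge(write_clock) then
      if (Reset = '1') then
         Full <= '0';
      elsif ((read_pointer - write_pointer) <= 1) then -- Could also use =1 rather than <= 1
         if (write_request = '1') then
            Full <= '1';
         end if;
      else
         Full <= '0';
      end if;
   end if;
end process;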

Kevin Jennings
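To check the reasoning above without hardware, here is a cycle-level Python sketch of the write side. It is a model, not K-J's code, and several assumptions are illustrative only:

- Pointers carry an extra wrap bit.
- The free space is computed explicitly as `DEPTH` minus the occupancy seen through the synchronized read pointer. The thread does not spell out the pointer encoding behind `read_pointer - write_pointer`.
- The two-flop synchronizer is modelled as a plain two-cycle delay, with no metastability.
- The reader runs at random, sometimes doing more than one read per write clock.

The sketch keeps the `<= 1` comparison from the process above. The assertion checks that the FIFO never overflows.

```python
import random

DEPTH = 8                 # assumed power-of-two depth
PTR_MASK = 2 * DEPTH - 1  # pointers carry one extra wrap bit


def simulate(cycles=20_000, seed=3):
    """Return (largest true occupancy seen, cycles with Full asserted)."""
    rng = random.Random(seed)
    wr = rd = 0              # write pointer (wclk) / read pointer (rclk)
    sync1 = sync2 = 0        # read pointer double-flopped into wclk
    wptr_in_rclk = 0         # simplified: write pointer seen one wclk late
    full = False             # the registered 'Full' flag (reset value)
    max_count = full_cycles = 0
    for cycle in range(cycles):
        # Read domain: up to max_reads reads per write clock, so the
        # synchronized read pointer can jump by more than one.
        max_reads = 1 if (cycle // 100) % 2 == 0 else 2
        for _ in range(rng.randint(0, max_reads)):
            if rd != wptr_in_rclk:           # read side's own 'not empty'
                rd = (rd + 1) & PTR_MASK

        # Write clock edge, following the process above with
        # read_pointer = sync2 and the free space computed explicitly.
        write_request = rng.random() < 0.7
        do_write = write_request and not full    # uses pre-edge Full
        free = DEPTH - ((wr - sync2) & PTR_MASK)
        assert 0 <= free <= DEPTH
        if free <= 1:
            if write_request:
                full = True                       # otherwise Full holds
        else:
            full = False
        wr = (wr + int(do_write)) & PTR_MASK
        sync2, sync1 = sync1, rd
        wptr_in_rclk = wr

        count = (wr - rd) & PTR_MASK              # true occupancy
        assert count <= DEPTH, f"overflow at cycle {cycle}"
        max_count = max(max_count, count)
        full_cycles += full
    return max_count, full_cycles


print(simulate())
```

Because the synchronized read pointer lags, Full can remain asserted after a slot has actually been freed. That is the pessimistic (safe) direction: it costs latency, not correctness.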

shaiko · Jul 18, 2014

Thanks for the example.

Assuming asynchronous clock domains, what are the pros & cons when compared to Cummings' design (the link is no longer available)?

ads-ee · Jul 18, 2014

The pro is registered flag outputs; the con is possibly increased latency in the flag.

Note that the suggestion in the comment "-- Could also use =1 rather than <= 1" probably shouldn't be followed with asynchronous FIFOs. Comparing to exactly 1 is risky because the pointers may skip counts when transferred to the opposite clock domain.

Regards
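The "skip counts" effect can be illustrated in Python, with the caveat that a software model cannot show metastability. It only shows the counting behaviour. Gray coding means each increment flips exactly one bit, so a sample taken mid-transition resolves to either the old or the new value. It does not stop the synchronized value from jumping several counts when the source domain increments faster than the destination samples it. `PTR_BITS`, `sampled_pointer` and the 3:1 clock ratio are assumptions for illustration.

```python
PTR_BITS = 4                    # assumed: 3 address bits + 1 wrap bit
PTR_MASK = (1 << PTR_BITS) - 1


def bin_to_gray(b):
    return (b >> 1) ^ b


def gray_to_bin(g):
    """Inverse of bin_to_gray: XOR of all right shifts of g."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def sampled_pointer(ratio, samples):
    """Pointer (Gray coded) advances on every fast clock; the slow domain
    samples it once every `ratio` fast clocks and converts back to binary."""
    seen, ptr = [], 0
    for tick in range(ratio * samples):
        if tick % ratio == 0:
            seen.append(gray_to_bin(bin_to_gray(ptr)))
        ptr = (ptr + 1) & PTR_MASK
    return seen


# Each Gray increment (including the wrap back to 0) flips exactly one bit...
assert all(bin(bin_to_gray(i) ^ bin_to_gray((i + 1) & PTR_MASK)).count("1") == 1
           for i in range(PTR_MASK + 1))
# ...but a slower sampling clock still sees the count jump.
print(sampled_pointer(ratio=3, samples=5))  # [0, 3, 6, 9, 12]
```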

shaiko · Jul 19, 2014

The pro is registered flag outputs

Code:

always @(posedge rclk or negedge rrst_n)
  if (!rrst_n) rempty <= 1'b1;        // asynchronous reset: FIFO starts empty
  else         rempty <= rempty_val;  // empty flag is a flip-flop output
endmodule

The flags here are also registered...aren't they?

ads-ee · Jul 19, 2014

Yes, it's registered (I mistakenly assumed otherwise in my previous post).
What K-J suggests is to generate the flag in the register as the FIFO is going empty/full, not after it's already empty/full. This would improve flag latency.

It appears that the Cummings design already does this "look ahead", by using the rgraynext and wgraynext values in the associated compares for rempty and wfull.

Regards
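The difference between setting the flag "as it is going empty" and "after it's already empty" can be shown with a small Python sketch. The setup is assumed, not taken from the paper: the synchronized write pointer is frozen after three words, and the reader requests data every cycle, gated by the registered `rempty`. Comparing the next Gray pointer (look-ahead) sets `rempty` on the same edge that consumes the last word. Comparing the current pointer sets it one edge late, which lets at least one read through on an empty FIFO. `count_reads` and its parameters are hypothetical.

```python
PTR_MASK = 0xF  # assumed 4-bit pointers (3 address bits + wrap bit)


def bin_to_gray(b):
    return (b >> 1) ^ b


def count_reads(lookahead, words, cycles):
    """Return how many reads the registered rempty lets through when the
    synchronized write pointer is held at `words` entries."""
    rq2_wptr = bin_to_gray(words)
    rbin, rempty, reads = 0, True, 0       # reset state: rempty <= 1'b1
    for _ in range(cycles):
        rinc = True                        # reader always wants data
        rbinnext = (rbin + int(rinc and not rempty)) & PTR_MASK
        reads += int(rbinnext != rbin)
        if lookahead:
            # Compare the *next* Gray pointer (rgraynext).
            rempty = bin_to_gray(rbinnext) == rq2_wptr
        else:
            # Compare the current, already registered pointer instead.
            rempty = bin_to_gray(rbin) == rq2_wptr
        rbin = rbinnext
    return reads


print(count_reads(lookahead=True, words=3, cycles=20))  # 3
print(count_reads(lookahead=False, words=3, cycles=5))  # 4: one read too many
```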

shaiko · Jul 19, 2014

It appears that the Cummings design already does this "look ahead", by using the rgraynext and wgraynext values in the associated compares for rempty and wfull.

Yes. It also has the "side effect" of clearing the "Full" and "Empty" flags with a latency of one clock (when used with synchronous clocks), exactly like the Cummings design.

K-J · Jul 19, 2014

ads-ee said:
Note the comment "-- Could also use =1 rather than <= 1" probably shouldn't be done with asynchronous FIFOs as there is a potential problem with comparing to exactly 1 as the pointers may skip counts when transferred to the opposite clock domain.

Nope. The read pointer in the write clock domain can 'skip', and therefore make the amount of used space in the fifo jump as well. However, there is nothing the read pointer could do that would allow the computed amount of available space to jump from >1 to <1. It is not possible to have 2 available on one write clock cycle and 0 on the next, so the choice of using <= or = is somewhat arbitrary. Using <= might use slightly less logic or routing, since you would not need to compare the least significant bit. A similar argument applies to 'Empty' in the read clock domain.

Kevin Jennings
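K-J's argument can also be checked numerically. In this Python sketch (assumed setup, hypothetical names), only a write reduces the write-domain free space, so it falls by at most one per write clock. A skipping synchronized read pointer can only raise it, possibly by several at once. It therefore never goes from 2 or more to 0 in a single cycle.

```python
import random

DEPTH = 8                 # assumed power-of-two depth
PTR_MASK = 2 * DEPTH - 1  # pointers carry one extra wrap bit


def free_space_trace(cycles=5_000, seed=11):
    """Free space as computed in the write domain, one entry per write clock.

    Assumed setup: the read pointer reaches the write domain through a
    two-stage delay, the reader sometimes does 3 reads in one write clock
    (so the synchronized pointer skips counts), and writes happen only
    while the computed free space is nonzero.
    """
    rng = random.Random(seed)
    wr = rd = sync1 = sync2 = 0
    trace = []
    for _ in range(cycles):
        free = DEPTH - ((wr - sync2) & PTR_MASK)
        trace.append(free)
        for _ in range(rng.choice((0, 0, 0, 3))):   # bursty reader
            if rd != wr:                              # simplified empty check
                rd = (rd + 1) & PTR_MASK
        if free > 0 and rng.random() < 0.9:          # writer
            wr = (wr + 1) & PTR_MASK
        sync2, sync1 = sync1, rd                      # two-flop synchronizer
    return trace


t = free_space_trace()
steps = [b - a for a, b in zip(t, t[1:])]
print(min(steps), max(steps))  # falls by at most 1; can rise by more than 1
```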

Welcome to EDAboard.com
