
Ping pong effect in RTL compilation


amitk3553

Hello all,

Please tell me what is meant by 'ping pong' in RTL compilation.




Thanks!
Regards
cam
 

I think this helps

http://www.asic.co.in/DesignGuidlinesRTLcoding.htm

Do not design, with other modules, signals that combinatorially bounce from once module, back to another, then back again.

WHY: Ping-Pong signals create long layout-dependent timing paths.

I still cannot understand this ping-pong effect. Could anyone post an example here? And further, why does it create a long layout-dependent timing path?

This forum is so good, I learn a lot here. I will recommend my friends to join this forum. :grin::grin:
 

Please tell me what is meant by 'ping pong' in RTL compilation.
It means absolutely nothing to anybody with even a little bit of experience in design. Don't bother researching 'ping pong' and RTL...you would get far more benefit from playing a game of ping-pong. Not trying to be rude, just trying to save you time that can be better spent.

Kevin Jennings
 

It means absolutely nothing to anybody with even a little bit of experience in design. Don't bother researching 'ping pong' and RTL...you would get far more benefit from playing a game of ping-pong. Not trying to be rude, just trying to save you time that can be better spent.

Kevin Jennings

We need to register all outputs (meaning outputs should come from registers) to prevent ping-pong effects!!!

So I want to know exactly what ping-ponging in RTL is, which we have to prevent in order to achieve good synthesis results. I think I cannot understand this concept just by playing a game of ping pong!!!

So please give me some brief idea about this!!!



Thanks with Regards
cam
 

Ten years doing RTL coding on FPGAs and I've never even heard the term. I'm guessing it's a non-issue. Just register the output, ping pong averted? Isn't that the answer?
 

We need to register all outputs (meaning outputs should come from registers) to prevent ping-pong effects!!!
That's a bold statement to claim benefit from something you know nothing about. No matter how many exclamation points you use, it will not change reality.
- Long combinatorial paths that slow down the maximum clock speed are not the result of 'ping pong'
- If you eliminated all 'ping pong' you would not guarantee any sort of performance improvements

If you doubt this, then simply insert the following into your code and see how the performance degrades: "c <= a / b;".

Let me reiterate, ping pong means absolutely nothing when it comes to RTL. Its only usage in design has to do with multiple memory buffers (or other resources) where you will fill/use one up and then switch over to the other. You would use that as a mechanism to use components that cannot handle the incoming peak data rate but can handle the average data rate.
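
As a rough illustration of that buffering usage (this is not code from any post here, and all names are made up), a minimal Verilog sketch of a ping-pong buffer pair, where one RAM is filled while the other is drained and a 'swap' pulse exchanges the roles:

Code:
// Illustrative ping-pong (double) buffer sketch.
// While one RAM is being written, the other is read; a 'swap' pulse
// from surrounding control logic exchanges the two roles.
module pingpong_buf #(parameter DW = 8, parameter AW = 4) (
  input               clk,
  input               swap,      // toggle which buffer is written vs. read
  input               wr_en,
  input  [AW-1:0]     wr_addr,
  input  [DW-1:0]     wr_data,
  input  [AW-1:0]     rd_addr,
  output reg [DW-1:0] rd_data
);
  reg sel = 1'b0;                            // 0: write mem0 / read mem1, 1: the opposite
  reg [DW-1:0] mem0 [0:(1<<AW)-1];
  reg [DW-1:0] mem1 [0:(1<<AW)-1];

  always @(posedge clk) begin
    if (swap) sel <= ~sel;
    if (wr_en && !sel) mem0[wr_addr] <= wr_data;
    if (wr_en &&  sel) mem1[wr_addr] <= wr_data;
    rd_data <= sel ? mem0[rd_addr] : mem1[rd_addr];  // read from the buffer not being written
  end
endmodule

This only absorbs bursts: downstream logic can run at the average rate rather than the peak rate, which is the usage described above.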

So I want to know exactly what ping-ponging in RTL is, which we have to prevent in order to achieve good synthesis results.
As I said, it means absolutely nothing. Design is all about function, performance and reliability. Anyone using the term 'ping pong' in the context of RTL doesn't know much about RTL or design in general that would be worth learning.

I think I cannot understand this concept just by playing a game of ping pong!!!
There is no concept to learn so you would be better off spending your time elsewhere. Apparently you are unwilling to accept this.

Kevin Jennings
 

Register module outputs, problem solved. The end.
 

Register module outputs, problem solved. The end.
Simple designs where there is no flow control can typically be made to have outputs that are either clocked or not clocked as the designer may choose.

An interface protocol with flow control will not typically tolerate one-sided delaying of a signal. An example is an Avalon or Wishbone memory-mapped interface where a slave device receives a 'read' or 'write' command input. The slave device responds with a 'wait' output that must be combinatorial, because any clock cycle where the command signal is active and the wait signal is not active signals that the command has been accepted. Delaying 'wait' by clocking it will result in a non-functional system.

Kevin Jennings
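
A minimal Verilog sketch of the combinational 'wait' output described above, assuming a hypothetical slave whose readiness comes from an internal 'busy' flag (names are illustrative only, not from any post here):

Code:
// Illustrative combinational waitrequest sketch.
// A command is accepted on any cycle where (read || write) is high and
// waitrequest is low, so waitrequest has to reflect the slave's state in
// the same cycle, not one clock later.
module av_slave_comb_wait (
  input  read,
  input  write,
  input  busy,          // assumed internal status from the slave's datapath
  output waitrequest
);
  assign waitrequest = (read || write) && busy;   // not clocked, just logic
endmodule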
 

An example is an Avalon or Wishbone memory-mapped interface where
a slave device receives a 'read' or 'write' command input.
The slave device responds with a 'wait' output that must be combinatorial, because/.../
Kevin Jennings

an Avalon slave can keep wait high 'by default' and drop wait low
when read data ready / write data written;
so wait can be registered :)
--------------

btw - the only 'ping-pong' term I've heard in the FPGA field is 'ping-pong buffers'
j.a
 

an Avalon slave can keep wait high 'by default' and drop wait low when read data ready / write data written; so wait can be registered :)
But only at the rather high cost of giving up half of the performance of the interface except for certain constrained conditions. Likely only acceptable in limited circumstances.
 

Minimize Ping-Pong Signals
Do not design, with other modules, signals that combinatorially bounce from once module, back to another, then back again.
I wonder if "ping-pong signal" is something like regional technical slang, brought up by someone in India (maybe a professor), that hasn't yet become known in the digital design world?

Besides the questionable grammar of the quoted "rule", the term seems rather arbitrary, as does the selection of timing closure issues in the paper.

It's quite funny how the OP made a number of people think seriously about a non-technical term describing a non-issue...
 

But only at the rather high cost of giving up half of the performance of the interface
except for certain constrained conditions. Likely only acceptable in limited circumstances.
Code:
always @(posedge av_clk)
  if      ( !(av_read || av_write) )  av_wait <= 1'b1;
  else if (write_done || read_done )  av_wait <= 1'b0;

I don't think an async. control of 'av_wait' will work faster than the example above;

j.a


"ping-pong" is another term for "double-buffered", at least every time I've seen the term used. The idea is to have two buffers (RAMs) so that one is always available for writing whenever the other is being read from.

The linked article refers to a system layout where there is a signal that is formed by a connection between two modules followed by a connection back to the first.

Consider this as a practical example (or counter-example) -- module A presents a fifo-like interface to module B. The "fifo empty" signal is (for this example) a combinatorial output of module A. Module B uses this signal to generate "fifo read" which is then sent back to module A. This is a path where there is combinatorial logic on A's output to B, and then there is additional combinatorial logic on B's output back to A.

With a combinatorial path that goes into a module and then back out, the layout of this interface strongly becomes part of a possible longest path. Had there been register stages on the outputs/inputs, it would be easier to design because the possible longest paths would be internal to each module, or on just one part of the path between them. Now the longest path includes the logic for "fifo empty", the path from A to B, the path from this input on B through the logic for "fifo read", the path from B back to A, and then the path from this input on A to any affected logic. The logic involved may span several layers, cannot be optimized across the module boundaries (if they are to have a common layout), and is now dependent on multiple layout choices.
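
A minimal Verilog sketch of that example, with made-up module and signal names: 'fifo_empty' leaves module A combinationally, 'fifo_read' is derived combinationally inside module B and re-enters A.

Code:
// Illustrative only: combinational status/request round trip between modules.
module mod_a (
  input        clk,
  input        push,                        // writer side, kept trivial for the sketch
  input        fifo_read,                   // combinational request coming back from B
  output       fifo_empty
);
  reg [3:0] count = 4'd0;                   // words currently held
  assign fifo_empty = (count == 0);         // combinational output, no register

  always @(posedge clk) begin
    if (push)
      count <= count + 1'b1;
    else if (fifo_read && !fifo_empty)      // B's combinational output feeds A's logic
      count <= count - 1'b1;
  end
endmodule

module mod_b (
  input  fifo_empty,                        // combinational status from A
  input  want_data,                         // some internal condition in B
  output fifo_read
);
  assign fifo_read = want_data && !fifo_empty;   // combinational, bounces back to A
endmodule

The register-to-register path then runs from 'count' in A, through the empty decode, across to B, through B's read logic, back across to A, and into the enable logic for 'count' again, which is why its delay depends on how A and B end up placed relative to each other.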

If you don't re-use the layout of A or B for each instance, and if you are able to do optimizations across the A-B boundary, this issue is reduced. E.g., for FPGA designs where global optimization is done and where modules don't have a fixed layout, the rationale for this issue goes away (unless partially reconfigurable modules are used for the FPGA design).
 
Code:
always @(posedge av_clk)
  if      ( !(av_read || av_write) )  av_wait <= 1'b1;
  else if (write_done || read_done )  av_wait <= 1'b0;

I don't think an async. control of 'av_wait' will work faster than the example above;

j.a
The above code will always have a one clock cycle latency on the wait. Zero clock cycle latency is faster than one.

Plus there are built-in assumptions in your example about the addressing. Specifically, it assumes that each address being accessed has the same latency and that the ability of any address to accept a command is the same for each address. If there are back-to-back reads/writes to different addresses in your component, then that component would have to absorb that subsequent access, because it has already signaled that there is no wait request pending; you wouldn't be able to generate the wait signal for the second address until the next clock cycle.

Another way to handle it would be to always set wait active on the clock cycle after having dropped it. That way the access to the second address would always start with a wait cycle, just like the first. The problem with this approach is that the throughput on this interface has just been cut in half, which would be an example of what I stated earlier about performance taking a hit.
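
A minimal Verilog sketch of that 'reassert wait after every accepted access' approach, with assumed names; in the best case every transfer now costs one wait cycle plus one accept cycle, which is where the halved throughput comes from:

Code:
// Illustrative registered waitrequest that re-arms after each accepted access.
// An access is accepted on a cycle where (read || write) is high and
// waitrequest is low.
module av_slave_rearm_wait (
  input      clk,
  input      rst,                  // synchronous reset, assumed active high
  input      read,
  input      write,
  input      done,                 // assumed: the current access has completed internally
  output reg waitrequest
);
  wire accepted = (read || write) && !waitrequest;

  always @(posedge clk) begin
    if (rst)
      waitrequest <= 1'b1;         // come up waiting
    else if (accepted)
      waitrequest <= 1'b1;         // re-assert right after accepting a command
    else if (done)
      waitrequest <= 1'b0;         // ready to accept the next command
  end
endmodule

With this, back-to-back accesses always see a wait cycle between them, i.e. the interface runs at half rate in the best case.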

A good example of both zero clock cycle latency and variable latency between different addresses is a component that has two FIFOs that can be written to. The 'wait request' signal for each FIFO would be the FIFO Full signal. The wait request for the component then would be

wait <= Fifo_Full_1 when (Address = 0) else Fifo_Full_2; -- Not clocked, just logic
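
For readers more used to Verilog, an equivalent sketch of that idea with assumed signal names, where the component-level wait is just a combinational mux of the two FIFO full flags:

Code:
// Illustrative only: component waitrequest selected by address,
// zero clock cycles of latency on the status.
module two_fifo_wait (
  input  address,         // assumed: 0 selects FIFO 1, 1 selects FIFO 2
  input  fifo_full_1,
  input  fifo_full_2,
  output waitrequest
);
  assign waitrequest = (address == 1'b0) ? fifo_full_1 : fifo_full_2;  // not clocked, just logic
endmodule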

I'm not disputing that there are design examples where wait could be generated from a register; I'm just saying that in general that would not be the case. If you tried to clock this then you would need to provide a 'holding' register for incoming data for each FIFO to cover the situation where one FIFO is full, the other is not full, and the not-full one gets written to first.

We're way off on a tangent though... my only original point is that one can't just say 'register all the outputs' as if doing that had no repercussions, such as having to buffer an entire data bus to multiple devices, or cutting performance in half to compensate for the latency added to a status output signal.

Kevin Jennings

- - - Updated - - -

Consider this as a practical example (or counter-example) -- module A presents a fifo-like interface to module B. The "fifo empty" signal is (for this example) a combinatorial output of module A. Module B uses this signal to generate "fifo read" which is then sent back to module A. This is a path where there is combinatorial logic on A's output to B, and then there is additional combinatorial logic on B's output back to A.

With a combinatorial path that goes into a module and then back out, the layout of this interface strongly becomes part of a possible longest path. Had there been register stages on the outputs/inputs, it would be easier to design because the possible longest paths would be internal to each module, or on just one part of the path between them. Now the longest path includes the logic for "fifo empty", the path from A to B, the path from this input on B through the logic for "fifo read", the path from B back to A, and then the path from this input on A to any affected logic.

You haven't thought through this example fully. If you add registers on the interface, consider the following:
- The empty signal out of a FIFO typically already is registered so there would be absolutely no timing improvement
- If you registered fifo empty before it leaves component A, then component B will not be presenting valid status and could start a read when the FIFO first goes empty. You would then need to compensate by adding additional logic. That additional logic would use additional resources and likely gobble up any potential timing improvement that you think you are getting.
- Nothing about this example really has anything to do with signals in different modules. Nothing would change if all of what you described was in the same 'module'.

Kevin Jennings
 
You haven't thought through this example fully. If you add registers on the interface, consider the following:
- The empty signal out of a FIFO typically already is registered so there would be absolutely no timing improvement
- If you registered fifo empty before it leaves component A, then component B will not be presenting valid status and could start a read when the FIFO first goes empty. You would then need to compensate by adding additional logic. That additional logic would use additional resources and likely gobble up any potential timing improvement that you think you are getting.
- Nothing about this example really has anything to do with signals in different modules. Nothing would change if all of what you described was in the same 'module'.

Kevin Jennings

This is a purely practical issue, not a logical one. My example is a realistic example of the problem as listed. You are exactly correct that a register cannot logically be added to either the empty or the read signal. That is specifically why I chose it -- each developer would intentionally note that fact within both modules. Furthermore, you are also exactly correct that within* a module the problem doesn't even exist, at least not beyond the fact that it might still be the longest path even though it is fully within one module.

Imagine the problem more in the context of an ASIC or a partially reconfigurable FPGA design. In these cases, the modules may have been synthesized and implemented independently. At a minimum, it makes it harder to write the constraints for each partially reconfigurable module. As the linked guidelines suggest, the proper course would be to avoid this type of interface between major modules that have been synthesized/implemented independently.

* "within" meaning the implementation is done for both A and B together, and not A, then B, then the connections between them.
 

the thread is about to be closed;
just a few last notes;

The above code will always have a one clock cycle latency on the wait.
Zero clock cycle latency is faster than one.
sounds slightly better than:
But only at the rather high cost of giving up half of the performance
of the interface except for certain constrained conditions.
Likely only acceptable in limited circumstances.
and much better than:
Delaying 'wait' by clocking it will result in a non-functional system.
I was not going to argue which solution is best;
there is no 'best' solution for all cases;
I was just 'triggered' by the statement:
Delaying 'wait' by clocking it will result in a non-functional system.
which is simply not true;
how to solve a particular case almost always depends on the particular case;

j.a
 
