# Understanding Skid Buffer Mechanism

#### promach

I have some questions about http://fpgacpu.ca/fpga/Pipeline_Skid_Buffer.html

1) Why is the skid buffer designed as a 2-entry FIFO instead of a 1-entry FIFO?

2) The article says:

> However, pipelining handshaking is more complicated: simply adding a pipeline register to the valid, ready, and data lines will work, but now each transfer takes two cycles to start, and two cycles to stop.

Why two cycles to start, and two cycles to stop?

#### std_match

For timing/speed, you don't want a combinatorial path from destination/output "ready" going back to the source/input "ready".
This means that the "ready" going back to the source/input will be delayed one clock cycle compared to the "ready" coming from the destination/output.

So, when the destination says "not ready", the skid buffer must accept one more write from the source.

It is possible to do this with only one entry in the "FIFO", but you must then be pessimistic and always assume that the destination will be "not ready" in the next clock cycle.
It will work, but the throughput will be cut in half if the destination normally can handle back-to-back write cycles.
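
To see where the factor of two comes from, here is a minimal cycle-by-cycle sketch in Python (a behavioural model, not RTL). It assumes a source that always has data, a destination that is always ready, and a buffer with a single data register whose `ready` back to the source is registered and pessimistic: it is high only when the register is known to be empty. The name `run_one_entry` and the modelling choices are illustrative assumptions, not taken from the linked article.

```python
def run_one_entry(cycles, dst_ready=lambda t: True):
    """Behavioural model of a one-register buffer with a registered ready."""
    full = False        # the single data register currently holds an item
    item = None
    in_ready = True     # registered ready seen by the source
    next_value = 0      # assumption: the source always has a new value to offer
    delivered = []
    for t in range(cycles):
        accept = in_ready                # source valid is always 1 in this model
        send = full and dst_ready(t)     # transfer to the destination
        if send:
            delivered.append(item)
        if accept:
            item = next_value
            next_value += 1
        # Next-state logic, evaluated at the clock edge.
        full = accept or (full and not send)
        # Pessimistic: with one register, a full buffer cannot promise to take
        # another word, because the destination might deassert ready next cycle.
        in_ready = not full
    return delivered


print(len(run_one_entry(100)))  # 50
```

The register alternates between filling and draining, so the source is accepted only every second cycle even though the destination never stalls.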

#### promach

> It will work, but the throughput will be cut in half if the destination normally can handle back-to-back write cycles.

Why is throughput exactly cut in half?

#### vGoodtimes

The module sees arvalid and responds with arready. On the next cycle it must set rvalid; it can't wait for rready. It must also set arready to something. If rready is 0 on this next cycle, arready must be 0 as well. But that isn't an expression for a registered output: it requires a combinatorial output.

#### promach

Huh? That still does not explain why throughput is cut exactly in half.

Note the cycle difference between o_axi_rvalid and o_axi_rready, and thus the number of o_axi_rdata values (d2 and d3) being dropped.

#### vGoodtimes

That looks wrong. Why does rvalid go low when rready is low? Why does rdata change when rready is low? rvalid later transitions low without rready.

It's highly suspect that these valids are changing after ready changes.

#### promach

Look at the following READ transaction dependency flow graph and you will see that o_axi_rready acts as a backpressure mechanism on o_axi_arvalid.

Note the cycle difference between o_axi_rvalid and o_axi_rready, and thus the number of o_axi_rdata values (d2 and d3) being dropped.

#### TrickyDicky

rready acts as backpressure on the rdata channel. arready acts as backpressure on the araddr channel. Your timing diagram shows a violation of the AXI spec, as RDATA and RVALID should remain stable when RREADY is low. ARVALID is independent of RREADY externally (if you wire them together internally, that's your call, but from an AXI perspective only ARREADY acknowledges addresses).

#### promach

> timing diagram shows violation of the AXI spec as RDATA and RVALID should remain stable when RREADY is low.

I am not 100 percent convinced by this sentence above.
Could you elaborate more on this?

#### TrickyDicky

> I am not 100 percent convinced by this sentence above.
> Could you elaborate more on this?

Please refer to the AXI4 spec, specifically A3.2.2.

AXI4 Spec said:

> The master can assert the ARVALID signal only when it drives valid address and control information. When
> asserted, ARVALID must remain asserted until the rising clock edge after the slave asserts the ARREADY signal.
> The default state of ARREADY can be either HIGH or LOW. This specification recommends a default state of
> HIGH. If ARREADY is HIGH then the slave must be able to accept any valid address that is presented to it.
>
> Note: This specification does not recommend a default ARREADY value of LOW, because it forces the transfer to take
> at least two cycles, one to assert ARVALID and another to assert ARREADY.
>
> The slave can assert the RVALID signal only when it drives valid read data. When asserted, RVALID must remain
> asserted until the rising clock edge after the master asserts RREADY. Even if a slave has only one source of read
> data, it must assert the RVALID signal only in response to a request for data.
> The master interface uses the RREADY signal to indicate that it accepts the data. The default state of RREADY
> can be HIGH, but only if the master is able to accept read data immediately, whenever it starts a read transaction.
> The slave must assert the RLAST signal when it is driving the final read transfer in the burst.

For even more detail, please see A3.2.1, which covers the VALID/READY pairs of all channels.

#### promach

> RDATA and RVALID should remain stable when RREADY is low.

Is this specifically stated in the AXI spec? I cannot find this requirement.

#### TrickyDicky

In Figure A3-2, the source presents the address, data or control information after T1 and asserts the VALID signal.
The destination asserts the READY signal after T2, and the source must keep its information stable until the transfer
occurs at T3, when this assertion is recognized.
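
The stability rule can be turned into a small waveform checker. The sketch below is Python and takes one channel's per-cycle VALID, READY and data samples as lists. It reports cycles where the source withdrew VALID or changed the data while a transfer was still pending. The function name `check_handshake` and the sample waveforms are made up for illustration. They are not taken from the spec or from the diagrams in this thread.

```python
def check_handshake(valid, ready, data):
    """Return (cycle, message) pairs for violations on one VALID/READY channel."""
    errors = []
    for t in range(1, len(valid)):
        # A transfer is still pending if VALID was high without READY last cycle.
        waiting = valid[t - 1] and not ready[t - 1]
        if waiting and not valid[t]:
            errors.append((t, "VALID dropped before the transfer completed"))
        elif waiting and data[t] != data[t - 1]:
            errors.append((t, "data changed while VALID was waiting for READY"))
    return errors


# Hypothetical legal waveform: the source holds "A" until READY arrives in cycle 3.
legal = check_handshake([0, 1, 1, 1, 0], [0, 0, 0, 1, 0], [None, "A", "A", "A", None])

# Hypothetical illegal waveform: data changes, then VALID drops, with READY still low.
illegal = check_handshake([1, 1, 0], [0, 0, 0], ["A", "B", "B"])

print(legal)    # []
print(illegal)
```

Running a captured waveform through a check like this is a quick way to spot the kind of VALID and data changes questioned earlier in the thread.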

#### promach

For those missing the discussion, see the AXI transaction handshake process.

Since DATA is kept stable until both VALID and READY are asserted, does the skid buffer then serve no purpose at all?

Please correct me if I am wrong.

#### promach

@TrickyDicky: OK, I got your point.

The slave must wait for both ARVALID and ARREADY to be asserted before it asserts RVALID to indicate that valid data is available:

```verilog
else if (i_axi_arvalid && o_axi_arready) o_axi_rdata <= mem[i_axi_araddr];
```

and

```verilog
else o_axi_rvalid <= i_axi_arvalid && o_axi_arready;
```

Now, back to the topic of this thread: I do not see how a skid buffer fits inside AXI.

As for the o_axi_rvalid (slave) and o_axi_rready (master) pair, rather than the o_axi_arvalid (master) and o_axi_arready (slave) pair, I think you mean the following waveform, in which o_axi_rdata (slave) must remain stable (d2 and d3) whenever o_axi_rvalid or o_axi_rready is pulled low.

#### std_match

The timing diagram is wrong. On each interface, a transaction is completed when valid and ready are high in the same clock cycle. In your diagram, this means that d2 is transferred 3 times to the slave, and d3 two times.

1. The master must not wait for the ready signal. When the master wants to transmit, it must set valid and data regardless of the ready signal.
2. When valid and data are set, they must not change until the transaction is completed, which happens when valid and ready are set in the same clock cycle.

The skid buffer has no "functional" purpose. It is inserted to improve timing, like "normal" pipelining.
The problem is that there is one signal (ready) going in the other direction. To get the full timing improvement, that signal also needs a register. The master will then see the ready signal one clock cycle "too late" when it is set low by the slave.
If there is only one register in the skid buffer, the ready signal going back to the master can only be active for one clock cycle at a time, because the slave can set ready=0 but the master would see it too late.
This means that every second clock cycle the ready going back to the master must have ready=0 even if the slave has ready=1 all the time. The throughput will be cut in half.
The skid buffer with two registers solves that problem. It can accept one more write from the master when the slave sets ready=0.
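
The two-register behaviour can be sketched with a small cycle-level Python model (behavioural only, not RTL, and not the code from the linked article). It assumes a source that always has data and holds it until accepted, and it derives the `ready` seen by the source purely from registered state, so there is no combinational path from the destination's ready. The names `run_skid_buffer`, `out_*` and `skid_*` are hypothetical.

```python
def run_skid_buffer(cycles, dst_ready=lambda t: True):
    """Behavioural model of a two-register skid buffer with a registered ready."""
    out_valid, out_data = False, None     # main output register
    skid_valid, skid_data = False, None   # extra register that catches the "skid"
    in_ready = True                       # registered ready seen by the source
    next_value = 0                        # assumption: source always offers data
    delivered = []
    for t in range(cycles):
        src_data = next_value
        accept = in_ready                     # source valid is always 1 here
        send = out_valid and dst_ready(t)     # transfer to the destination
        if send:
            delivered.append(out_data)
        if skid_valid:
            # in_ready is low in this state, so nothing new arrives.
            if send:
                out_data, skid_valid = skid_data, False
        elif send or not out_valid:
            # Output register is free next cycle: incoming data goes straight in.
            out_valid = accept
            if accept:
                out_data = src_data
        elif accept:
            # Destination stalled but the source was already told "ready":
            # park the word in the skid register.
            skid_valid, skid_data = True, src_data
        if accept:
            next_value += 1
        in_ready = not skid_valid             # depends on registered state only
    return delivered


print(len(run_skid_buffer(100)))                            # 99
print(len(run_skid_buffer(100, lambda t: t % 3 != 0)))      # 66
```

With an always-ready destination the model delivers one word per cycle after a single cycle of latency. When the destination is ready only two cycles out of three, the words still arrive in order and the rate follows the destination's ready pattern.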

#### promach

> On each interface, a transaction is completed when valid and ready are high in the same clock cycle. In your diagram, this means that d2 is transferred 3 times to the slave, and d3 two times.

You mean o_axi_rvalid && o_axi_rready?

If yes, then d2 is only transferred ONCE.

#### std_match

> You mean o_axi_rvalid && o_axi_rready?
>
> If yes, then d2 is only transferred ONCE.

A transaction is completed when valid and ready have been set in the same clock cycle (= both sampled high simultaneously at the positive clock edge at the end of the clock cycle).

With a skid buffer, there are two AXI interfaces. One from master to skid buffer and another from skid buffer to the slave.
In the diagram in post #14, you don't show the data for master to skid buffer, but valid is illegal since it goes from 1 to 0 when ready=0.
All the diagrams in the other thread have the same error.
Valid can only go from 1 to 0 directly after a clock cycle where ready has been 1.

For skid buffer to slave, there are 7 completed transactions in post #14: d0, d1, d2, d2, (then we have a gap with no transactions because valid=0), d2, d3, d3
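
The counting rule is mechanical: every clock cycle in which valid and ready are both high is one completed transfer, whatever the data happens to be. A minimal Python sketch follows. The waveform in it is made up (it is not the diagram from post #14) and is merely arranged to produce the same list of transfers, and `completed_transfers` is a hypothetical helper name.

```python
def completed_transfers(valid, ready, data):
    """List the data word of every cycle where valid and ready are both high."""
    return [d for v, r, d in zip(valid, ready, data) if v and r]


# Hypothetical per-cycle samples for one channel.
valid = [1, 1, 1, 1, 0, 1, 1, 1, 1]
ready = [1, 1, 1, 1, 1, 1, 0, 1, 1]
data = ["d0", "d1", "d2", "d2", "--", "d2", "d3", "d3", "d3"]

print(completed_transfers(valid, ready, data))
# ['d0', 'd1', 'd2', 'd2', 'd2', 'd3', 'd3']
```

Holding the same data with valid high for several ready cycles therefore counts as several transfers, which is why d2 and d3 appear more than once.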

##### Super Moderator
@std_match

The problem is that this discussion seems to be taking place across two threads.

Merging them is impossible at this point: the thread would become unreadable due to the interleaving of posts that are unrelated to each other.

One of the issues with promach's timing diagrams is the separation of slave and master signals and the mixing of different channels. This makes the diagrams very confusing and requires careful study of the name of each signal.

Promach, I advise you to draw your timing diagrams with only the three signals data, ready, and valid for ONE channel grouped together; it doesn't matter whether they come from a slave or a master. Mixing and matching signals from different channels in your diagrams is confusing. For example, in the post #14 diagram rready is at the bottom and does not look like part of the rdata and rvalid channel signals at the top. Hence post #15 misinterpreted arready as part of the rdata and rvalid channel, since it is drawn in the same group of signals.

#### Renash

Please correct me if my general understanding is wrong.

Skid buffers improve the throughput of data produced by the master, but the overall throughput of the design (master plus slave) will still be limited by the slave if the slave is the slowest part.

Is this right?