Welcome to EDAboard.com

RIFFA full duplex and multi-threading support

Status
Not open for further replies.

promach

With regard to the previous discussion of RIFFA full-duplex capability and this GitHub issue, I have done some **broken link removed** which enables full-duplex capability and multi-threading support.

However, my modification comes at the cost of increased resource usage due to **broken link removed**.

In addition, my modified code can send at most 65536 words using **broken link removed**. Beyond that word count, it deadlocks internally because the rWrDataRen signal is not asserted to '1' when it is supposed to be.

On further investigation, after reducing the FIFO back to its original size, I found that in fifo_packer_128.v the variable _rPackedCount is not reset to zero. Why?

Also, in rx_port_128.v, why do we need three FIFOs instead of just two?

Could anyone advise?
 
the variable _rPackedCount is not reset to zero . Why ?

It is an attempt to force-flush the buffer. I think the logic assumes a flush will not occur if the count > 3 or if there is valid input on the same cycle. In that case, setting the value to 4 causes the 1-3 leftover words to be written out on the next cycle, and the counter resets to 0 in the else path in that cycle.
 
Actually, looking at this more, the reset to four is a clever trick. If there are 0 words in the buffer, nothing happens. If there are 1-3, the count is set to 4 and the data is sent on the next cycle. If there are 4, the data is sent out this cycle normally and the count resets to 0. If there are 5-7, 4 words are sent out this cycle and the count is set to 4 to force the remaining 1-3 out on the next cycle.

It does look like it makes assumptions about flush and valid data arriving on the same cycle, though. For example, take count = 3, flush = 1, and 4 valid input words. The shift buffer will end up with 7 valid elements and count = 4. On the next cycle, 4 words will be shifted out, the count will be 0, and 3 words of corrupting data will remain.
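
To make the cases above concrete, here is a minimal Python sketch of the count update as described in this post. It is a behavioural toy, not RIFFA's actual code. The function name `step`, the separate `stored` occupancy counter, and the rule that a 4-word beat is written whenever count >= 4 are assumptions made for illustration.

```python
def step(count, stored, valid_in, flush):
    """Advance the toy packer by one clock cycle.

    count    -- tracked word count (what the packer believes is buffered)
    stored   -- words really sitting in the shift buffer (for checking only)
    valid_in -- number of valid input words arriving this cycle (0..4)
    flush    -- True if a flush is requested this cycle
    Returns (next_count, next_stored, beat), where beat is True when a
    4-word beat is written out during this cycle.
    """
    beat = count >= 4                         # assumption: write when count >= 4
    drained = min(stored, 4) if beat else 0   # a forced beat may be partly padding
    if flush and count % 4 != 0:
        next_count = 4                        # force the leftovers out next cycle
    else:
        next_count = count + valid_in - (4 if beat else 0)
    next_stored = stored - drained + valid_in
    return next_count, next_stored, beat
```

Calling `step(2, 2, 0, True)` followed by `step(4, 2, 0, False)` drains the two leftover words as intended. Calling `step(3, 3, 4, True)` followed by `step(4, 7, 0, False)` ends with the tracked count at 0 while three words are still stored, which is the corner case described above.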
 
I think the logic assumes flush will not occur if the count > 3 or if there is valid input on the same cycle.

it does make assumptions about flush and valid data on the same cycle though.

You have two conflicting sentences from two of your separate posts.

By the way, I am now debugging this problem with ILA, and I am curious how that assumption would apply to the waveform trace in which only 48 of 64 words are successfully sent back to my Linux host computer under **broken link removed**.

Screenshot from 2018-07-06 12-39-50.png
 

You have two conflicting sentences from two of your separate posts.

The second post was an update after I looked at it more.

I suspect this has to do with the inputs/outputs to chnl_tester. Also, you should fix your GitHub repository. Why is it almost 600 MB? Why does each commit have 1k files?

I did notice that the rx/tx len are the same net. Perhaps rxlen was 48 when the transmission started, then became 64 later. Maybe some upstream FSM thinks the transmission is over while your local FSM thinks it is only partially complete. I'm not sure, but either way it seems like something that could go wrong.
 

Code (Verilog):
// Shift data into and out of our buffer as we receive and write out data.
if (rDataMaskedEn != 3'd0)
    _rPackedData = ((rPackedData>>(32*{rPackedCount[2], 2'd0})) | (rDataMasked<<(32*rPackedCount[1:0])));
else
    _rPackedData = (rPackedData>>(32*{rPackedCount[2], 2'd0}));



What do you have in mind regarding _rPackedData in fifo_packer_128.v?

You could also cross-reference the similar lines for _rPackedData in fifo_packer_64.v.

Note: I will definitely clean up my GitHub development branch; it currently contains all the build files as well as the ILA waveforms.
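
For anyone following along, the _rPackedData shift expression quoted in this post can be modelled with plain integers. This is only a sketch under assumptions: the buffer is taken to be 7 words wide because the count discussed in this thread ranges from 0 to 7, and the helper names `pack_words` and `next_packed` are made up for illustration.

```python
WORD_BITS = 32
BUFFER_WORDS = 7                       # assumption: count ranges over 0..7
BUFFER_MASK = (1 << (WORD_BITS * BUFFER_WORDS)) - 1


def pack_words(words):
    """Pack a list of 32-bit words into one integer, word 0 in the low bits."""
    value = 0
    for index, word in enumerate(words):
        value |= (word & 0xFFFFFFFF) << (WORD_BITS * index)
    return value


def next_packed(packed, count, data_masked, data_masked_en):
    """Integer model of the _rPackedData assignment."""
    # {rPackedCount[2], 2'd0} is 4 when count >= 4, else 0: drop one full beat.
    shifted = packed >> (WORD_BITS * (count & 0b100))
    if data_masked_en != 0:
        # New words are OR'd in at word offset rPackedCount[1:0].
        shifted |= data_masked << (WORD_BITS * (count & 0b011))
    return shifted & BUFFER_MASK
```

With three words buffered and two arriving, `next_packed(pack_words([1, 2, 3]), 3, pack_words([0xA, 0xB]), 2)` yields the five words in order. Because new input is OR'd in rather than assigned, any word left in the buffer that the count does not account for will be merged into later input.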
 

For your purposes, I would just add logic to detect the error condition: a flush plus a write that results in _count > 4 if count != 4, or _count != 4 if count = 4. Or maybe just flush + valid input. It isn't clear whether this is a problem at all, or whether it is your problem. That depends on how the interface is defined and on whether the rest of the design simply never hits the possible error condition.

The 64-bit version would have similar issues. But it isn't clear whether the invalid input (getting 2 words of data when count = 1 and flush, or getting 1 word of data when count = 0 and flush) is possible in the system/interface.
 
a flush + a write that results in _count > 4 if count != 4 or _count !=4 if count = 4. or maybe just flush + valid input.

I still do not understand how these three conditions give rise to an error.

Could you elaborate further?
 

Those of you who are interested in contributing to full-duplex RIFFA are welcome to check out **broken link removed**.

Just download the two files from the gist linked above and open riffa_full_duplex.gtkw in GTKWave.
 

If the current count is 3 and a flush occurs while there are 2 valid input words, the data is put into the shift buffer but the tracked count is set to 4 instead of 5. The result is a non-zero value in the shift buffer that will be OR'd with the next input.

If the count is 4 and there is 1 valid input word during the flush, then not all data will be flushed.

Flush + valid is a superset: it covers these two error cases plus the non-error cases where valid input coincides with a flush. The latter might be considered luck or otherwise interesting.
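
The two error cases and the broader flush + valid check could be written as predicates along the following lines. This is a hedged Python sketch, not RIFFA code. All function names are hypothetical, `count` is the tracked word count (0..7), `valid_in` is the number of valid input words (0..4) in the flush cycle, and `words_left_behind` is my own reference model of what one flush cycle plus one drain cycle fails to write out.

```python
def would_be_count(count, valid_in):
    """Count update without the flush override (the '_count' in the post above)."""
    return count + valid_in - (4 if count >= 4 else 0)


def narrow_check(count, valid_in):
    """The two specific conditions: flush plus a write with a bad resulting count."""
    if valid_in == 0:
        return False
    nxt = would_be_count(count, valid_in)
    return nxt != 4 if count == 4 else nxt > 4


def broad_check(count, valid_in):
    """The simpler superset: any valid input during a flush cycle."""
    return valid_in > 0


def words_left_behind(count, valid_in):
    """Reference model: words still buffered after the flush and one drain cycle."""
    stored = would_be_count(count, valid_in)      # real occupancy after the flush cycle
    tracked = 4 if count % 4 != 0 else stored     # flush forces the count to 4
    drained = min(stored, 4) if tracked >= 4 else 0
    return stored - drained
```

In this toy model `narrow_check` never fires without words actually being left behind, and `broad_check` also fires on harmless combinations such as count = 3 with 1 valid word. The model additionally leaves words behind for count = 0 with 1-3 valid words, which only the broad check catches. Whether any of these combinations can occur depends on the real interface.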
 
@vGoodtimes

I am planning to formally verify the RIFFA component modules in the future.

Your idea should be very helpful then.

I have solved the asynchronous FIFO depth issue.

Now I am stuck on another FIFO's read-enable signal not being asserted high. This really takes some time to fix.

Do you have a general idea of what the following module actually does?


Code (Verilog):
/* verilator lint_off UNOPTFLAT */
module reg_pipeline
    #(
      parameter C_DEPTH = 10,
      parameter C_WIDTH = 10
      )
    (
     input                CLK,
     input                RST_IN,
 
     input [C_WIDTH-1:0]  WR_DATA,
     input                WR_DATA_VALID,
     output               WR_DATA_READY,
 
     output [C_WIDTH-1:0] RD_DATA,
     output               RD_DATA_VALID,
     input                RD_DATA_READY
     );
 
    genvar                i;
 
    wire                  wReady [C_DEPTH:0];
    
    reg [C_WIDTH-1:0]     _rData [C_DEPTH:1], rData [C_DEPTH:0];
    reg                   _rValid [C_DEPTH:1], rValid [C_DEPTH:0];
 
    // Read interface
    assign wReady[C_DEPTH] = RD_DATA_READY;
    assign RD_DATA = rData[C_DEPTH];
    assign RD_DATA_VALID = rValid[C_DEPTH];
 
    // Write interface
    assign WR_DATA_READY = wReady[0];
    always @(*) begin
        rData[0] = WR_DATA;
        rValid[0] = WR_DATA_VALID;
    end
 
    generate
        for( i = 1 ; i <= C_DEPTH; i = i + 1 ) begin : gen_stages
            assign #1 wReady[i-1] =  ~rValid[i] | wReady[i];
 
            // Data Registers
            always @(*) begin
                _rData[i] = rData[i-1];
            end
 
            // Enable the data register when the corresponding stage is ready
            always @(posedge CLK) begin
                if(wReady[i-1]) begin
                    rData[i] <= #1 _rData[i];
                end
            end
 
            // Valid Registers
            always @(*) begin
                if(RST_IN) begin
                    _rValid[i] = 1'b0;
                end else begin
                    _rValid[i] = rValid[i-1] | (rValid[i] & ~wReady[i]);
                end
            end
 
            // Always enable the valid registers
            always @(posedge CLK) begin
                rValid[i] <= #1 _rValid[i];
            end
 
        end
    endgenerate
endmodule
/* verilator lint_on UNOPTFLAT */

 

That looks like a basic register pipeline that supports back-pressure only from the destination. It should be kept reasonably shallow because wReady is a combinational chain through every stage, so it could become a critical path.
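
To illustrate the point, here is a small cycle-level Python model of the valid/ready logic in the module above. It is a sketch only. The function name `step_pipeline` and the list-based state are my own, RST_IN and the #1 delays are not modelled, and index 0 of each list stands for the combinational write input, as in the Verilog.

```python
def step_pipeline(data, valid, wr_data, wr_valid, rd_ready):
    """Advance the pipeline by one clock edge.

    data, valid -- lists of length depth + 1; index 0 is the write input,
                   index depth is the stage seen by the reader.
    Returns (next_data, next_valid, wr_ready) for the cycle just simulated.
    """
    depth = len(valid) - 1
    data = [wr_data] + list(data[1:])        # rData[0] = WR_DATA
    valid = [wr_valid] + list(valid[1:])     # rValid[0] = WR_DATA_VALID

    # wReady ripples from the read side back to the write side in one cycle.
    ready = [False] * (depth + 1)
    ready[depth] = rd_ready
    for i in range(depth, 0, -1):
        ready[i - 1] = (not valid[i]) or ready[i]

    next_data, next_valid = list(data), list(valid)
    for i in range(1, depth + 1):
        if ready[i - 1]:                     # data register enable
            next_data[i] = data[i - 1]
        next_valid[i] = valid[i - 1] or (valid[i] and not ready[i])
    return next_data, next_valid, ready[0]
```

With a depth of 2 and the reader stalled, two writes are accepted and the third sees wr_ready go low. Once every stage holds valid data, wr_ready simply follows rd_ready within the same cycle, which is the chain that grows with C_DEPTH.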
 
back-pressure at only the destination

I do not understand...

I would have thought back-pressure should act on the source rather than the destination?

- - - Updated - - -

This reg_pipeline module is the reason RIFFA needs a large FIFO before granting Tx permission to transmit.

You are right about this module being the critical path between Tx preparation and the actual Tx transmission.

Do we really need this module at all?

- - - Updated - - -

This reg_pipeline module is used within registers.v.

Note the signals 'CHNL_TX_LEN_READY' and 'wChnlTxLenReady', which control permission for Tx transmission.


- - - Updated - - -

After further investigation, this reg_pipeline instance turns out to be only a 2-register pipeline. The root cause is 'wChnlTxLenReady', which depends on 'wReqFieldDemux' or 'wRxrDataValid' or 'RXR_DATA_VALID'.

This 'RXR_DATA_VALID', from line 298 of registers.v, is actually a signal from rx_engine_classic.v.

This makes me suspect that this is the real factor determining whether RIFFA can achieve full-duplex transactions. Please refer to page 52 of the RIFFA documentation on the Engine Layer. For a full-duplex transaction, the 'RXR_DATA_VALID' signal should not be involved in any Tx activity at all.

Could anyone advise?
 

Attachment: Screenshot from 2018-07-10 14-39-35.png

From **broken link removed** I also found that in channel_128.v the signal wTxSgDataEmpty implies that rx_port_128 takes priority over tx_port_128. This also means that Rx and Tx are still not fully independent of each other, even within the scatter-gather DMA layer.

Please let me know if I have missed anything important.

Attachment: Screenshot from 2018-07-11 16-38-46.png
