[moved] Subtracting a vector from each row of a matrix in each clock cycle in Verilog

Mahnaz_m · Apr 11, 2016

Hi,

I have a vector and a matrix of hexadecimal values stored in .mem files. I need to subtract this vector from each row of the matrix in each clock cycle.
I have read the files into my testbench using $readmemh in Verilog. How can I subtract them now?
I have also used the Xilinx floating-point IP core for the subtraction.
Here is my code, but it works for only one row of the matrix.

Code (Verilog):
module rbfnn_tb;
  //Memory for reading files        
    reg [63:0] test_input[7:0];
    reg [63:0] centers [7:0];
    // Input ports all be registers
    reg aclk;
    reg [64*8-1:0] test_flat;
    reg [64*8-1:0] center_flat;
    reg aresetn;
    reg s_axis_a_tvalid;
    reg s_axis_b_tvalid;
    reg s_axis_operation_tvalid;
    reg m_axis_result_tready;
    reg [7:0] s_axis_operation_tdata;
    
    // output ports will be wires
    wire s_axis_a_tready;
    wire s_axis_b_tready;
    wire s_axis_operation_tready;
    wire m_axis_result_tvalid;  
    wire [2:0] m_axis_result_tuser;
    wire [64*8-1:0] sub_out;
    //reg [64*8-1:0] sub_result;
// reading input files  
initial begin
    $readmemh("test.mem", test_input);
    $readmemh("centers.mem", centers); 
end
// flattening Input vectors
integer i;
always @ (posedge aclk)
begin
    for (i = 0; i < 8; i = i+1)
    begin
        test_flat[64*i+: 64] <= test_input[i];
        center_flat[64*i+: 64] <= centers[i];
    end
end
 
 rbfnn RBFNN( 
            .test_input(test_flat),
            .centers(center_flat),
            .sub_out(sub_out),
            .aclk(aclk),
            .aresetn(aresetn),
            .s_axis_a_tvalid(s_axis_a_tvalid),
            .s_axis_b_tvalid(s_axis_b_tvalid),
            .s_axis_operation_tvalid(s_axis_operation_tvalid),
            .m_axis_result_tready(m_axis_result_tready),
            .s_axis_operation_tdata(s_axis_operation_tdata),
            .s_axis_a_tready(s_axis_a_tready),
            .s_axis_b_tready(s_axis_b_tready),
            .s_axis_operation_tready(s_axis_operation_tready),
           .m_axis_result_tvalid(m_axis_result_tvalid), 
            .m_axis_result_tuser(m_axis_result_tuser)
            //.sub_result(sub_result)
            );
initial begin aclk <= 0; end
    always #10 aclk <= ~aclk;
    
initial
begin
    $monitor($time, "  clk = %b \n test: %h \n center: %h \n result = %h ", aclk, test_flat, center_flat, sub_out );
    // Initialize Inputs
        aresetn <= 1;
        s_axis_a_tvalid <= 0;
        s_axis_b_tvalid <= 0;
        s_axis_operation_tvalid <= 0;
        m_axis_result_tready <= 0;
        test_flat <= 0;
        center_flat <= 0;
        s_axis_operation_tdata <= 0; 
 
         #8;     
        // Add stimulus here
        aresetn = 1;
        s_axis_a_tvalid = 1;
        s_axis_b_tvalid = 1;
        s_axis_operation_tvalid = 1;
        m_axis_result_tready = 1;
        s_axis_operation_tdata = 1; 
        #800 $finish;
    end
    
endmodule

ads-ee · Apr 11, 2016

As you haven't supplied the rbfnn module, it's impossible to determine why the module doesn't calculate anything but the first row. The testbench doesn't do this, as it supplies the rbfnn module with the matrix in a "flat" form.

So you have an 8-entry matrix with 64-bit values (in floating point)?

Why are you using floating point?

Mahnaz_m · Apr 12, 2016

What I have written calculates only the difference between the test_input vector and one row of my centers matrix. My centers matrix has eight rows, with eight entries in each row. Values are all double precision for better accuracy.

Here is my RBFNN module:

Code (Verilog):
module rbfnn( 
            //Input POrts
            input aclk,
            input [64*8-1:0] test_input,
            input [64*8-1:0] centers,
            input aresetn,
            input s_axis_a_tvalid,
            input s_axis_b_tvalid,
            input s_axis_operation_tvalid,
            input m_axis_result_tready,
            input [7:0] s_axis_operation_tdata,
            //Output Ports
            output s_axis_a_tready,
            output s_axis_b_tready,
            output s_axis_operation_tready,
            output m_axis_result_tvalid,        
            output [2:0] m_axis_result_tuser,
            //output [63:0] sub_out
            output [64*8-1:0] sub_out
            //output [64*8-1:0] sub_result
    
            );
    // Instantiate module 8 times       
    genvar i;
    generate
        for (i=0; i<8; i=i+1)
        begin: loop
            fp_add_sub subtract (aclk, aresetn, s_axis_a_tvalid, s_axis_b_tvalid, s_axis_operation_tvalid, m_axis_result_tready, 
                                 s_axis_a_tready,s_axis_b_tready, s_axis_operation_tready,m_axis_result_tvalid, test_input[64*i+: 64],
                                        centers[64*i+: 64],s_axis_operation_tdata, sub_out[64*i+:64], m_axis_result_tuser);
            
        end
    endgenerate
endmodule

ads-ee · Apr 12, 2016

I'm not sure what experience you have designing hardware...it looks like all you've done in the past is software. Comments like

Values are all double precision for better accuracy

and

I have a vector and a matrix of hexadecimal values stored in .mem files.

seem to indicate a software view of the design.

My centers matrix has eight rows, with eight entries in each row

This would mean you have an 8x8 matrix of double precision values (sw hat on...64-bit IEEE-754 values). Or in other words sixty-four 64-bit IEEE-754 values.

You have a centers "matrix" declared as

Code:

reg [63:0] centers [7:0];

This looks like only eight 64-bit values, not a matrix of sixty-four 64-bit values. This looks like the first architectural error in the design.

The majority of Verilog coders I've met code memory arrays like this:

Code:

reg [63:0] centers [0:63];

using the opposite direction of the indices for the address. When displayed in a simulator, expanding the array shows [0] at the top and [63] at the bottom of the list. This also correctly makes the array an 8x8 matrix ([0:63]) of 64-bit ([63:0]) values.
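A minimal sketch of that layout, assuming the 64 words in centers.mem are stored row by row with row 0 first. The module name and the ROWS/COLS parameters are made up for illustration and are not part of the original design.

```verilog
// Sketch: an 8x8 matrix of 64-bit words held row-major in a 64-entry memory.
// centers_mem_demo, ROWS and COLS are hypothetical names.
module centers_mem_demo;
    localparam ROWS = 8;
    localparam COLS = 8;

    // 64 entries of 64 bits each; address 0 is listed first in a simulator.
    reg [63:0] centers [0:ROWS*COLS-1];

    integer r, c;
    initial begin
        // Assumes centers.mem contains 64 hex words, row 0 first.
        $readmemh("centers.mem", centers);
        for (r = 0; r < ROWS; r = r + 1)
            for (c = 0; c < COLS; c = c + 1)
                // Element (r, c) lives at address r*COLS + c.
                $display("centers[%0d][%0d] = %h", r, c, centers[r*COLS + c]);
    end
endmodule
```

With this addressing, row r occupies addresses 8*r through 8*r+7, so stepping through rows is just a matter of changing the upper three address bits.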

I hope you understand that the following code does not iterate once per clock cycle (e.g. clock 1 i=0, clock 2 i=1, etc.). Instead the loop is unrolled, assigning all eight test_input and centers entries to the flat 512-bit packed words on every clock edge.

Code (Verilog):
integer i;
always @ (posedge aclk)
begin
    for (i = 0; i < 8; i = i+1)
    begin
        test_flat[64*i+: 64] <= test_input[i];
        center_flat[64*i+: 64] <= centers[i];
    end
end

Assuming that you knew that, this only assigns the first row of centers every clock cycle, not multiple rows. You need to index through multiple rows, which you can't do as there is only one row in your centers matrix to begin with.
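One way to index through the rows could look like the sketch below. It assumes the matrix is stored row-major in a 64-entry memory and presents a different row on each clock; row_sequencer, next_row, row_idx and row_flat are hypothetical names, and the reset style is an assumption.

```verilog
// Sketch: present one row of an 8x8 matrix per clock cycle.
module row_sequencer (
    input  wire            clk,
    input  wire            rst_n,     // active-low synchronous reset (assumed)
    output reg  [2:0]      row_idx,   // index of the row held in row_flat
    output reg  [64*8-1:0] row_flat   // the eight 64-bit entries of that row
);
    reg [63:0] centers [0:63];        // 8x8 matrix, row-major
    reg [2:0]  next_row;              // row to be loaded on the next clock
    integer c;

    // Assumes centers.mem holds 64 hex words, row 0 first.
    initial $readmemh("centers.mem", centers);

    always @(posedge clk) begin
        if (!rst_n) begin
            next_row <= 3'd0;
        end else begin
            // The for loop is unrolled into eight parallel assignments;
            // only next_row changes from one clock to the next.
            for (c = 0; c < 8; c = c + 1)
                row_flat[64*c +: 64] <= centers[{next_row, 3'b000} + c];
            row_idx  <= next_row;
            next_row <= next_row + 3'd1;   // wraps from 7 back to 0
        end
    end
endmodule
```

row_flat and row_idx only become valid one clock after reset is released. The flat row can then feed the eight subtractors in place of center_flat.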

I also don't get why you would code the testbench with the correct named association for the instantiated module and then use positional association for the instantiated submodule. Positional port mapping is prone to errors and I would point this out in a code review.
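For comparison, here is a sketch of the generate loop written with named association. The sub_stub module is a hypothetical stand-in so the example simulates on its own; it does plain integer subtraction, not IEEE-754. The real port names of fp_add_sub have to come from the instantiation template generated with the core. The operand order (row minus vector) is assumed from the original question.

```verilog
// Hypothetical stand-in for the generated subtractor core.
module sub_stub (
    input  wire        clk,
    input  wire [63:0] a,
    input  wire [63:0] b,
    output reg  [63:0] result
);
    // Integer subtraction only, so that the sketch can be simulated.
    always @(posedge clk)
        result <= a - b;
endmodule

// Eight lanes, each connected by port name rather than by position.
module sub_array (
    input  wire            clk,
    input  wire [64*8-1:0] test_input,
    input  wire [64*8-1:0] centers,
    output wire [64*8-1:0] sub_out
);
    genvar i;
    generate
        for (i = 0; i < 8; i = i + 1) begin : lane
            sub_stub u_sub (
                .clk    (clk),
                .a      (centers[64*i +: 64]),     // matrix entry
                .b      (test_input[64*i +: 64]),  // vector entry
                .result (sub_out[64*i +: 64])      // a - b
            );
        end
    endgenerate
endmodule
```

With named ports a swapped or missing connection is visible at a glance. If the real core's per-instance outputs such as tready and tvalid are used, each instance would also need its own bit of a vector rather than all eight sharing one scalar net.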

The fact that you're bit banging the interface with the large initial block seems to indicate you don't understand how to write a good testbench that uses bus functional models.
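As a rough illustration of the idea, the sketch below wraps a valid/ready handshake in a task so the stimulus reads as a list of transactions instead of hand-toggled signals. Everything here is hypothetical: the signal names, the stand-in sink that toggles tready, and the two data words. It is not a model of the Xilinx core's interface.

```verilog
`timescale 1ns/1ps
// Sketch of a task-based driver for a valid/ready handshake.
module bfm_tb;
    reg         clk    = 1'b0;
    reg         tvalid = 1'b0;
    reg  [63:0] tdata  = 64'd0;
    reg         tready = 1'b0;   // stand-in for the DUT's ready output

    always #10 clk = ~clk;

    // Stand-in sink: ready on every other clock.
    always @(posedge clk) tready <= ~tready;

    // Monitor: a word is transferred when valid and ready are both high.
    always @(posedge clk)
        if (tvalid && tready) $display("%0t accepted %h", $time, tdata);

    // Drive one word and hold it until the sink accepts it.
    task send_word (input [63:0] word);
        begin
            tdata  <= word;
            tvalid <= 1'b1;
            @(posedge clk);
            while (!tready) @(posedge clk);
            tvalid <= 1'b0;
        end
    endtask

    initial begin
        repeat (2) @(posedge clk);
        send_word(64'h3FF0000000000000);   // 1.0 as an IEEE-754 double
        send_word(64'h4000000000000000);   // 2.0 as an IEEE-754 double
        @(posedge clk);
        $finish;
    end
endmodule
```

A fuller bus functional model would add a matching receive task for the result channel and compare the received words against expected values.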

-----
As an aside, I'm not sure why you want to use 64-bit IEEE-754 floating point math in an FPGA hardware design. These 8 copies of an IEEE-754 subtractor are going to use a lot of resources; let me repeat that, they're going to use a lot of resources. The majority of FPGA designs use either fixed point or integer math and can therefore take advantage of the hard IP DSP blocks in the majority of FPGA vendors' offerings. Unless your data has a dynamic range so enormous that it can't fit in fewer than 1024 bits in fixed/integer, there is no reason to be using floating point.
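To show how little logic the fixed-point alternative needs, here is a sketch of eight parallel two's-complement subtractions. The 32-bit width (for example a Q16.16 format) and all names are assumptions for illustration; the right format depends on the actual range and precision of the data.

```verilog
// Sketch: eight parallel fixed-point subtractions, one row minus the vector.
module fixed_sub8 #(
    parameter W = 32                 // bits per value, e.g. Q16.16 (assumed)
) (
    input  wire           clk,
    input  wire [W*8-1:0] row,       // one matrix row, eight values
    input  wire [W*8-1:0] vec,       // the vector to subtract
    output reg  [W*8-1:0] diff       // row - vec, element by element
);
    integer k;
    always @(posedge clk)
        for (k = 0; k < 8; k = k + 1)
            // Two's-complement subtraction; overflow wraps silently.
            diff[W*k +: W] <= row[W*k +: W] - vec[W*k +: W];
endmodule
```

Because both operands share the same binary point, the subtraction needs no alignment or normalization step, and the result is available one clock after the inputs.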
