Simple array addition in verilog

samviva72 · Apr 19, 2011

This is my first post, and I am about to ask a very basic question. I have never touched Verilog or any other HDL before, but I know C, C++, etc. Can somebody please give me the equivalent of the following C code in Verilog (if such a thing exists)? I want to be able to define and populate the arrays within a Verilog module itself; it is just for simulation purposes.

I would be very grateful if somebody could give me the code; I will then use it as a basis for learning more about Verilog.

Code:

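#include <stdio.h>

int main(void)
{
    int array_1[] = {1, 2, 3, 4};
    int array_2[] = {5, 5, 5, 5};

    int i;
    int c_sum = 0;

    /* sum of element-wise products */
    for (i = 0; i < 4; i++)
    {
        c_sum = c_sum + (array_1[i] * array_2[i]);
    }

    printf("%d\n", c_sum);   /* prints 50 */
    return 0;
}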

blooz · Apr 20, 2011

Code:

`timescale 1ns / 1ps
// Loads the arrays while reset is high, then computes the sum of products
// on every rising clock edge once reset is low.
module array(clk,sum,reset,din);
  input clk,reset;
  input [7:0] din;          // not used in this example
  output reg [7:0] sum;
  reg [7:0] memu1[3:0];
  reg [7:0] memu2[3:0];
  integer i;
  always@(posedge clk)
  begin
    if(reset==1'b1)
    begin
      memu1[0]<=1;
      memu1[1]<=2;
      memu1[2]<=3;
      memu1[3]<=4;
      memu2[0]<=5;
      memu2[1]<=5;
      memu2[2]<=5;
      memu2[3]<=5;
      sum=0;
    end
    else
    begin
      sum=0;
      for(i=0;i<4;i=i+1)
        if(i<4)
        begin
          sum=sum+(memu1[i]*memu2[i]);
        end
    end
  end
endmodule

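A simple example:
while reset is high, the initial values are loaded into the arrays; once reset is low, the sum is calculated on every rising clock edge.

There are many other ways to do this; it's only a simple example.

For learning Verilog, Samir Palnitkar's "Verilog HDL: A Guide to Digital Design and Synthesis" is a good text to start with.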

FvM · Apr 20, 2011

Although it's no problem to use it for simulation, I wouldn't suggest an iteration loop (similar to the C code) for the design, because in most cases it would prevent synthesis of reasonable hardware. The most difficult thing when learning an HDL with a software programming background is understanding that iteration loops don't describe sequential actions in time (a synthesis tool unrolls them into parallel copies of the logic) and must be avoided in many situations.
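For simulation only, the C code can be translated almost line for line into an `initial` block, which runs once at time zero and executes its statements one after another, just like the C function. The module below is a minimal sketch (the name `sim_dot_product` is hypothetical); it is meant for a simulator, not for synthesis.

```verilog
// Simulation-only translation of the C example (not synthesizable).
// The module name sim_dot_product is hypothetical.
module sim_dot_product;
  integer array_1 [0:3];  // int array_1[] = {1, 2, 3, 4};
  integer array_2 [0:3];  // int array_2[] = {5, 5, 5, 5};
  integer i;
  integer c_sum;

  initial begin
    // Verilog-2001 has no array initializer list, so assign each element.
    array_1[0] = 1; array_1[1] = 2; array_1[2] = 3; array_1[3] = 4;
    for (i = 0; i < 4; i = i + 1)
      array_2[i] = 5;

    // In a simulator this loop runs sequentially, exactly like the C loop.
    c_sum = 0;
    for (i = 0; i < 4; i = i + 1)
      c_sum = c_sum + array_1[i] * array_2[i];

    $display("c_sum = %0d", c_sum);  // prints "c_sum = 50"
  end
endmodule
```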

samviva72 · Apr 20, 2011

Thanks for the sample code, blooz. I have just got hold of a copy of Palnitkar and I am starting my learning process now. Two more quick questions for you:
- What Verilog compiler/simulator do you suggest using? The Palnitkar book comes with one on the accompanying CD. Is it any good?
- In the code sample you gave, the 'if(i<4)' is not required, right?

FvM: yes, I have read similar comments about loops in hardware in other posts. So what is the way to achieve the same process? Do I have to define the array outside the module, send two elements at a time to a module that only does the multiplication, and then send the result back? I don't know how to do that. Again, any advice or sample code would be really appreciated.

blooz · Apr 20, 2011

As FvM pointed out, better hardware is synthesized if the code is written without iteration.

The C-style code above can be modified to make it suitable for synthesis in a hardware style. In a single step you can write:
sum=(((memu1[0]*memu2[0])+ (memu1[1]*memu2[1]))+((memu1[2]*memu2[2])+(memu1[3]*memu2[3])));

This form takes the parallel nature of the hardware into account: all four products are computed at the same time.
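The single-step expression describes a purely combinational circuit: four multipliers feeding a two-level adder tree, all working at the same time. Below is a minimal sketch of it as a stand-alone module, assuming the two 4-element vectors arrive as packed 32-bit buses (module and port names are hypothetical; Verilog-2001 ports cannot be arrays). It also widens the result: each 8-bit by 8-bit product needs 16 bits and the sum of four products needs 18, so an 8-bit `sum` only works because the example values add up to 50.

```verilog
// Hypothetical combinational dot product of two 4-element vectors of 8-bit values.
module dot4 (
  input  wire [31:0] a,    // a[7:0] = a0, a[15:8] = a1, a[23:16] = a2, a[31:24] = a3
  input  wire [31:0] b,    // same packing for b
  output wire [17:0] sum   // four products of up to 16 bits each need 18 bits
);
  // Four multipliers, all operating in parallel.
  wire [15:0] p0 = a[7:0]   * b[7:0];
  wire [15:0] p1 = a[15:8]  * b[15:8];
  wire [15:0] p2 = a[23:16] * b[23:16];
  wire [15:0] p3 = a[31:24] * b[31:24];

  // Two-level adder tree with the same grouping as the expression above.
  assign sum = (p0 + p1) + (p2 + p3);
endmodule
```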

Yes, you are right, Samviva: you can definitely go without that if. By the way, there is no need for that din either.

About the simulator:

I would choose the Aldec Active-HDL student version, because its simple GUI suits a beginner.

Aldec provides many tutorials.

Here are the links:
Aldec Active-HDL student version

h**p://www.aldec.com/Products/Product.aspx?productid=87b3ddbe-fc61-4984-a806-481772cdf23a

Various tutorials on Active-HDL

h**p://www.aldec.com/products/active-hdl/multimediademo/

And of course, the ModelSim student version:
h**p://model.com/content/modelsim-pe-student-edition-hdl-simulation

samviva72 · Apr 20, 2011

blooz said:
As FvM pointed out better hardware would be Synthesized if code is without iteration ..

The Above code in C style Could be modified to make it suitable for synthesis in a hardware style .
in single step you can write .
sum=(((memu1[0]*memu2[0])+ (memu1[1]*memu2[1]))+((memu1[2]*memu2[2])+(memu1[3]*memu2[3])));

because the parallel nature of the hardware must be taken into account .

OK, I understand the parallel nature now. But is there a limit to that parallelism?
- Suppose I had two arrays of 2000 values each. Would that calculation be limited by the size of the FPGA?
- Suppose I want to do another calculation on the sum reg after the sum-of-products operation has finished. If calculations happen in parallel, how do I make sure the correct value of sum is used on the next line, where it is multiplied by 50 in the code below? Do I have to start another begin-end block between those two lines?

Code:

sum=(((memu1[0]*memu2[0])+ (memu1[1]*memu2[1]))+((memu1[2]*memu2[2])+(memu1[3]*memu2[3])));
another_val = sum*50;

Thanks for the Verilog simulator links. You are a star!

blooz · Apr 20, 2011

1. Answer to your first question: FPGA resources are a factor you have to consider. The larger the number of elements, the more resources are consumed.

2. If you code something like this:

always @(posedge clk)
begin
statement1;
statement2;
end

the statements between begin and end are executed sequentially, statement1 then statement2, so there is no need for another begin-end. (This holds for blocking assignments, =. With non-blocking assignments, <=, statement2 would read the values from before the clock edge.)
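Whether `another_val` sees the new `sum` therefore depends on the assignment operator. With blocking `=`, statements inside `begin`/`end` run in order, so `another_val` is computed from the new `sum` on the same clock edge. With nonblocking `<=`, all right-hand sides are evaluated before any register is updated, so `another_val` uses the previous `sum` and lags one clock behind. A minimal sketch with both variants side by side (module and signal names are hypothetical; `x` stands in for the sum of products, and `another_val` is widened so that 255*50 fits):

```verilog
// Hypothetical demo: blocking vs. nonblocking assignments in a clocked block.
module seq_demo (
  input  wire        clk,
  input  wire [7:0]  x,               // stand-in for the sum-of-products value
  output reg  [7:0]  sum_b, sum_nb,
  output reg  [13:0] another_val_b,   // 255*50 = 12750 needs 14 bits
  output reg  [13:0] another_val_nb
);
  // Blocking: statements execute in order, so another_val_b uses the NEW sum_b.
  always @(posedge clk) begin
    sum_b         = x;
    another_val_b = sum_b * 50;
  end

  // Nonblocking: both right-hand sides use values from BEFORE the clock edge,
  // so another_val_nb uses the OLD sum_nb (a two-stage pipeline).
  always @(posedge clk) begin
    sum_nb         <= x;
    another_val_nb <= sum_nb * 50;
  end
endmodule
```

A common guideline is to use `<=` for registers that other blocks read, and to reserve blocking `=` in clocked blocks for intermediate variables used only inside that block, which avoids simulation races.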

permute · Apr 20, 2011

The majority of Verilog describes synchronous logic. The 2000-element inner product is presumably used to solve some problem. If you assume that the design must process data as fast as possible (in a low-latency sense), then you would use 2000 multipliers and a very large adder tree. But that's a lot of resources. You might instead decide that the hardware can take a longer time to do the calculation, for example doing 100 multiplications per cycle and completing in 20 cycles. Or perhaps the calculation can take a very long time to complete; in that case you might design a small circuit that performs 1/4 of a multiplication per cycle and takes 8000 cycles to process the data. For the latter cases, you would also have to describe some logic that buffers some of the 2000 data inputs, so that they can be presented on later cycles.
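Toward the small-and-slow end of this range, one multiplier can be reused on every clock cycle, which is how a C `for` loop is usually mapped onto time rather than onto copies of hardware. The sketch below is a minimal example under assumed conventions: the module and port names are hypothetical; the vectors sit in two small internal memories filled by an `initial` block with the thread's values (in a real design they would come from a RAM or input ports, and whether your synthesis tool accepts `initial` contents should be checked); a one-cycle `start` pulse begins the calculation; and `done` goes high when `acc` holds the result. It needs N clock cycles for N elements, using one multiplier and one adder.

```verilog
// Hypothetical serial multiply-accumulate: one product per clock cycle.
module mac_serial #(
  parameter N = 4                  // number of elements (2000 in the question)
) (
  input  wire        clk,
  input  wire        reset,        // synchronous, active high
  input  wire        start,        // one-cycle pulse starts a new dot product
  output reg  [31:0] acc,          // running sum; holds the result when done = 1
  output reg         done
);
  reg [7:0]  mem_a [0:N-1];
  reg [7:0]  mem_b [0:N-1];
  reg [31:0] idx;                  // element counter (the "loop variable"), wide for simplicity
  reg        busy;
  integer k;

  // Example contents, same values as the thread: a = 1,2,3,4 and b = 5,5,5,5.
  initial begin
    for (k = 0; k < N; k = k + 1) begin
      mem_a[k] = k + 1;
      mem_b[k] = 5;
    end
  end

  always @(posedge clk) begin
    if (reset) begin
      acc  <= 0;
      idx  <= 0;
      busy <= 1'b0;
      done <= 1'b0;
    end else if (start && !busy) begin
      acc  <= 0;                   // c_sum = 0
      idx  <= 0;                   // i = 0
      busy <= 1'b1;
      done <= 1'b0;
    end else if (busy) begin
      // One loop iteration per clock: c_sum = c_sum + a[i]*b[i]
      acc <= acc + mem_a[idx] * mem_b[idx];
      if (idx == N - 1) begin
        busy <= 1'b0;
        done <= 1'b1;
      end
      idx <= idx + 1;
    end
  end
endmodule
```

With N = 2000 the same circuit takes 2000 cycles; processing k elements per cycle (k multipliers plus a small adder tree) divides that by roughly k, which is the trade-off described above.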

Defining the problem lets you determine how best to describe the solution. The best designs run as fast as needed, but without requiring more resources than needed.

On the second issue, the syntax determines this. You have used blocking assignments, which are common in always blocks that describe combinatorial logic. You really should become familiar with non-blocking assignments and with how Verilog simulates things. For a combinatorial always block, the order doesn't matter _in this case_, because another_val is updated after sum. But if the lines were swapped (and sum was added to the sensitivity list), then updating sum would re-trigger an evaluation of the same always block, and another_val would then be updated. sum would also be re-evaluated, but to the same value (otherwise you would have an invalid design with a "combinatorial loop").
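In Verilog-2001, `always @*` builds the sensitivity list automatically, so a signal such as `sum` cannot be left out by mistake. A minimal combinatorial sketch (hypothetical names; the inputs `a0` to `a3` and `b0` to `b3` stand in for the array elements) with blocking assignments written in dependency order, so that a single pass computes both outputs:

```verilog
// Hypothetical combinatorial version: no clock, outputs follow the inputs.
module comb_demo (
  input  wire [7:0]  a0, a1, a2, a3,
  input  wire [7:0]  b0, b1, b2, b3,
  output reg  [17:0] sum,           // four 16-bit products need 18 bits
  output reg  [23:0] another_val    // 260100 * 50 < 2^24
);
  // @* re-evaluates the block whenever any signal it reads changes.
  always @* begin
    // Blocking assignments in dependency order: sum first, then its user.
    sum         = (a0 * b0 + a1 * b1) + (a2 * b2 + a3 * b3);
    another_val = sum * 50;
  end
endmodule
```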

samviva72 · Apr 21, 2011

OK, my uni has an Altera licence, so I am using Quartus to compile and simulate for now. I had my first try at simulating after spending hours on the tutorial. Anyway, I managed to simulate the code written by blooz, and the results are shown in the diagram below. As you can see, the result is 50 at the beginning, even before reset is made high. I'm not sure why this is the case. How did memu1 and memu2 get their values before the condition 'if(reset==1'b1)' was satisfied?

I'm probably missing something here. Could you please tell me what changes I need to make so that the calculation happens only after reset is pressed, i.e. sum stays 0 at the beginning, goes to 50 when reset goes high, is initialized to 0 again at the next reset, and returns to 50 after the calculation?

blooz · Apr 22, 2011

Code:

module array(clk,sum,reset);
  input clk,reset;
  output reg [7:0] sum;
  reg [7:0] memu1[3:0];
  reg [7:0] memu2[3:0];
  always @(posedge clk or posedge reset) // asynchronous, active-high reset
  begin
    if(reset==1'b1)
    begin
      memu1[0]<=1;
      memu1[1]<=2;
      memu1[2]<=3;
      memu1[3]<=4;
      memu2[0]<=5;
      memu2[1]<=5;
      memu2[2]<=5;
      memu2[3]<=5;
      sum<=0;
    end
    else // reset is low: calculate on every rising clock edge
    begin
      sum<=(((memu1[0]*memu2[0])+
             (memu1[1]*memu2[1]))+
            ((memu1[2]*memu2[2])+
             (memu1[3]*memu2[3])));
    end
  end
endmodule

Otherwise, use a calculation enable (calcen), so that sum stays 0 until calcen is set high:

Code:

module array_new(clk,sum,reset,calcen);
  input clk,reset,calcen;
  output reg [7:0] sum=8'b0;    // starts at 0
  reg [7:0] memu1[3:0];
  reg [7:0] memu2[3:0];
  always @(posedge clk or posedge reset) // asynchronous, active-high reset
  begin
    if(reset==1'b1)
    begin
      memu1[0]<=1;
      memu1[1]<=2;
      memu1[2]<=3;
      memu1[3]<=4;
      memu2[0]<=5;
      memu2[1]<=5;
      memu2[2]<=5;
      memu2[3]<=5;
      sum<=0;
    end
    else if (calcen==1'b0)
      sum<=8'b0;                // hold 0 while the calculation is disabled
    else                        // reset low and calcen high
    begin
      sum<=(((memu1[0]*memu2[0])+
             (memu1[1]*memu2[1]))+
            ((memu1[2]*memu2[2])+
             (memu1[3]*memu2[3])));
    end
  end
endmodule

samviva72 · Apr 27, 2011

Thanks, blooz, for the code. OK, now I am trying to go a step further by applying a Sobel filter to an image (e.g. a 320 x 240 array of 8-bit integer values for now). I have found some code that does the calculation.

Code:

module sobel_mine( p0, p1, p2, p3, p5, p6, p7, p8, out);

input  [7:0] p0,p1,p2,p3,p5,p6,p7,p8;	// 8 input pixels of 8-bits 
output [7:0] out;	// 1 output pixel of 8-bits 

// Internal wires

//11 bits because max value of gx and gy is 255*4 and last bit for sign					 
wire signed [10:0] gx,gy;
//Find the absolute value of gx and gy     
wire signed [10:0] abs_gx,abs_gy;
//Max value is 255*8; no sign bit needed here.
wire [10:0] sum;			
//------------------------//

//sobel mask for gradient in horizontal direction 
assign gx=((p2-p0)+((p5-p3)<<1)+(p8-p6));
//sobel mask for gradient in vertical direction 
assign gy=((p0-p6)+((p1-p7)<<1)+(p2-p8));

// Absolute value of gx 
assign abs_gx = (gx[10]? ~gx+1 : gx);
// Absolute value of gy 	
assign abs_gy = (gy[10]? ~gy+1 : gy);	

// Sum 
assign sum = (abs_gx+abs_gy);		

// Max value 255  	
assign out = (|sum[10:8])?8'hff : sum[7:0];	

endmodule

But I am not sure of the best way to send data to and receive data from that module. I am using an Altera board. Using its Nios soft processor, I have been able to read all the pixel values and store them in a 2-D array with C code. I then used two for-loops to go through the whole array and send 8 values at a time to the sobel_mine Verilog module. While this method worked fine, I don't see much improvement over doing the whole thing in C. I want to know how to eliminate one of these for-loops.

To put it simply, let's imagine I have just 3 arrays of 320 elements at first. If I use a for-loop, I will access the Verilog module 318 times. Is there a better way to do this? Do I have to instantiate more copies of sobel_mine.v? I am a bit lost.

Thank you in advance

blooz · Apr 29, 2011

Yes, definitely. You can exploit the parallel nature of the hardware. Suppose your array is 256 by 256 and the processing is applied to 8-by-8 subsets; then there are 1024 independent subsets {S0, S1, ..., S1023} that can be processed separately. Suppose you have two processing elements, P0 and P1, and they access the array in parallel according to a simple rule: P0 accesses the even subsets {S0, S2, ...} and P1 accesses the odd subsets {S1, S3, ...}.

If there are enough logic elements to do the trick, then instantiating one more copy is a good idea.

samviva72 · Apr 30, 2011

Thanks again for the tips. I get the general picture of what you said, but I don't know how to implement it in Verilog.

How do I implement those two processing elements, P0 and P1? Do I have to create a main.v and instantiate sobel_mine.v twice in it? I am not sure how to do that.

What do I declare as input/output ports in that main.v? Each port coming from the soft processor will now contain two 8-bit integers, i.e. one for each subset. Will my new input be defined as input [15:0] pixel0_two_subsets, and then do I do something like the following to send data to each processing element?
wire [7:0] pixel0_P1 = [7:0]pixel0_two_subsets;
wire [7:0] pixel0_P0 = [15:8]pixel0_two_subsets;

I am probably inventing Verilog syntax above. OK, let's go one level simpler; please correct me if I am wrong below.

Suppose my initial array is 6x6, so I am expecting 16 output values. In my current implementation with one Sobel filter, I access it 16 times. With your suggestion of two processing elements, I will access each one only 8 times, and I will get the results faster because the two modules run in parallel. I could also instantiate 4 processing elements in the future, which would require only 4 accesses each, making it even faster (if I have enough logic elements to do this in parallel).

Could you please write some sample code assuming that the array is 6x6 and we use two processing elements? I want to see how you instantiate them in the main.v file, and how you declare the input/output ports and other internal variables of main.v before you instantiate the Sobel modules.

Thank you so much, blooz! You are an invaluable help in getting me familiar with the hardware/Verilog world.
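A minimal sketch of such a top level, under these assumptions: the module name `sobel_two_pe` is hypothetical; each 16-bit input carries the same window position for both processing elements, P0 in bits [7:0] and P1 in bits [15:8]; and a compacted copy of the thread's `sobel_mine` is included so the block compiles on its own. Note that Verilog writes a part-select after the signal name, as in `pixel0_two_subsets[7:0]`.

```verilog
// Compacted copy of the thread's sobel_mine (combinational 3x3 Sobel magnitude).
module sobel_mine (p0, p1, p2, p3, p5, p6, p7, p8, out);
  input  [7:0] p0, p1, p2, p3, p5, p6, p7, p8;
  output [7:0] out;
  wire signed [10:0] gx, gy, abs_gx, abs_gy;
  wire [10:0] sum;
  assign gx     = ((p2 - p0) + ((p5 - p3) << 1) + (p8 - p6));
  assign gy     = ((p0 - p6) + ((p1 - p7) << 1) + (p2 - p8));
  assign abs_gx = (gx[10] ? ~gx + 1 : gx);
  assign abs_gy = (gy[10] ? ~gy + 1 : gy);
  assign sum    = abs_gx + abs_gy;
  assign out    = (|sum[10:8]) ? 8'hff : sum[7:0];   // saturate at 255
endmodule

// Hypothetical top level: two Sobel processing elements working in parallel.
// Each 16-bit port carries one pixel per element: [7:0] -> P0, [15:8] -> P1.
module sobel_two_pe (
  input  wire [15:0] p0, p1, p2, p3, p5, p6, p7, p8,
  output wire [15:0] out       // out[7:0] from P0, out[15:8] from P1
);
  // Processing element P0 (e.g. the even windows)
  sobel_mine pe0 (
    .p0(p0[7:0]), .p1(p1[7:0]), .p2(p2[7:0]), .p3(p3[7:0]),
    .p5(p5[7:0]), .p6(p6[7:0]), .p7(p7[7:0]), .p8(p8[7:0]),
    .out(out[7:0])
  );

  // Processing element P1 (e.g. the odd windows)
  sobel_mine pe1 (
    .p0(p0[15:8]), .p1(p1[15:8]), .p2(p2[15:8]), .p3(p3[15:8]),
    .p5(p5[15:8]), .p6(p6[15:8]), .p7(p7[15:8]), .p8(p8[15:8]),
    .out(out[15:8])
  );
endmodule
```

For four processing elements, the ports would become 32 bits wide and two more instances (or a `generate` loop) would be added, so each access from the processor would carry four windows instead of two.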
