# My moving-average filter resulted in wrong result

#### mohamis288

##### Full Member level 2 Hello,

I am simulating moving-average filter in modelsim. some confusing result have been observed and I explain them here.

My moving average filter is 7-tap and my verilog code is represented here:

Code:
timescale 1ms / 1ps

// Module Name: FIR_Filter

module movingAverage_Filter(clk, reset, my_data_in, my_data_out);

parameter N = 8;

input clk, reset;
input [N-1:0] my_data_in;
output reg [N-1:0] my_data_out;

// coefficients defination
// Moving Average Filter, 7th order

//wire [7:0] b0 = 0_0001010;
//wire [7:0] b1 = 0_0010000;
//wire [7:0] b2 = 0_0101000;
//wire [7:0] b3 = 0_1000000;
//wire [7:0] b4 = 0_0101000;
//wire [7:0] b5 = 0_0010000;
//wire [7:0] b6 = 0_0001010;

wire [7:0] b0 = 0.078125;
wire [7:0] b1 = 0.125;
wire [7:0] b2 = 0.3125;
wire [7:0] b3 = 0.5;
wire [7:0] b4 = 0.3125;
wire [7:0] b5 = 0.125;
wire [7:0] b6 = 0.078125;

wire [N-1:0] x1, x2, x3, x4, x5, x6;

// Create delays i.e x[n-1], x[n-2], .. x[n-N]
// Instantiate D Flip Flops
DFF DFF0(clk, reset, my_data_in, x1); // x[n-1]
DFF DFF1(clk, reset, x1, x2);      // x[x[n-2]]
DFF DFF2(clk, reset, x2, x3);      // x[n-3]
DFF DFF3(clk, reset, x3, x4);
DFF DFF4(clk, reset, x4, x5);
DFF DFF5(clk, reset, x5, x6);

//  Multitiplication
wire [N-1:0] Multi0, Multi1, Multi2, Multi3, Multi4, Multi5, Multi6;
assign Multi0 = my_data_in * b0;
assign Multi1 = x1 * b1;
assign Multi2 = x2 * b2;
assign Multi3 = x3 * b3;
assign Multi4 = x4 * b4;
assign Multi5 = x5 * b5;
assign Multi6 = x6 * b6;

assign Add_final_value = Multi0 + Multi1 + Multi2 + Multi3 + Multi4 + Multi5 + Multi6;

// Final calculation to output
always@(posedge clk)

endmodule

module DFF(clk, reset, my_data_in, data_delayed);
parameter N = 8;
input clk, reset;
input [N-1:0] my_data_in;
output reg [N-1:0] data_delayed;

always@(posedge clk)
begin
if (reset)
data_delayed <= 0;
else
data_delayed <= my_data_in;

end

endmodule

When I simulate this verilog code in modelsim, the result is completely wrong. in fact, we expect our output to be average of last seven input samples. but what we observed was just a 3-sample delayed version of input. in fact, in ideal simulation, modelsim ignore all the multiplier and some of delays. I also synthesized my code in code vision and I observed the synthesized circuit and understood that my observation were correct, i.e., multiplier were not present in my circuit and also input was connected directly to the output by first 3 DFF in between. input also was connected to the last 3 DFF (connected in series) which its output was no connect.

what is the problem do you think?
--- Updated ---

please give me a point. any help would be appreciated.

Last edited:

#### KlausST

##### Super Moderator
Staff member Hi,

for a "moving average" I expect all coefficients to be identical = "1/filter_depth".
You now have different coefficients, thus I call it a regular "FIR filter"

So far so good.
Now your input data width is 8 and so is the output data width.
Not to get an overflow the sum of the coefficients have to be <=1. This is not the case here.
You may need to adjust on this.

Instead of giving vague informations like "result is completely wrong" and "to be average of" ...
You could give values. With this we are able to check
* the simuation results

Klaus

• mohamis288

#### mohamis288

##### Full Member level 2 Hello,

I am simulating moving-average filter in modelsim. some confusing result have been observed and I explain them here.

My moving average filter is 7-tap and my verilog code is represented here:

Code:
timescale 1ms / 1ps

// Module Name: FIR_Filter

module movingAverage_Filter(clk, reset, my_data_in, my_data_out);

parameter N = 8;

input clk, reset;
input [N-1:0] my_data_in;
output reg [N-1:0] my_data_out;

// coefficients defination
// Moving Average Filter, 7th order

//wire [7:0] b0 = 0_0001010;
//wire [7:0] b1 = 0_0010000;
//wire [7:0] b2 = 0_0101000;
//wire [7:0] b3 = 0_1000000;
//wire [7:0] b4 = 0_0101000;
//wire [7:0] b5 = 0_0010000;
//wire [7:0] b6 = 0_0001010;

wire [7:0] b0 = 0.078125;
wire [7:0] b1 = 0.125;
wire [7:0] b2 = 0.3125;
wire [7:0] b3 = 0.5;
wire [7:0] b4 = 0.3125;
wire [7:0] b5 = 0.125;
wire [7:0] b6 = 0.078125;

wire [N-1:0] x1, x2, x3, x4, x5, x6;

// Create delays i.e x[n-1], x[n-2], .. x[n-N]
// Instantiate D Flip Flops
DFF DFF0(clk, reset, my_data_in, x1); // x[n-1]
DFF DFF1(clk, reset, x1, x2);      // x[x[n-2]]
DFF DFF2(clk, reset, x2, x3);      // x[n-3]
DFF DFF3(clk, reset, x3, x4);
DFF DFF4(clk, reset, x4, x5);
DFF DFF5(clk, reset, x5, x6);

//  Multitiplication
wire [N-1:0] Multi0, Multi1, Multi2, Multi3, Multi4, Multi5, Multi6;
assign Multi0 = my_data_in * b0;
assign Multi1 = x1 * b1;
assign Multi2 = x2 * b2;
assign Multi3 = x3 * b3;
assign Multi4 = x4 * b4;
assign Multi5 = x5 * b5;
assign Multi6 = x6 * b6;

assign Add_final_value = Multi0 + Multi1 + Multi2 + Multi3 + Multi4 + Multi5 + Multi6;

// Final calculation to output
always@(posedge clk)

endmodule

module DFF(clk, reset, my_data_in, data_delayed);
parameter N = 8;
input clk, reset;
input [N-1:0] my_data_in;
output reg [N-1:0] data_delayed;

always@(posedge clk)
begin
if (reset)
data_delayed <= 0;
else
data_delayed <= my_data_in;

end

endmodule

When I simulate this verilog code in modelsim, the result is completely wrong. in fact, we expect our output to be average of last seven input samples. but what we observed was just a 3-sample delayed version of input. in fact, in ideal simulation, modelsim ignore all the multiplier and some of delays. I also synthesized my code in code vision and I observed the synthesized circuit and understood that my observation were correct, i.e., multiplier were not present in my circuit and also input was connected directly to the output by first 3 DFF in between. input also was connected to the last 3 DFF (connected in series) which its output was no connect.

what is the problem do you think?
--- Updated ---

please give me a point. any help would be appreciated.
UPDATE
the above code have been synthetized in design vision and this is the schematic view: which is not correct for a FIR filter.

when I replaced the filter multiplier coefficient with the following lines:

wire [7:0] b0 = 0_0001010;
wire [7:0] b1 = 0_0010000;
wire [7:0] b2 = 0_0101000;
wire [7:0] b3 = 0_1000000;
wire [7:0] b4 = 0_0101000;
wire [7:0] b5 = 0_0010000;
wire [7:0] b6 = 0_0001010;

the schematic view is like this: which seems correct, but output is not as expected. it is like this: --- Updated ---

Hi,

for a "moving average" I expect all coefficients to be identical = "1/filter_depth".
You now have different coefficients, thus I call it a regular "FIR filter"

So far so good.
Now your input data width is 8 and so is the output data width.
Not to get an overflow the sum of the coefficients have to be <=1. This is not the case here.
You may need to adjust on this.

Instead of giving vague informations like "result is completely wrong" and "to be average of" ...
You could give values. With this we are able to check
* the simuation results

Klaus
Hello,
@KlausST
input signal was 8 bits. and sum of FIR filter's coefficients was more than one. but, input signal is between -64 and 63 and we have simulated this filter in MATLAB and we are sure that the output signal does not exceed the input 127 or -128. I have updatted the question, so you can see the related picture.
thank you

Last edited:

#### FvM

##### Super Moderator
Staff member Simulated in Matlab, how? With fixed package and correct bit widths? Otherwise the result is meaningless.

The original code has more than one problem
1. Real constants truncated to zero
2. 8x8 multiply 16 bit result truncated to 8 bit
3. Overflow in term sum, as already discussed

Now you fixed 1. by using integer constant, but not 2 and 3.

The simulator shows you each fault exactly, just need to look.
--- Updated ---

Regarding 2, to get a roughly correct result, you can strip 8 lower bits instead truncating the upper. Better keep full 16 bit resolution (plus up to 3 extra bits, see issue 3) til the final sum. Than truncate and round as necessary.

Last edited:

#### kaz1

##### Full Member level 4 Lots of issues as others pointed out.
1) you need clocked process on mults and adder, apart from final output.
This way you get real FIR and also can check sim better.
2) allow full bit growth internally:
8 bits * 8 bits => 16 bits
16 bits + 16 bits +... => 19 bits.
for final output discard LSBs as suitable.
match matlab model and your rtl if checking both.

#### mohamis288

##### Full Member level 2 Lots of issues as others pointed out.
1) you need clocked process on mults and adder, apart from final output.
This way you get real FIR and also can check sim better.
2) allow full bit growth internally:
8 bits * 8 bits => 16 bits
16 bits + 16 bits +... => 19 bits.
for final output discard LSBs as suitable.
match matlab model and your rtl if checking both.
Sorry I am new to Verilog. Can you please tell me how to discard LSBs in multiplication using a simple example?

#### kaz1

##### Full Member level 4 I assume it is like this for discarding 3 LSBs and some MSBs assuming no overflow and 8 bit output:

#### mohamis288

##### Full Member level 2 I
I assume it is like this for discarding 3 LSBs and some MSBs assuming no overflow and 8 bit output:
Thank you @kaz1,
It seems that add_final_value has just 8 bits. Because even 9th bit is out of bound. I also tried to do the same for previous statements like this:

assign Multi1 = (x1 * b1)[15:8];

But I got compilation error. What is your suggestion now?

Thank you

#### kaz1

##### Full Member level 4 Try this:

// Multitiplication
wire [15:0] Multi0, Multi1, Multi2, Multi3, Multi4, Multi5, Multi6;
assign Multi0 = my_data_in * b0;
assign Multi1 = x1 * b1;
assign Multi2 = x2 * b2;
assign Multi3 = x3 * b3;
assign Multi4 = x4 * b4;
assign Multi5 = x5 * b5;
assign Multi6 = x6 * b6;

assign Add_final_value = Multi0 + Multi1 + Multi2 + Multi3 + Multi4 + Multi5 + Multi6;

// Final calculation to output
always@(posedge clk)

Remember you need clocked process on mults and adder

#### FvM

##### Super Moderator
Staff member Remember you need clocked process on mults and adder
Not necessarily, but depending on clock frequency hardware synthesis might not achieve timing closure.

#### kaz1

##### Full Member level 4 Not necessarily, but depending on clock frequency hardware synthesis might not achieve timing closure.
There are 7 adders in parallel after mults (16*7 = 112 bits wide), unless it is kitchen based project then I expect your advice is not practical for a beginner.

#### FvM

##### Super Moderator
Staff member The adders are maximal 19 bits wide. Even with full 19 bit output, the unregistered design achieves > 100 MHz with a recent low cost FPGA. With registered multiplier output and a pipelined adder (4+3 partial sums registered) > 200 MHz.

So yes, speed can be considerably increased with additional registers, but it might not be necessary for the actual application.

My advice would be to use appropriate rather than maximal pipelining.

#### mohamis288

##### Full Member level 2 Try this:

// Multitiplication
wire [15:0] Multi0, Multi1, Multi2, Multi3, Multi4, Multi5, Multi6;
assign Multi0 = my_data_in * b0;
assign Multi1 = x1 * b1;
assign Multi2 = x2 * b2;
assign Multi3 = x3 * b3;
assign Multi4 = x4 * b4;
assign Multi5 = x5 * b5;
assign Multi6 = x6 * b6;

assign Add_final_value = Multi0 + Multi1 + Multi2 + Multi3 + Multi4 + Multi5 + Multi6;

// Final calculation to output
always@(posedge clk)

Remember you need clocked process on mults and adder
even with this code, I got the wrong output. I also changed [10:3] but output is still wrong. I thought to change my code to the shifter version like this:

Code:
wire [N-1:0] Multi0, Multi1, Multi2, Multi3, Multi4, Multi5, Multi6;
assign Multi0 = (my_data_in>>4)+(my_data_in>>6);
assign Multi1 = (my_data_in>>3);
assign Multi2 = (my_data_in>>4)+(my_data_in>>2);
assign Multi3 = (my_data_in>>1);
assign Multi4 = (my_data_in>>4)+(my_data_in>>2);
assign Multi5 = (my_data_in>>3);
assign Multi6 = (my_data_in>>4)+(my_data_in>>6);

assign Add_final_value = Multi0 + Multi1 + Multi2 + Multi3 + Multi4 + Multi5 + Multi6;

// Final calculation to output
always@(posedge clk)
my_data_out <= Add_final_value;

in this way I am sure that overflow does not occur. According to MATLAB, I am sure that the outout doesn't exceed 8-bit, So I let it to be that 8-bit. this time, my output in modelsim is: now it is better but something is wrong. I do not know what it is. let me know your thought.
this is my output in MATLAB: #### kaz1

##### Full Member level 4 I ran your coeffs in matlab: blue input, red output I wonder what is your input. Try not change original code but show me your work, it should work

#### mohamis288

##### Full Member level 2 I ran your coeffs in matlab: blue input, red output

View attachment 181715

I wonder what is your input. Try not change original code but show me your work, it should work
I thought it was not that much important.
here it is input: --- Updated ---

I ran your coeffs in matlab: blue input, red output

View attachment 181715

I wonder what is your input. Try not change original code but show me your work, it should work
in MATLAB every thing is ok, BUT I can not understand the problem with my verilog code.

#### kaz1

##### Full Member level 4 I suggest you keep simple. Try a clean sine wave without noise first. Also show me your original code.

#### FvM

##### Super Moderator
Staff member Problem is that the sum of coefficients is about 1.45, respectively you need to scale output down (shift right) to avoid overflow with full range input signal. In the post #13 simulation waveform it's not scaled down. Better scale the coefficients so that sum is power of 2.

#### mohamis288

##### Full Member level 2 Problem is that the sum of coefficients is about 1.45, respectively you need to scale output down (shift right) to avoid overflow with full range input signal. In the post #13 simulation waveform it's not scaled down. Better scale the coefficients so that sum is power of 2.
as shown in MATLAB code, output is in the allowed range (i.e. between -128 and 127). because I have limited the input to [-64,63] range. so the problem is not with the filter itself. the problem is just with verilog implementation.
--- Updated ---

I suggest you keep simple. Try a clean sine wave without noise first. Also show me your original code. @kaz1
using the method in post #13, my output to sinusoidal wave is like this. at the edge of negative and positive, something is wrong. you know what is it? my input is in 2's complement format. do I need to change the coefficient type, for example, unsigned and signed? I'll check it.

Last edited:

#### kaz1

##### Full Member level 4 as shown in MATLAB code, output is in the allowed range (i.e. between -128 and 127). because I have limited the input to [-64,63] range. so the problem is not with the filter itself. the problem is just with verilog implementation.
To match matlab with RTL you need to do following:

hs = round(h*128); %your coeffs scaled in RTL
y = filter(hs,1,x); %x range +63,-64
dout = floor(y/128); %divide back range + 66

you will see max y close to +8500
You then scale matlab filter output by dividing by 128 and you then get max y close +66

if you divide y by 64 you get double of range i.e. +130

division by 128 = discard 7 LSBs.
division by 64 = discard 6 LSBs.

as to rounding/overflow you can check

## Commands Quick-Menu: 