# VHDL signed number arithmetic

#### shaiko

Hello,

Signals a, b and c are defined as follows:

Code:
signal a : signed ( 7 downto 0 ) := "00000000" ;
signal b : signed ( 7 downto 0 ) := "00000001" ;
signal c : signed ( 7 downto 0 ) ;
In my code I write:
Code:
c <= a - b ;
Because I'm using signed numbers, I expect the result to be minus 1, which in signed binary is: 10000001
However, the result is: 11111111

What did I do wrong?

##### Super Moderator
Staff member
The result is in 2's complement, not binary with a sign bit (sign-magnitude).

So 10000000 = -2^7 =-128
10000001 = -2^7+1 =-128+1=-127
11111111 = -128+64+32+16+8+4+2+1=-1
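The weights above can be checked in simulation. This is only a minimal sketch: the testbench name `tb_signed_sub` is made up for illustration, and it assumes the signals come from `ieee.numeric_std` as in the original post.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical testbench; not part of the original design.
entity tb_signed_sub is
end entity tb_signed_sub;

architecture sim of tb_signed_sub is
  signal a : signed(7 downto 0) := "00000000";
  signal b : signed(7 downto 0) := "00000001";
  signal c : signed(7 downto 0);
begin
  c <= a - b;

  check : process
  begin
    wait for 1 ns;  -- let the concurrent assignment settle
    -- The bit pattern is all ones...
    assert c = signed'("11111111")
      report "unexpected bit pattern" severity failure;
    -- ...and numeric_std reads that pattern as -1.
    assert to_integer(c) = -1
      report "unexpected numeric value" severity failure;
    wait;
  end process check;
end architecture sim;
```

If both assertions pass, the "11111111" seen in the waveform is simply how -1 is stored.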

#### TrickyDicky

shaiko said:
What did I do wrong?
Nothing. 11111111 is -1 in 2's complement.
The problem with signed binary is that you get two values for zero, 10000000 and 00000000, giving a range of -127 to +127. With 2's complement you get -128 to +127, one number for each state.

##### Super Moderator
Staff member
A useful property of 2's complement arithmetic is that all addition and subtraction ends up being plain binary addition.

So 2 - 6 would be 0010 + (1001 + 1) = 0010 + 1010 = 1100 = -2^3 + 2^2 = -8 + 4 = -4, where 1001 is 6 (0110) with every bit inverted.
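The same 4-bit example can be written out as a small simulation sketch. The entity name `tb_sub_as_add` and the constants `x` and `y` are illustrative assumptions, not anything from the poster's design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical testbench: subtraction done as "invert, add one, add".
entity tb_sub_as_add is
end entity tb_sub_as_add;

architecture sim of tb_sub_as_add is
  constant x : signed(3 downto 0) := to_signed(2, 4);  -- 0010
  constant y : signed(3 downto 0) := to_signed(6, 4);  -- 0110
begin
  check : process
    variable neg_y : signed(3 downto 0);
    variable sum   : signed(3 downto 0);
  begin
    neg_y := (not y) + 1;  -- 1001 + 1 = 1010, i.e. -6
    sum   := x + neg_y;    -- 0010 + 1010 = 1100

    assert neg_y = signed'("1010") report "negation mismatch" severity failure;
    assert sum = signed'("1100") report "sum mismatch" severity failure;
    -- Same result as the built-in subtraction, and it reads back as -4.
    assert sum = x - y report "differs from x - y" severity failure;
    assert to_integer(sum) = -4 report "value mismatch" severity failure;
    wait;
  end process check;
end architecture sim;
```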

#### shaiko

My FPGA is communicating with an ADC over a serial bus. The ADC supports negative voltage readings.

Figure 26 at page 14 suggests that the device uses the leftmost bit as the sign bit.
I'm required to calculate the algebraic sum of 32 consecutive readings.
As you explained, VHDL signed uses 2's complement instead of the sign bit method...

Any suggestions as to how to approach the problem?

#### TrickyDicky

That looks like standard 2's complement to me
-5V = 8000
+5V = 7FFF

##### Super Moderator
Staff member
shaiko said:
Any suggestions as to how to approach the problem?
Like Tricky said, the output is 2's complement. All you need to do is binary addition of 32 consecutive output samples to compute the algebraic sum.

Oh, and don't forget to sign extend the samples, when you perform the accumulation.

#### shaiko

TrickyDicky said:
That looks like standard 2's complement to me
-5V = 8000
+5V = 7FFF
Thanks,
For some reason - at first glance it looked to me like a signed bit representation...

So in this case, the result will simply be a sum of 32 readings?
What will happen in the event of an overflow ?

- - - Updated - - -

Can you please explain what you mean by:
"Oh, and don't forget to sign extend the samples, when you perform the accumulation."
Please give an example of how to do the sign extension...

##### Super Moderator
Staff member
shaiko said:
What will happen in the event of an overflow ?
That's why I mentioned you need to sign extend the input samples and run the accumulator with more bits.
The accumulator should have 21 bits: with 32 16-bit values added, the max negative value would be 0x100000 (0x8000 << 5) and the max positive would be 0x0FFFE0 (0x7FFF << 5).

shaiko said:
Please give an example of how to do the sign extension...
You should do the following:

Code:
accum <= accum + (samp(15) & samp(15) & samp(15) & samp(15) & samp(15) & samp(15 downto 0));
Sorry about the ugly code, but as I don't use VHDL that often I don't recall a less verbose way of doing this.

#### shaiko

I came across this post:
http://sandbox.mc.edu/~bennet/cs110/tc/tctod.html
In the first paragraph it says:
check if the number is negative or positive by looking at the sign bit
This suggests that if the leftmost bit is '1' then the number is negative...

Now back to our example:
As you said the accumulator in the design must be extended to 21 bits.

Let's assume that one of the readings was the maximum possible negative (0x8000) and the remaining 31 readings were zero volts (0x0000).
After summing up the readings our accumulator will look like this:
0 0000 1000 0000 0000 0000
We know that we are expecting a negative value yet looking at the leftmost bit which is '0' suggests we got a positive value...

Anything I missed?

#### TrickyDicky

You forgot to sign extend. The resize function does sign extension:

my_12bit_output <= resize(some_8bit_0, 12) + resize(some_8bit_1, 12);

#### shaiko

TrickyDicky said:
You forgot to sign extend. The resize function does sign extension:

my_12bit_output <= resize(some_8bit_0, 12) + resize(some_8bit_1, 12);
Does this do exactly the same thing ads-ee suggested in post #9?

##### Super Moderator
Staff member
That's the function that I couldn't remember. So you could resize to 21 bits.

Therefore you would end up with 0b1_1111_1000_0000_0000_0000 (0x1F8000) after adding up the 0x8000 and 31 0x0000s.

The "check if the number is negative by looking at the sign bit" is because the MSB is weighted as a negative value:

Code:
100:
(-1)*2^2 + (0)*2^1 + (0)*2^0 = -4 + 0 + 0 = -4
now sign extend
1100:
(-1)*2^3 + (1)*2^2 + (0)*2^1 + (0)*2^0 = -8 + 4 + 0 + 0= -4
sign extending the value doesn't affect the result
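Putting the pieces of this thread together, the 32-sample accumulation with `resize` might look like the sketch below. The testbench name, the `sample_array` type and the test data (one 0x8000 reading and 31 zero readings, as in the example discussed above) are assumptions for illustration; a real design would accumulate in a clocked process.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical testbench: sum 32 signed 16-bit samples into 21 bits.
entity tb_accum is
end entity tb_accum;

architecture sim of tb_accum is
  type sample_array is array (0 to 31) of signed(15 downto 0);
begin
  check : process
    -- One maximum-negative reading, 31 readings of zero.
    variable samples : sample_array := (0 => x"8000", others => x"0000");
    variable accum   : signed(20 downto 0) := (others => '0');
  begin
    for i in samples'range loop
      -- resize sign-extends each 16-bit sample to the accumulator width.
      accum := accum + resize(samples(i), accum'length);
    end loop;

    -- 0x1F8000 in 21 bits: the MSB is '1', so the sum is negative.
    assert accum = signed'(b"1_1111_1000_0000_0000_0000")
      report "unexpected accumulator bits" severity failure;
    assert to_integer(accum) = -32768
      report "unexpected accumulator value" severity failure;
    wait;
  end process check;
end architecture sim;
```

Without the `resize`, the 0x8000 sample would be treated as +32768 once placed in the wider vector, which is the "0 0000 1000 0000 0000 0000" result asked about earlier.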

#### K-J

shaiko said:
What will happen in the event of an overflow?
An overflow implies you have a design error that should not exist. You know the range of the input and the number of inputs, so the range of that summation is a computable constant.

shaiko said:
Please give an example of how to do the sign extension...
Personally, I would convert the sample to an integer and add the sample to the accumulated sum like this:
Code:
signal Accum : integer range -32*32768 to 32*32767;
...
Accum <= Accum + to_integer(signed(Sample_Input));
If the final accumulated sum needs to be converted back into a std_logic_vector on the way out, then simply add the following...
Code:
Accum_slv <= std_logic_vector(to_signed(Accum, Accum_slv'length));
Don't bother re-inventing the wheel about how to do arithmetic.

Kevin Jennings

#### shaiko

Thanks a lot for your help!

P.S:
More of a Boolean Algebra question -
What is the mathematical link between the number of additions and the required bit width for the result?

for example:

The number 3 is binary "11".
If we add 3 to itself 32 times, the result ("1100000") will be a vector that's 5 bits longer than the original number.

Staff member

#### shaiko

You pointed me in the right direction...
But if n is defined as the number of additions (the number of times you used the '+' sign) then the formula should be nbit = ceil(log2(n+1)).
Correct?

#### shaiko

Another question:
Code:
signal signed_a : signed ( 7 downto 0 ) := "00001111" ;
signal unsigned_a : unsigned ( 7 downto 0 ) := "00001111" ;

signal signed_b : signed ( 7 downto 0 ) := "10001111" ;
signal unsigned_b : unsigned ( 7 downto 0 ) := "10001111" ;

signal signed_c : signed ( 7 downto 0 ) ;
signal unsigned_c : unsigned ( 7 downto 0 ) ;

signed_c  <= signed_a + signed_b ;
unsigned_c  <= unsigned_a + unsigned_b ;
If I understand correctly, although signed_c and unsigned_c will have different algebraic meanings, they will look exactly the same and have exactly the same bits lit up.
If this is correct, it means that the mathematics is the same for both types... so why even have 2 different types? When is it useful to define a signal as type "signed"?

#### FvM

##### Super Moderator
Staff member
Addition is identical for unsigned and signed, but for a lot of other operations it isn't. Just try it out.

##### Super Moderator
Staff member
Try '>' or '<' with both signed and unsigned; you'll definitely see a different result between the two.
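For the bit patterns from the question, the difference can be shown with a short simulation sketch. The testbench name and the constants `bits_a` and `bits_b` are made up for illustration; the `resize` check is an extra case beyond the comparison operators mentioned above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical testbench: same bits, different meaning per type.
entity tb_signed_vs_unsigned is
end entity tb_signed_vs_unsigned;

architecture sim of tb_signed_vs_unsigned is
  constant bits_a : std_logic_vector(7 downto 0) := "00001111";
  constant bits_b : std_logic_vector(7 downto 0) := "10001111";
begin
  check : process
  begin
    -- Addition yields identical bits for both interpretations.
    assert std_logic_vector(signed(bits_a) + signed(bits_b))
         = std_logic_vector(unsigned(bits_a) + unsigned(bits_b))
      report "sum bits differ" severity failure;

    -- Unsigned: 15 < 143.
    assert unsigned(bits_a) < unsigned(bits_b)
      report "unsigned compare" severity failure;
    -- Signed: 15 > -113, so the comparison flips.
    assert signed(bits_a) > signed(bits_b)
      report "signed compare" severity failure;

    -- Widening also differs: sign extension versus zero extension.
    assert resize(signed(bits_b), 12) = signed'(x"F8F")
      report "signed resize" severity failure;
    assert resize(unsigned(bits_b), 12) = unsigned'(x"08F")
      report "unsigned resize" severity failure;
    wait;
  end process check;
end architecture sim;
```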

- - - Updated - - -

shaiko said:
You pointed me in the right direction...
But if n is defined as the number of additions (the number of times you used the '+' sign) then the formula should be nbit = ceil(log2(n+1)).
Correct?
Yes, but only because the number of additions required is always one less than the number of values you are adding together. FvM was showing you the nbit growth based on the number of values you are adding together, e.g.:

add 2 numbers: 11 + 11 = 110 (1 extra bit)
log2(2) = 1

add 3 numbers: 11 + 11 + 11 = 1001 (2 extra bits)
log2(3) = 1.58 (round up)

add 4 numbers: 11 + 11 + 11 + 11 = 1100 (2 extra bits)
log2(4) = 2

add 5 numbers: 11 + 11 + 11 + 11 + 11 = 1111 (2 extra bits)
log2(5) = 2.32 (round down)

add 6 numbers: 11 + 11 + 11 + 11 + 11 + 11 = 10010 (3 extra bits)
log2(6) = 2.58 (round up)

...

add 9 numbers: 11 + 11 + ... + 11 + 11 = 11011 (3 extra bits)
log2(9) = 3.17 (round down)

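So for these 2-bit operands ceil gives you an overly conservative value in some instances. Rounding is not a safe general rule, though: adding 11 numbers gives 11 × 3 = 33 = 100001 (4 extra bits), while log2(11) = 3.46 rounds down to 3. ceil(log2(n)) is the bound that never overflows.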
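For sizing an accumulator, the ceil(log2(n)) growth can be computed at elaboration time. The helper below is only a sketch: the function name `clog2` and the testbench are assumptions, and the checks use plain integer arithmetic rather than any library call.

```vhdl
-- Hypothetical testbench: bit growth when summing n values.
entity tb_bit_growth is
end entity tb_bit_growth;

architecture sim of tb_bit_growth is
  -- Smallest k with 2**k >= n, i.e. ceil(log2(n)) for n >= 1.
  function clog2 (n : positive) return natural is
    variable k : natural  := 0;
    variable p : positive := 1;
  begin
    while p < n loop
      p := p * 2;
      k := k + 1;
    end loop;
    return k;
  end function clog2;
begin
  check : process
  begin
    assert clog2(1) = 0 report "clog2(1)" severity failure;
    assert clog2(2) = 1 report "clog2(2)" severity failure;
    assert clog2(5) = 3 report "clog2(5)" severity failure;
    assert clog2(32) = 5 report "clog2(32)" severity failure;

    -- 32 unsigned 16-bit values: worst-case sum fits in 16 + 5 = 21 bits.
    assert 32 * (2**16 - 1) < 2**(16 + clog2(32))
      report "unsigned bound" severity failure;
    -- 32 signed 16-bit values: the largest magnitude is 32 * 2**15 = 2**20,
    -- which is exactly the most negative value of a 21-bit signed vector.
    assert 32 * 2**15 <= 2**20
      report "signed bound" severity failure;

    -- 11 two-bit values: 33 fits in 2 + 4 bits but not in 2 + 3 bits.
    assert 11 * 3 < 2**(2 + clog2(11)) report "ceil bound" severity failure;
    assert 11 * 3 >= 2**(2 + 3) report "rounded bound" severity failure;
    wait;
  end process check;
end architecture sim;
```

With this helper, the 21-bit accumulator width used earlier in the thread falls out as 16 + clog2(32).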

Regards
