Sum of 100 8-bit signed numbers, FPGA Virtex 5

Status
Not open for further replies.

count_enable

To make a long story short:
- I have a large number (>100) of 8-bit signed integers, produced in parallel as BRAM outputs.
- I need to find the saturated sum of these integers, which leads me to a huge 100-input adder.

To simplify, let's assume that the number of inputs is always a power of 2.

My first idea is to build a binary tree of 2-input adders, i.e. to compute the sum A+B+C+D as (A+B)+(C+D). Maybe somebody has already written parametrized code for this?
I am stuck on the following (VHDL):
- I need an array of intermediate signals between the layers of adders, but the array is triangular rather than rectangular: for 128 inputs I need 64 adders on the first level, 32 on the second, and so on. So if I create a 2D signal array of size (NUM_LAYERS)*(NUM_INPUTS), more than 70% of the signals will be unused. But I don't know the number of layers in advance (it should be parametrizable), so I cannot declare individual vectors of the appropriate length for the intermediate signals.
I am also afraid that such a tree will be very slow.

My second idea is to use the fast DSP48E blocks as 4-way adders (I have 64 of them in the V5), but I have little experience using them as adders.

Any ideas how to efficiently solve this?
 

You can use the adder core from Core Generator together with generate statements, provided the number of inputs does not change at run time.
 

Well, the question is how to generate the intermediate signals inside the generate loop in such a way that the number of signals between the layers equals the number of adders. Also, the Coregen component is a 2-input adder with an argument width of up to 256 bits, but I need 100 inputs of 8 bits...
 

The simplest approach would be to make the adder tree array square. Any unused or unconnected element is simply removed at synthesis.
It's quite easy to work out the number of adders you need at each stage, and quite easy to work out how many stages you need.
The number of stages = log2(N). The number of adders at stage M = N/2^M (counting stages from 1).

This doesn't really help, as your tree array still has to be square, because you cannot have an array of different-sized arrays for synthesis (you can in simulation, but it requires pointers, which are simulation only).

But Altera does provide a parallel add megafunction. Xilinx probably provides something similar.

A tree will not be slow if you put a register on each adder. It will actually be a very fast solution. Yes, you get 1 clock of latency for each level, but an 8-clock pipeline at 250 MHz delivers a value after 8 x 4 ns = 32 ns, sooner than a single-clock design running at 20 MHz (50 ns).
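
A registered tree along these lines can also sidestep the triangular-array question by keeping every node in one flat array, indexed like a binary heap. The following is only a sketch under stated assumptions, not code from this thread: the entity name `flat_adder_tree`, the packed `din` port and the `LEVELS` generic are all made up for illustration, N is assumed to be a power of 2, and every node is kept at full width for simplicity.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: all names and the port layout are assumptions.
entity flat_adder_tree is
    generic (
        N      : positive := 128;  -- number of inputs, assumed to be a power of 2
        W      : positive := 8;    -- width of each signed input
        LEVELS : positive := 7     -- log2(N); must be kept consistent with N
    );
    port (
        clk  : in  std_logic;
        -- Inputs packed into one vector; input i occupies bits (i+1)*W-1 downto i*W.
        din  : in  std_logic_vector(N*W-1 downto 0);
        -- Full-precision sum, valid LEVELS clock cycles after din.
        dout : out signed(W+LEVELS-1 downto 0)
    );
end entity flat_adder_tree;

architecture rtl of flat_adder_tree is
    subtype sum_t is signed(W+LEVELS-1 downto 0);
    -- Heap layout: node 1 is the root, nodes N to 2N-1 are the leaves,
    -- and node k holds the sum of nodes 2k and 2k+1.
    type node_array_t is array (1 to 2*N-1) of sum_t;
    signal node : node_array_t;
begin
    -- Leaves: sign-extend each input to the full sum width.
    leaf_gen : for i in 0 to N-1 generate
        node(N + i) <= resize(signed(din((i+1)*W-1 downto i*W)), W+LEVELS);
    end generate leaf_gen;

    -- Internal nodes: one registered 2-input adder per node.
    sum_gen : for k in 1 to N-1 generate
        sum_reg : process(clk)
        begin
            if rising_edge(clk) then
                node(k) <= node(2*k) + node(2*k + 1);
            end if;
        end process sum_reg;
    end generate sum_gen;

    dout <= node(1);
end architecture rtl;
```

With this layout exactly 2N-1 signals are declared, N leaves and N-1 sums, so nothing is left unused. Because N is a power of 2, every leaf sits at the same depth and the pipeline stays balanced without extra delay registers.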

- - - Updated - - -

This should generate the entire adder tree, based on an N parameter (you need to define a log2 function, but that's fairly straightforward). It also does no saturation, but again, that is simple enough to add.


Code (VHDL):
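-- Assumes ieee.numeric_std, N a power of 2, and ip(i) declared as signed(W-1 downto 0).
constant LEVELS : natural := log2(N);

-- The array is rectangular, so both dimensions can be constrained;
-- entries that are never driven are removed at synthesis.
-- W + LEVELS bits are enough to hold the sum of N W-bit values.
type adder_op_t is array(0 to N-1) of signed(W+LEVELS-1 downto 0);
type layer_t is array(0 to LEVELS) of adder_op_t;

signal adder_tree : layer_t;

....

-- Layer 0: sign-extended inputs (the values are signed, so no zero padding).
ip_con_gen : for i in 0 to N-1 generate
    adder_tree(0)(i) <= resize(ip(i), W+LEVELS);
end generate ip_con_gen;

-- Layer l: N/2**l registered sums of adjacent pairs from layer l-1.
-- Generate loops keep the indices static, so each process drives only its own
-- element and does not clash with the layer 0 assignments above.
layer_gen : for l in 1 to LEVELS generate
    adder_gen : for i in 0 to N/(2**l) - 1 generate
        adder_proc : process(clk)
        begin
            if rising_edge(clk) then
                adder_tree(l)(i) <= adder_tree(l-1)(2*i) + adder_tree(l-1)(2*i + 1);
            end if;
        end process adder_proc;
    end generate adder_gen;
end generate layer_gen;

-- No saturation: output simply takes the low bits of the final sum.
output <= adder_tree(LEVELS)(0)(output'range);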
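
The snippet above relies on a log2 function and leaves saturation out. A minimal sketch of both follows, assuming signed data from ieee.numeric_std. The package name `adder_tree_util` and the function name `saturate` are made up for illustration, and `saturate` assumes an output width of at most 31 bits so that the limits fit in a VHDL integer.

```vhdl
library ieee;
use ieee.numeric_std.all;

-- Hypothetical helper package: names are assumptions, not from the thread.
package adder_tree_util is
    -- Ceiling of log2(n) for n >= 1, e.g. log2(128) = 7 and log2(100) = 7.
    function log2(n : positive) return natural;

    -- Clamp a wide signed sum to out_width bits (out_width assumed <= 31).
    function saturate(x : signed; out_width : positive) return signed;
end package adder_tree_util;

package body adder_tree_util is

    function log2(n : positive) return natural is
        variable levels : natural  := 0;
        variable span   : positive := 1;
    begin
        -- Double span until it covers n; the number of doublings is the result.
        while span < n loop
            span   := span * 2;
            levels := levels + 1;
        end loop;
        return levels;
    end function log2;

    function saturate(x : signed; out_width : positive) return signed is
        constant MAX_VAL : integer := 2**(out_width-1) - 1;
        constant MIN_VAL : integer := -(2**(out_width-1));
    begin
        if x > MAX_VAL then
            return to_signed(MAX_VAL, out_width);   -- positive overflow
        elsif x < MIN_VAL then
            return to_signed(MIN_VAL, out_width);   -- negative overflow
        else
            return resize(x, out_width);            -- value fits, keep it
        end if;
    end function saturate;

end package body adder_tree_util;
```

If the tree elements and the output are both declared as signed, the final assignment could then read `output <= saturate(adder_tree(adder_tree'high)(0), output'length);`. Since log2 rounds up, it also gives the right number of levels when N is not a power of 2, although the tree itself would then need its missing inputs padded with zeros.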

 
