Welcome to EDAboard.com

Single multiplier takes up a whole DSP block for

Status
Not open for further replies.

shaiko

Hello,

I already posted this on alteraforum.com but didn't get much help, so I'll try again here.

I'm using a Cyclone V SoC FPGA.

Currently my design has 8 multipliers (which I coded in VHDL rather than instantiating them).
The multiplier inputs are 12 and 16 bits wide.

According to this document:
https://www.altera.com/content/dam/...iterature/wp/wp-01159-arriav-cyclonev-dsp.pdf

I expected the tool to pack 2 multipliers into a single DSP block, so that 8 multipliers would consume only 4 DSP blocks.
Unfortunately, the compilation report shows that 8 DSP blocks are consumed (one per multiplier).
I tried changing the synthesis optimization to area-driven, but nothing changed.

Any idea what could cause this behavior?
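
For reference, a multiplier "coded in VHDL" of the kind described above might look like the following. This is only a minimal sketch: the entity and port names are hypothetical, and signed operands are an assumption, since the post does not state the signedness.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: one inferred 12x16 multiplier.
-- Signed operands are an assumption; the post does not say.
entity mult_12x16 is
    port (
        coefficient : in  std_logic_vector(11 downto 0);
        data        : in  std_logic_vector(15 downto 0);
        product     : out std_logic_vector(27 downto 0)
    );
end entity mult_12x16;

architecture rtl of mult_12x16 is
begin
    -- numeric_std "*" returns a result as wide as both operands
    -- together (12 + 16 = 28 bits), so no resize is needed.
    product <= std_logic_vector(signed(coefficient) * signed(data));
end architecture rtl;
```

Eight instances of something like this are what the report counts as 8 DSP blocks.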
 

I presume Quartus won't necessarily pack the multipliers unless the DSP blocks are exhausted. Using individual DSP blocks might be advantageous routing-wise. Performing an addition at the multiplier output might also be a reason why individual DSP blocks are used.
 
For the same reason you get M9Ks/M20Ks in empty designs when a LUTRAM would be fine: because it's easier.
 

At first I also thought this was an "I can, but I don't want to" case.
But it's not: the packing fails to occur even if I replicate the design and exhaust all the multipliers.

The device in question has 112 DSP blocks (224 multipliers).
If I go beyond 112 multipliers (for example, 200), the tool uses up all 112 DSP blocks and resorts to a LUT implementation for the remaining 88 multipliers (and then fails during fitting because it exhausts all the combinational logic blocks). With packing, 200 multipliers would need only 100 DSP blocks.

P.S.: I just tried it. The problem persists even if I use multipliers from the IP catalog instead of HDL.
 

I'd more likely wager that you have to instantiate the DSP IP with that intermediate multiplexer enabled, as I can't see how the tool would know by inference that you need to select that multiplexer and enable the pass-through mode. Perhaps there is a synthesis attribute that controls that.
 
"I'd more likely wager this is a case of you have to instantiate the DSP IP"

As far as I have researched, the Cyclone V FPGA family doesn't support explicit instantiation of DSP blocks (as you can do with a Xilinx DSP48, for example).
 

Code:
library ieee ;
	use ieee.std_logic_1164.all ;
	use ieee.numeric_std.all ;

-- Test wrapper: 200 identical 12x16 multipliers, used to see how many DSP blocks get consumed.
entity test is
port
(
	IN_SOME_DATA : in std_logic_vector ( 15 downto 0 ) ;
	IN_SOME_COEFFICIENT : in std_logic_vector ( 11 downto 0 ) ;
	OUT_RESULT : out std_logic
) ;
end entity test ;

architecture rtl_test of test is

-- Multiplier component (presumably the one generated from the IP catalog).
component multiplier is
port
(
	dataa	: in std_logic_vector ( 11 downto 0 ) ;
	datab	: in std_logic_vector ( 15 downto 0 ) ;
	result	: out std_logic_vector ( 27 downto 0 )
) ;
end component multiplier ;

type result_array is array ( 0 to 199 ) of std_logic_vector ( 27 downto 0 ) ;

signal result : result_array ;
signal prevent_optimization : std_logic_vector ( 0 to 199 ) ;

begin

	-- Combine the MSB of every product into one output so that no multiplier is optimized away.
	OUT_RESULT <= '1' when prevent_optimization = ( prevent_optimization ' range => '1' ) else '0' ;

	generate_multipliers : for index in 0 to 199
	generate

		multiplier_instantiation : multiplier
		port map
		(
			dataa	=> IN_SOME_COEFFICIENT ,
			datab	=> IN_SOME_DATA ,
			result	=> result ( index )
		) ;

		prevent_optimization ( index ) <= result ( index ) ( 27 ) ;

	end generate ;

end architecture rtl_test ;
 

While waiting for a compile I've been looking at the Altera documentation, and nowhere do they drop the marketing speak and actually tell you how to access this feature of two or three multipliers in one DSP block. I'm sure the part is capable of it, but they sure don't tell anyone how.

I've seen this kind of garbage before with Altera, but I had hoped it would stop now that Intel owns them. This is one of the reasons I don't like using Altera: their documentation department seems to be totally out of control. I'll bet it turns out the feature isn't supported by the tools because they discovered a problem in the DSP silicon, and marketing (which controls the documentation) ignored engineering saying the feature is not available. Without the feature they couldn't add all the marketing speak about how their DSP one-ups Xilinx's DSP, so "buy our part, it's better" (without telling you the feature doesn't work and isn't supported by the tools).

I've also noticed their errata are much harder to find on their website. Xilinx puts them right there with the rest of the documentation. I like Altera's parts, but I wish engineering would take over the documentation of the parts and leave marketing to produce only the family overview document (which I never read; only a useless manager or a Cxx would read that rubbish ;-)).
 
The example below manages to use all 25 DSP blocks of a Cyclone V A2 in dual 18x18 mode (with Quartus 13.1), i.e. 50 multipliers in 25 blocks.


Code:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
 
entity test1 is
generic(
 n : integer := 50;
 w : integer := 18
);
port(
    clk : in STD_LOGIC;
    sel : in integer range 0 to n-1;
    ax  : in signed(w-1 downto 0);
    bx  : in signed(w-1 downto 0);
    cx  : out SIGNED(2*w-1 downto 0)
);
end test1;
 
architecture rtl of test1 is
type ar18 is array(0 to n-1) of signed(w-1 downto 0);
type ar36 is array(0 to n-1) of signed(2*w-1 downto 0);
 
signal ar : ar18;
signal br : ar18;
signal cr : ar36;
begin
process (clk)
    begin
        if rising_edge(clk) then
            for i in 0 to n-1 loop
                cr(i) <= ar(i)*br(i);
                if i = sel then
                    ar(i) <= ax;
                    br(i) <= bx;
                    cx <= cr(i);
                end if;
            end loop;
        end if;
    end process;
end rtl;

I conclude that packing multipliers generally works, but maybe there are constraints.

During synthesis, Quartus calculates a DSP block count without considering possible packing, and it seems to perform the packing in the fitter phase.
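
One way to check whether operand width is one of those constraints would be to rerun the same test with the 12- and 16-bit widths from the original question. The following is only a sketch of that variation: the entity name `test2` and the generics `wa` and `wb` are hypothetical, and nothing in this thread confirms how it fits.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical variant of the test above with unequal operand widths.
entity test2 is
    generic (
        n  : integer := 50;  -- number of multipliers
        wa : integer := 12;  -- first operand width (from the original question)
        wb : integer := 16   -- second operand width (from the original question)
    );
    port (
        clk : in  std_logic;
        sel : in  integer range 0 to n-1;
        ax  : in  signed(wa-1 downto 0);
        bx  : in  signed(wb-1 downto 0);
        cx  : out signed(wa+wb-1 downto 0)
    );
end entity test2;

architecture rtl of test2 is
    type ar_a is array (0 to n-1) of signed(wa-1 downto 0);
    type ar_b is array (0 to n-1) of signed(wb-1 downto 0);
    type ar_p is array (0 to n-1) of signed(wa+wb-1 downto 0);

    signal ar : ar_a;
    signal br : ar_b;
    signal cr : ar_p;
begin
    process (clk)
    begin
        if rising_edge(clk) then
            for i in 0 to n-1 loop
                -- Every multiplier has registered inputs and a registered output.
                cr(i) <= ar(i) * br(i);
                -- Only the selected multiplier is loaded and read back.
                if i = sel then
                    ar(i) <= ax;
                    br(i) <= bx;
                    cx    <= cr(i);
                end if;
            end loop;
        end if;
    end process;
end architecture rtl;
```

Comparing the fitter's DSP block count (not the synthesis estimate) against `n / 2` would show whether packing happened.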

- - - Updated - - -

"As far as I researched - the Cyclone V FPGA family doesn't support explicit instantiation of DSP blocks."

There's a cyclonev_mac WYSIWYG primitive in cyclonev_components.vhd, but you need to find out the parameters.

The newest Altera WYSIWYG documentation that I know of is in the QUIP toolkit 9.0 from 2009, which does not go beyond Cyclone/Stratix III.
 
It would be really interesting if the multipliers are packed because FvM used the signed type for the ax, bx, and cx ports, as opposed to the std_logic_vector used by shaiko, which presumably gets converted to signed or unsigned somewhere. Or does the multiplier component use std_logic_arith? Might be worth checking.
 

I don't believe that the signed type makes a difference. The behavior seems to be essentially the same if the multiplier is implemented through the lpm_mult component, which exposes signed or unsigned data as std_logic_vector, as all Quartus arithmetic libraries do. I used lpm_mult in a previous test.

I haven't yet seen a real-world example where the packing fails.
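
As an illustration of the point about lpm_mult, a direct instantiation might look roughly like this. It is a sketch only: the wrapper entity name is hypothetical, and the generic values (12x16 signed, no pipeline stages) are assumptions chosen to match the widths in the original question, not the settings of the earlier test.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

library lpm;
use lpm.lpm_components.all;

-- Hypothetical wrapper around lpm_mult.
entity lpm_mult_wrap is
    port (
        dataa  : in  std_logic_vector(11 downto 0);
        datab  : in  std_logic_vector(15 downto 0);
        result : out std_logic_vector(27 downto 0)
    );
end entity lpm_mult_wrap;

architecture rtl of lpm_mult_wrap is
begin
    mult : lpm_mult
        generic map (
            lpm_widtha         => 12,
            lpm_widthb         => 16,
            lpm_widthp         => 28,
            -- Signedness is a generic, not a port type:
            -- the ports stay std_logic_vector either way.
            lpm_representation => "SIGNED",  -- assumed; "UNSIGNED" is the other option
            lpm_pipeline       => 0          -- assumed: purely combinational
        )
        port map (
            dataa  => dataa,
            datab  => datab,
            result => result
        );
end architecture rtl;
```

The data type question is settled by `lpm_representation`, so a std_logic_vector interface by itself should not change how the multiplier is built.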
 

The post I linked was for versions 14 and 15 of Quartus, and more for the 8x8 or 9x9 case.
 
