  1. #1
    Advanced Member level 5
    Points: 11,720, Level: 25

    Join Date
    Aug 2011
    Posts
    2,400
    Helped
    278 / 278
    Points
    11,720
    Level
    25

    Single multiplier takes up a whole DSP block for

    Hello,

    I already posted this on alteraforum.com but didn't get much help, so I'll try again here.

    I'm using a Cyclone V SoC FPGA.

    Currently my design has 8 multipliers (which I inferred in VHDL rather than instantiating IP).
    The inputs to each multiplier are 12 and 16 bits wide.

    According to this document:
    https://www.altera.com/content/dam/a...clonev-dsp.pdf

    I expected the tool to pack 2 multipliers into a single DSP block, so that 8 multipliers would consume only 4 DSP blocks.
    Unfortunately, the compilation report shows that 8 DSP blocks are consumed (one per multiplier).
    I tried changing the synthesis optimization to area-driven, but nothing changed.

    Any idea what could cause this behavior?
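
    For reference, the setup described above can be sketched roughly as follows. This is only an illustration, not the original design: the entity name mult8_sketch, the flattened port layout, the output register, and the choice of signed arithmetic are all assumptions.

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    -- Hypothetical example: eight independent 16x12 multipliers inferred
    -- from the "*" operator instead of being instantiated as IP.
    entity mult8_sketch is
        port (
            clk    : in  std_logic;
            data   : in  std_logic_vector(8*16 - 1 downto 0);  -- eight 16-bit operands
            coeff  : in  std_logic_vector(8*12 - 1 downto 0);  -- eight 12-bit operands
            result : out std_logic_vector(8*28 - 1 downto 0)   -- eight 28-bit products
        );
    end entity mult8_sketch;

    architecture rtl of mult8_sketch is
    begin
        gen_mult : for i in 0 to 7 generate
            process (clk)
            begin
                if rising_edge(clk) then
                    -- Assumption: signed operands. A 16-bit by 12-bit product
                    -- is 28 bits wide in numeric_std.
                    result(28*i + 27 downto 28*i) <= std_logic_vector(
                        signed(data(16*i + 15 downto 16*i)) *
                        signed(coeff(12*i + 11 downto 12*i)));
                end if;
            end process;
        end generate gen_mult;
    end architecture rtl;
    ```

    Both operands fit within the 18x18 mode, which is why two such multipliers per DSP block would be the expected result.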
    Last edited by FvM; 14th June 2017 at 17:03. Reason: Fixed invalid link

  2. #2
    Super Moderator
    Points: 230,196, Level: 100
    Awards:
    1st Helpful Member

    Join Date
    Jan 2008
    Location
    Bochum, Germany
    Posts
    39,747
    Helped
    12128 / 12128
    Points
    230,196
    Level
    100


    I presume Quartus won't necessarily pack the multipliers unless the DSP blocks are exhausted. Using individual DSP blocks might be advantageous routing-wise. Performing an addition at the multiplier output might also be a reason why individual DSP blocks are used.



  3. #3
    Advanced Member level 5
    Points: 33,436, Level: 44
    Achievements:
    7 years registered

    Join Date
    Jun 2010
    Posts
    6,127
    Helped
    1791 / 1791
    Points
    33,436
    Level
    44


    For the same reason you get M9Ks/M20Ks in empty designs when a LUTRAM would be fine: because it's easier.



  4. #4
    Advanced Member level 5
    Points: 11,720, Level: 25

    Join Date
    Aug 2011
    Posts
    2,400
    Helped
    278 / 278
    Points
    11,720
    Level
    25


    At first I also thought this was an "I can, but I don't want to" case.
    But it's not: the packing fails to occur even if I replicate the design and exhaust all the multipliers.

    The device in question has 112 DSP blocks (224 multipliers).
    If I go beyond 112 multipliers (for example 200), the tool uses up all 112 DSP blocks and resorts to a LUT implementation for the remaining 88 multipliers (and then fails during fitting because it exhausts the combinational blocks).

    P.S.: I just tried it. The problem persists even if I use multipliers from the IP catalog instead of inferring them in HDL.



  5. #5
    Advanced Member level 5
    Points: 33,436, Level: 44
    Achievements:
    7 years registered

    Join Date
    Jun 2010
    Posts
    6,127
    Helped
    1791 / 1791
    Points
    33,436
    Level
    44


    Please supply the code...




  6. #6
    Super Moderator
    Points: 26,240, Level: 39
    ads-ee's Avatar
    Join Date
    Sep 2013
    Location
    USA
    Posts
    5,956
    Helped
    1462 / 1462
    Points
    26,240
    Level
    39


    I'd wager it's more likely that you have to instantiate the DSP IP with that intermediate multiplexer enabled, as I can't see how the tool would know by inference that you need to select that multiplexer and enable the pass-through mode. Perhaps there is some synthesis attribute that controls that.



  7. #7
    Advanced Member level 5
    Points: 11,720, Level: 25

    Join Date
    Aug 2011
    Posts
    2,400
    Helped
    278 / 278
    Points
    11,720
    Level
    25


    "I'd more likely wager this is a case of you have to instantiate the DSP IP"
    As far as I could find, the Cyclone V FPGA family doesn't support explicit instantiation of DSP blocks (as you can do with a Xilinx DSP48, for example).



  8. #8
    Advanced Member level 5
    Points: 11,720, Level: 25

    Join Date
    Aug 2011
    Posts
    2,400
    Helped
    278 / 278
    Points
    11,720
    Level
    25


    Code:
    library ieee ;
    	use ieee.std_logic_1164.all ;
    	use ieee.numeric_std.all ;
    
    entity test is
     	
    port 	        
    (	     
    	IN_SOME_DATA : in std_logic_vector ( 15 downto 0 ) ;
    	IN_SOME_COEFFICIENT : in std_logic_vector ( 11 downto 0 ) ;
    	
    	OUT_RESULT : out std_logic 	
    ) ;  
    
    end entity test ;
    
    
    
    
    
    architecture rtl_test of test is 
    
    component multiplier is
    
    port
    (
    	dataa	: in std_logic_vector ( 11 downto 0 ) ;
    	datab	: in std_logic_vector ( 15 downto 0 ) ;
    	result	: out std_logic_vector ( 27 downto 0 )
    ) ;
    
    end component multiplier ;
    
    type result_array is array ( 0 to 199 ) of std_logic_vector ( 27 downto 0 ) ;
    
    signal result : result_array ;
    signal prevent_optimization : std_logic_vector ( 0 to 199 ) ;
    
    begin 
    	
    	OUT_RESULT <= '1' when prevent_optimization = ( prevent_optimization ' range => '1' ) else '0' ; 
    	
    	generate_multipliers : for index in 0 to 199
    	generate 
    	
    		multiplier_instantiation : multiplier
    		
    		port map
    		(
    			dataa	=> IN_SOME_COEFFICIENT ,  
    			datab	=> IN_SOME_DATA ,
    			result	=> result ( index ) 
    		) ;
    
    		prevent_optimization ( index ) <= result ( index ) ( 27 ) ;
    		
    	end generate ;
    
    end architecture rtl_test ;
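
    The body of the multiplier component is not shown in the post above. A minimal sketch of what it could look like, assuming a purely combinational multiply inferred with numeric_std and unsigned operands (both are assumptions, since the actual component came from the IP catalog):

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    -- Hypothetical stand-in for the "multiplier" component declared above.
    -- The port names and widths match the component declaration.
    entity multiplier is
        port (
            dataa  : in  std_logic_vector(11 downto 0);
            datab  : in  std_logic_vector(15 downto 0);
            result : out std_logic_vector(27 downto 0)
        );
    end entity multiplier;

    architecture rtl of multiplier is
    begin
        -- Assumption: unsigned operands. A 12-bit by 16-bit product is
        -- 28 bits wide, which matches the result port.
        result <= std_logic_vector(unsigned(dataa) * unsigned(datab));
    end architecture rtl;
    ```

    Replacing unsigned with signed in the conversion gives the signed variant. Whether this changes the packing behavior is not established in this thread.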



  9. #9
    Super Moderator
    Points: 26,240, Level: 39
    ads-ee's Avatar
    Join Date
    Sep 2013
    Location
    USA
    Posts
    5,956
    Helped
    1462 / 1462
    Points
    26,240
    Level
    39


    While waiting for a compile I've been looking at the Altera documentation, and it appears that nowhere do they drop the marketing speak and actually tell you how to access this feature of dual or triple multipliers in one DSP block. I'm sure the part is capable of doing this, but they sure don't tell anyone how.

    I've seen this kind of garbage before with Altera, but I had hoped it would stop now that Intel owns them. This is one of the reasons I don't like using Altera: their documentation department seems to be totally out of control. I'll bet it turns out the feature isn't supported by the tools because they discovered a problem in the DSP silicon, and marketing (which controls the documentation) ignored engineering saying that the feature is not available. Without the feature they couldn't add all the marketing speak about how their DSP one-ups Xilinx's DSP, so "buy our part, it's better" (without telling you the feature doesn't work and is not supported by the tools).

    I've also noticed that their errata are much harder to find on their website. Xilinx puts them right there with the rest of their documentation. I like Altera's parts, but I wish engineering would take over the documentation of the parts and leave marketing to produce only the family overview document (which I never read; only a useless manager or a Cxx would read that rubbish).



  10. #10
    Advanced Member level 2
    Points: 4,052, Level: 15

    Join Date
    Feb 2015
    Posts
    672
    Helped
    203 / 203
    Points
    4,052
    Level
    15


    The internet suggests that the "auto pack registers" setting has issues with multipliers in some versions of Quartus. Perhaps this was fixed in a newer release.

    https://www.alteraforum.com/forum/sh...=48644&page=2& -- I actually thought this was your post until I looked at the date.



  11. #11
    Super Moderator
    Points: 230,196, Level: 100
    Awards:
    1st Helpful Member

    Join Date
    Jan 2008
    Location
    Bochum, Germany
    Posts
    39,747
    Helped
    12128 / 12128
    Points
    230,196
    Level
    100


    The example below manages to use all 25 DSP blocks of a Cyclone V A2 in dual 18x18 mode (with Quartus 13.1).

    Code:
    
    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;
     
    entity test1 is
    generic(
     n : integer := 50;
     w : integer := 18
    );
    port(
        clk : in STD_LOGIC;
        sel : in integer range 0 to n-1;
        ax  : in signed(w-1 downto 0);
        bx  : in signed(w-1 downto 0);
        cx  : out SIGNED(2*w-1 downto 0)
    );
    end test1;
     
    architecture rtl of test1 is
    type ar18 is array(0 to n-1) of signed(w-1 downto 0);
    type ar36 is array(0 to n-1) of signed(2*w-1 downto 0);
     
    signal ar : ar18;
    signal br : ar18;
    signal cr : ar36;
    begin
    process (clk)
        begin
            if rising_edge(clk) then
                for i in 0 to n-1 loop
                    cr(i) <= ar(i)*br(i);
                    if i = sel then
                        ar(i) <= ax;
                        br(i) <= bx;
                        cx <= cr(i);
                    end if;
                end loop;
            end if;
        end process;
    end rtl;

    I conclude that packing multipliers generally works, but there may be constraints.

    During synthesis, Quartus calculates a DSP block count without considering possible packing, and seems to perform the packing in the fitter phase.

    - - - Updated - - -

    "As far as I researched - the Cyclone V FPGA family doesn't support explicit instantiation of DSP blocks."
    There's a cyclonev_mac WYSIWYG primitive in cyclonev_components.vhd, but you need to find out the parameters.

    The newest Altera WYSIWYG documentation that I know of is in the QUIP toolkit 9.0 from 2009, which doesn't go beyond Cyclone/Stratix III.



  12. #12
    Super Moderator
    Points: 26,240, Level: 39
    ads-ee's Avatar
    Join Date
    Sep 2013
    Location
    USA
    Posts
    5,956
    Helped
    1462 / 1462
    Points
    26,240
    Level
    39


    It would be really interesting if the multipliers were packed because FvM used the signed type for the ax, bx, and cx ports, as opposed to the std_logic_vector used by shaiko, which presumably gets converted to signed or unsigned. Or does the multiplier component use std_logic_arith? Might be worth checking.



  13. #13
    Super Moderator
    Points: 230,196, Level: 100
    Awards:
    1st Helpful Member

    Join Date
    Jan 2008
    Location
    Bochum, Germany
    Posts
    39,747
    Helped
    12128 / 12128
    Points
    230,196
    Level
    100


    I don't believe that the signed type makes a difference. The behavior seems to be essentially the same if the multiplier is implemented through the lpm_mult component, which exposes signed or unsigned data as std_logic_vector, as all Quartus arithmetic libraries do. I used lpm_mult in a previous test.

    I haven't yet seen a real-world example where the packing fails.
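
    For illustration, an lpm_mult instantiation with std_logic_vector ports might look like the sketch below. The wrapper entity multiplier_lpm is hypothetical, the widths are borrowed from the 12x16 case earlier in the thread, and it assumes the lpm_components package shipped with Quartus declares lpm_mult with these generics and ports. It is not the code used in the test mentioned above.

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;

    library lpm;
    use lpm.lpm_components.all;

    -- Hypothetical wrapper around lpm_mult; signedness is selected by a
    -- generic string, while the ports stay plain std_logic_vector.
    entity multiplier_lpm is
        port (
            dataa  : in  std_logic_vector(11 downto 0);
            datab  : in  std_logic_vector(15 downto 0);
            result : out std_logic_vector(27 downto 0)
        );
    end entity multiplier_lpm;

    architecture rtl of multiplier_lpm is
    begin
        u_mult : lpm_mult
            generic map (
                lpm_widtha         => 12,
                lpm_widthb         => 16,
                lpm_widthp         => 28,
                lpm_representation => "SIGNED"  -- or "UNSIGNED"
            )
            port map (
                dataa  => dataa,
                datab  => datab,
                result => result
            );
    end architecture rtl;
    ```

    The optional clock, clken, aclr, and sum ports are left unconnected here and are assumed to take their default values.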




  14. #14
    Advanced Member level 2
    Points: 4,052, Level: 15

    Join Date
    Feb 2015
    Posts
    672
    Helped
    203 / 203
    Points
    4,052
    Level
    15


    The post I linked was for versions 14 and 15 of Quartus, and more for the 8x8 or 9x9 case.


