Vivado Taking A Long Time To Run Synthesis & Implementation

MSAKARIM · May 17, 2019

I am new to Vivado, but Vivado 2017.4 seems to take longer than it should to run synthesis and implementation. I'm working on a design of the SHA-512 algorithm (a hash function used in security); utilization is attached.
Implementation takes around 3 hours to complete.
Is my computer a limiting factor, or is this normal in Vivado? And how can I speed this up?

ThisIsNotSam · May 17, 2019

Runtime is proportional not only to design size. Maybe the target frequency is too high and the tool spends a lot of time optimizing. Maybe the I/O placement or floorplan is bad and the tool is trying to overcome that.

FvM · May 17, 2019

Sounds like a poorly designed (e.g. purely combinational) SHA implementation that can hardly meet timing.

ads-ee · May 17, 2019

Evidence for it lacking pipelining...

LUT utilization of 17%
FF utilization of 1%
along with IO utilization of 70% means the design is spread all over the die but has virtually no registers to pipeline across the die.

I recall other threads on aspects of this design and from what I remember of the code snippets, I didn't think this design would run better than a few MHz.

wtr · May 21, 2019

You can get timestamps for when the various stages of place and route complete.

This way you can get a feel for how long design initialization, opt_design, place_design and route_design take.

If you pipeline the design, the tool has an easier time during placement.

MSAKARIM · May 21, 2019

The SHA algorithm has 80 rounds (iterations); could this be the reason?
Would loop pipelining or loop unrolling, as described at https://www.xilinx.com/support/docu...elines/concept_pipelining_loop_unrolling.html, improve the design's performance?

- - - Updated - - -

ads-ee said:
Evidence for it lacking pipelining...

LUT utilization of 17%
FF utilization of 1%
along with IO utilization of 70% means the design is spread all over the die but has virtually no registers to pipeline across the die.

I recall other threads on aspects of this design and from what I remember of the code snippets, I didn't think this design would run better than a few MHz.

Yes, I have posted threads about this before, but honestly I have changed several things to improve it and it still takes a long time during implementation.

ads-ee · May 21, 2019

If you are doing this in HLS then you need the pragma for pipelining according to that page.

FvM · May 21, 2019

MSAKARIM said:
is Loop Pipelining or Loop Unrolling like that

The link is about pipelining with the HLS compiler; it is not applicable to generic HDL code.

As far as I understand, you are coding in generic VHDL. You need to implement pipelining explicitly using a clock and pipeline registers.
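
As a minimal sketch of what explicit pipelining can look like in plain VHDL, a four-operand 64-bit sum can be split into two registered stages so that each register-to-register path contains only one adder. The entity name add4_pipelined and its ports are hypothetical and are not taken from the posted design.

Code:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: sums four 64-bit operands in two clocked stages
-- instead of one long combinational adder chain.
entity add4_pipelined is
  port (
    clk        : in  std_logic;
    a, b, c, d : in  std_logic_vector(63 downto 0);
    sum        : out std_logic_vector(63 downto 0)  -- valid two clock edges after the inputs
  );
end entity add4_pipelined;

architecture rtl of add4_pipelined is
  signal ab_r, cd_r : unsigned(63 downto 0) := (others => '0');  -- stage 1 registers
  signal sum_r      : unsigned(63 downto 0) := (others => '0');  -- stage 2 register
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Stage 1: two independent additions, each captured in a register
      ab_r <= unsigned(a) + unsigned(b);
      cd_r <= unsigned(c) + unsigned(d);
      -- Stage 2: combine the registered partial sums from the previous cycle
      sum_r <= ab_r + cd_r;
    end if;
  end process;

  sum <= std_logic_vector(sum_r);
end architecture rtl;
```

The result appears two clock edges after the operands are applied, but a new set of operands can be accepted every clock.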

MSAKARIM · May 23, 2019

If I have this part of the code:

Code:

w(15) <= Message_block(63 downto 0);
w(14) <= Message_block(127 downto 64);
w(13) <= Message_block(191 downto 128);
w(12) <= Message_block(255 downto 192);
w(11) <= Message_block(319 downto 256);
w(10) <= Message_block(383 downto 320);
w(9)  <= Message_block(447 downto 384);
w(8)  <= Message_block(511 downto 448);
w(7)  <= Message_block(575 downto 512);
w(6)  <= Message_block(639 downto 576);
w(5)  <= Message_block(703 downto 640);
w(4)  <= Message_block(767 downto 704);
w(3)  <= Message_block(831 downto 768);
w(2)  <= Message_block(895 downto 832);
w(1)  <= Message_block(959 downto 896);
w(0)  <= Message_block(1023 downto 960);

wordGen : for t in 16 to 79 generate
  WOW : WordT port map (w(t-2), w(t-7), w(t-15), w(t-16), w(t));
end generate;

How can I make it pipelined?
What about this attempt (adding this part to the previous code)?

Code:

REGIS : for i in 0 to 79 generate
  reg : Reg64 port map (clk, rst, w(i), wo(i));
end generate;

where the Reg64 code is:

Code:

library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity Reg64 is
  port (clk, rst : in  std_logic;
        D        : in  std_logic_vector(63 downto 0);
        RegW     : out std_logic_vector(63 downto 0));
end Reg64;

architecture Behavioral of Reg64 is
begin
  process (clk, rst)
  begin
    if (rst = '0') then                   -- asynchronous, active-low reset
      RegW <= (others => '0');
    elsif (clk'event and clk = '1') then  -- rising clock edge
      RegW <= D;
    end if;
  end process;
end Behavioral;

ThisIsNotSam · May 23, 2019

I can't understand what you are doing (coding a register bank?!), but it doesn't look like pipelining. When we say to pipeline, we mean to split a long combinational logic path into two or more stages. The first code snippet has no combinational logic, just some mapping. I doubt that is the problem.

MSAKARIM · May 26, 2019

ThisIsNotSam said:
I can't understand what you are doing (coding a register bank?!), but it doesn't look like pipelining. When we say to pipeline, we mean to split a long combinational logic path into two or more stages. The first code snippet has no combinational logic, just some mapping. I doubt that is the problem.

"When we say to pipeline, we mean to split a long combinational logic path into two or more stages"
>> I did this and it still takes a long time.

- - - Updated - - -

This is part of my original code. Please, I need some notes (just notes) on how to make it pipelined:

Code:

library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

entity Round_sha is
  port (clk, rst, init : in  std_logic;
        IV             : in  std_logic_vector(511 downto 0);
        Message_block  : in  std_logic_vector(1023 downto 0);
        Hashed_512     : out std_logic_vector(511 downto 0));
end Round_sha;

architecture rtl of Round_sha is

  component K_rom is
    port (addr : in  integer range 0 to 79;
          data : out std_logic_vector(63 downto 0));
  end component;

  component WordT is
    port (w2, w7, w15, w16 : in  std_logic_vector(63 downto 0);
          wnext            : out std_logic_vector(63 downto 0));
  end component;

  component Func_round is
    port (a, b, c, e, f, g : in  std_logic_vector(63 downto 0);
          f0, f1, f2, f3   : out std_logic_vector(63 downto 0));
  end component;

  -- Working variables: index -1 holds the initial value, 0..79 the round outputs
  type word is array (-1 to 79) of std_logic_vector(63 downto 0);
  signal a, b, c, d, e, f, g, h : word;
  -- Message schedule, round-function outputs and round constants
  type word2 is array (0 to 79) of std_logic_vector(63 downto 0);
  signal w, f0, f1, f2, f3, k : word2;

begin


  -- Assigning the message block to words, t = 0..15
  w(15) <= Message_block(63 downto 0);
  w(14) <= Message_block(127 downto 64);
  w(13) <= Message_block(191 downto 128);
  w(12) <= Message_block(255 downto 192);
  w(11) <= Message_block(319 downto 256);
  w(10) <= Message_block(383 downto 320);
  w(9)  <= Message_block(447 downto 384);
  w(8)  <= Message_block(511 downto 448);
  w(7)  <= Message_block(575 downto 512);
  w(6)  <= Message_block(639 downto 576);
  w(5)  <= Message_block(703 downto 640);
  w(4)  <= Message_block(767 downto 704);
  w(3)  <= Message_block(831 downto 768);
  w(2)  <= Message_block(895 downto 832);
  w(1)  <= Message_block(959 downto 896);
  w(0)  <= Message_block(1023 downto 960);

  a(-1) <= IV(511 downto 448);  -- <= X"6a09e667f3bcc908";
  b(-1) <= IV(447 downto 384);  -- <= X"bb67ae8584caa73b";
  c(-1) <= IV(383 downto 320);  -- <= X"3c6ef372fe94f82b";
  d(-1) <= IV(319 downto 256);  -- <= X"a54ff53a5f1d36f1";
  e(-1) <= IV(255 downto 192);  -- <= X"510e527fade682d1";
  f(-1) <= IV(191 downto 128);  -- <= X"9b05688c2b3e6c1f";
  g(-1) <= IV(127 downto 64);   -- <= X"1f83d9abfb41bd6b";
  h(-1) <= IV(63 downto 0);     -- <= X"5be0cd19137e2179";

  -- Words for t = 16..79
  wordGen : for t in 16 to 79 generate
    WOW : WordT port map (w(t-2), w(t-7), w(t-15), w(t-16), w(t));
  end generate;  -- wordGen

  Funcc : for i in 0 to 79 generate

    Func : Func_round port map (a(i-1), b(i-1), c(i-1), e(i-1), f(i), g(i), f0(i), f1(i), f2(i), f3(i));
    KROM : K_rom port map (i, k(i));

    h(i) <= g(i-1);
    g(i) <= f(i-1);
    f(i) <= e(i-1);
    -- e <= d + T1
    -- e <= d + h + f3 + f0 + k + w
    e(i) <= std_logic_vector(unsigned(d(i-1)) + unsigned(h(i-1)) + unsigned(f3(i)) + unsigned(f0(i)) + unsigned(k(i)) + unsigned(w(i)));
    d(i) <= c(i-1);
    c(i) <= b(i-1);
    b(i) <= a(i-1);
    -- a <= T1 + T2
    -- a <= h + f3 + f0 + k + w + f2 + f1
    a(i) <= std_logic_vector(unsigned(h(i-1)) + unsigned(f3(i)) + unsigned(f0(i)) + unsigned(k(i)) + unsigned(w(i)) + unsigned(f2(i)) + unsigned(f1(i)));

  end generate;

  process (clk, rst)
    variable H0, H1, H2, H3, H4, H5, H6, H7 : std_logic_vector(63 downto 0);
  begin

    if (rst = '1') then
      H0 := X"6a09e667f3bcc908";
      H1 := X"bb67ae8584caa73b";
      H2 := X"3c6ef372fe94f82b";
      H3 := X"a54ff53a5f1d36f1";
      H4 := X"510e527fade682d1";
      H5 := X"9b05688c2b3e6c1f";
      H6 := X"1f83d9abfb41bd6b";
      H7 := X"5be0cd19137e2179";

    elsif (init = '1') then
      if ((clk = '1') and clk'event) then
        H0 := std_logic_vector(unsigned(a(79)) + unsigned(a(-1)));
        H1 := std_logic_vector(unsigned(b(79)) + unsigned(b(-1)));
        H2 := std_logic_vector(unsigned(c(79)) + unsigned(c(-1)));
        H3 := std_logic_vector(unsigned(d(79)) + unsigned(d(-1)));
        H4 := std_logic_vector(unsigned(e(79)) + unsigned(e(-1)));
        H5 := std_logic_vector(unsigned(f(79)) + unsigned(f(-1)));
        H6 := std_logic_vector(unsigned(g(79)) + unsigned(g(-1)));
        H7 := std_logic_vector(unsigned(h(79)) + unsigned(h(-1)));
        -- h0 <= a + h0;
        -- h1 <= b + h1;
        -- h2 <= c + h2;
        -- h3 <= d + h3;
        -- h4 <= e + h4;
        -- h5 <= f + h5;
        -- h6 <= g + h6;
        -- h7 <= h + h7;
      end if;
    end if;
    Hashed_512 <= H0 & H1 & H2 & H3 & H4 & H5 & H6 & H7;
  end process;

end rtl;

TrickyDicky · May 26, 2019

Why is init outside the clock condition? You're forcing the clock into logic, which is probably causing massive timing failures. Either put the init check inside the clock condition or remove it altogether.

You also have massive logic chains, as I don't see any pipelining between the components generated in the large generate loops! You need pipelining at ALL stages, not just the output stage. Your design has basically zero pipelining.
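
For illustration, moving the init check inside the clock condition might look like the following sketch for a single 64-bit hash word. The entity name hash_word_reg and its port names are hypothetical; only the reset constant and the a(79) + a(-1) sum are taken from the posted code.

Code:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: one registered hash word with init used as a
-- synchronous enable instead of gating the clock condition.
entity hash_word_reg is
  port (
    clk, rst, init : in  std_logic;
    round_out      : in  std_logic_vector(63 downto 0);  -- e.g. a(79) in the posted code
    iv_word        : in  std_logic_vector(63 downto 0);  -- e.g. a(-1) in the posted code
    h_out          : out std_logic_vector(63 downto 0)
  );
end entity hash_word_reg;

architecture rtl of hash_word_reg is
  signal h_r : unsigned(63 downto 0);
begin
  process (clk, rst)
  begin
    if rst = '1' then
      h_r <= x"6a09e667f3bcc908";  -- asynchronous reset value (H0 in the posted code)
    elsif rising_edge(clk) then
      if init = '1' then           -- init is checked only on the clock edge
        h_r <= unsigned(round_out) + unsigned(iv_word);
      end if;
    end if;
  end process;

  h_out <= std_logic_vector(h_r);
end architecture rtl;
```

Written this way, the process describes an ordinary flip-flop with asynchronous reset and clock enable, and init never becomes part of the clock path.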

vGoodtimes · May 26, 2019

Any pipeline latency appears in the longest feedback path due to the output of the 80-rounds being the init for the next chunk. Maybe pipelining would help for the immediate problem of synthesis times, but the resulting design isn't that good anyways.

The 6kW power estimate is interesting.

MSAKARIM · May 27, 2019

TrickyDicky said:
Why is init outside the clock condition? You're forcing the clock into logic, which is probably causing massive timing failures. Either put the init check inside the clock condition or remove it altogether.

You also have massive logic chains, as I don't see any pipelining between the components generated in the large generate loops! You need pipelining at ALL stages, not just the output stage. Your design has basically zero pipelining.

Please give an example or some notes on how I can pipeline those loops.

- - - Updated - - -

vGoodtimes said:
Any pipeline latency appears in the longest feedback path due to the output of the 80-rounds being the init for the next chunk. Maybe pipelining would help for the immediate problem of synthesis times, but the resulting design isn't that good anyways.

The 6kW power estimate is interesting.

How can I reduce this excessively high power estimate?

ThisIsNotSam · May 27, 2019

MSAKARIM said:
Please give an example or some notes on how I can pipeline those loops.

I think you need help from a professor/class or a textbook. You keep asking the same question over and over when the answer was already given.

vGoodtimes · May 27, 2019

ThisIsNotSam said:
You keep asking the same question over and over when the answer was already given.

I think he's asking a slightly different question than what's been answered. His problem is more similar to "how do you pipeline an IIR filter". The SHA-512 computation has rounds where the output of each round becomes the input to the next. This is a tight feedback loop, similar to the IIR filter. The output after these rounds then becomes part of the input for the next chunk, which is another feedback path like the IIR filter.
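
One common way to handle such a feedback loop is to fold the 80 rounds onto a single registered round unit that runs for 80 clock cycles, so that each clock period only has to cover one round. The sketch below is an assumption about how that could be structured, not the posted design: the entity sha512_round_iter, its ports, and the idea that k_t and w_t are supplied combinationally by an external constant ROM and message schedule addressed by t are all hypothetical. The round functions follow the SHA-512 definitions, and the final addition of the initial values is left out.

Code:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical folded round unit: one SHA-512 round per clock, 80 clocks per block.
entity sha512_round_iter is
  port (
    clk, start : in  std_logic;
    iv         : in  std_logic_vector(511 downto 0);
    k_t, w_t   : in  std_logic_vector(63 downto 0);  -- assumed valid for round index t
    t          : out integer range 0 to 79;          -- current round index
    done       : out std_logic;                      -- one-clock pulse after round 79
    state_out  : out std_logic_vector(511 downto 0)
  );
end entity sha512_round_iter;

architecture rtl of sha512_round_iter is
  signal a, b, c, d, e, f, g, h : unsigned(63 downto 0) := (others => '0');
  signal cnt  : integer range 0 to 79 := 0;
  signal busy : std_logic := '0';
begin
  process (clk)
    variable t1, t2 : unsigned(63 downto 0);
  begin
    if rising_edge(clk) then
      done <= '0';
      if start = '1' then
        -- Load the working variables; this is the only entry into the loop
        a <= unsigned(iv(511 downto 448));
        b <= unsigned(iv(447 downto 384));
        c <= unsigned(iv(383 downto 320));
        d <= unsigned(iv(319 downto 256));
        e <= unsigned(iv(255 downto 192));
        f <= unsigned(iv(191 downto 128));
        g <= unsigned(iv(127 downto 64));
        h <= unsigned(iv(63 downto 0));
        cnt  <= 0;
        busy <= '1';
      elsif busy = '1' then
        -- T1 = h + Sigma1(e) + Ch(e,f,g) + K(t) + W(t)
        t1 := h + (rotate_right(e, 14) xor rotate_right(e, 18) xor rotate_right(e, 41))
                + ((e and f) xor (not e and g)) + unsigned(k_t) + unsigned(w_t);
        -- T2 = Sigma0(a) + Maj(a,b,c)
        t2 := (rotate_right(a, 28) xor rotate_right(a, 34) xor rotate_right(a, 39))
                + ((a and b) xor (a and c) xor (b and c));
        -- The registered state feeds back into the same logic on the next clock
        h <= g;  g <= f;  f <= e;  e <= d + t1;
        d <= c;  c <= b;  b <= a;  a <= t1 + t2;
        if cnt = 79 then
          busy <= '0';
          done <= '1';
        else
          cnt <= cnt + 1;
        end if;
      end if;
    end if;
  end process;

  t <= cnt;
  state_out <= std_logic_vector(a) & std_logic_vector(b) & std_logic_vector(c) & std_logic_vector(d)
             & std_logic_vector(e) & std_logic_vector(f) & std_logic_vector(g) & std_logic_vector(h);
end architecture rtl;
```

This trades throughput for a short critical path: a block takes 80 clocks, but the tools only have to place and time one round of logic instead of 80 chained copies.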
