I am new to Vivado, but it seems like Vivado 17.4 takes longer than it should to run through synthesis and implementation. I'm working on a design of the SHA-512 algorithm (a hash function used in security); the utilization is attached.
It takes around 3 hours to complete implementation.
Is my computer a contributing factor, or is this normal in Vivado? And how can I speed this up?
View attachment 153159
Evidence for it lacking pipelining...
LUT utilization of 17%
FF utilization of 1%
along with IO utilization of 70% means the design is spread all over the die but has virtually no registers to pipeline across the die.
I recall other threads on aspects of this design and from what I remember of the code snippets, I didn't think this design would run better than a few MHz.
The link is about pipelining with an HLS compiler, which is not applicable to generic HDL code.
Is loop pipelining or loop unrolling something like this?
w(15)<=Message_block(63 downto 0);
w(14)<=Message_block(127 downto 64);
w(13) <=Message_block(191 downto 128);
w(12)<=Message_block(255 downto 192);
w(11) <=Message_block(319 downto 256);
w(10) <=Message_block(383 downto 320);
w(9) <=Message_block(447 downto 384);
w(8) <=Message_block(511 downto 448);
w(7) <=Message_block(575 downto 512);
w(6) <=Message_block(639 downto 576);
w(5) <=Message_block(703 downto 640);
w(4) <=Message_block(767 downto 704);
w(3)<=Message_block(831 downto 768);
w(2)<=Message_block(895 downto 832);
w(1)<=Message_block(959 downto 896);
w(0)<=Message_block(1023 downto 960);
wordGen : for t in 16 to (79) generate
WOW : WordT port map(w((t-2)), w((t-7)) , w((t-15)) , w((t-16)),w(t));
end generate ;
REGIS: for i in 0 to 79 generate
reg: Reg64 port map (clk,rst,w(i),wo(i));
end generate;
entity Reg64 is
Port (clk,rst:in std_logic;
D: in std_logic_vector(63 downto 0);
RegW: out std_logic_vector(63 downto 0) );
end Reg64;
architecture Behavioral of Reg64 is
begin
process(clk,rst)
begin
if (rst ='0' ) then RegW <= (others=> '0');
elsif (clk'event and clk = '1') then
RegW <= D;
end if;
end process;
end Behavioral;
I can't understand what you are doing (coding a register bank?!), but it doesn't look like pipelining. When we say to pipeline, we mean to split a long combinational logic path into two or more stages. The first code snippet has no combinational logic, just some mapping. I doubt that is the problem.
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.All;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity Round_sha is
port( clk,rst,init: in STD_LOGIC;
IV: in std_logic_vector(511 downto 0);
Message_block: in std_logic_vector(1023 downto 0);
Hashed_512: out std_logic_vector(511 downto 0));
end Round_sha;
architecture rtl of Round_sha is
component K_rom IS
PORT ( addr: IN INTEGER RANGE 0 TO 79;
data: OUT STD_LOGIC_VECTOR (63 DOWNTO 0));
END component;
component WordT is
port (w2,w7,w15,w16: in std_logic_vector (63 downto 0);
wnext: out std_logic_vector (63 downto 0));
end component;
Component Func_round is
port(a,b,c,e,f,g: in std_logic_vector(63 downto 0);
f0,f1,f2,f3: out std_logic_vector( 63 downto 0));
end component;
type word is array (-1 to 79 ) of std_logic_vector(63 downto 0);
signal a,b,c,d,e,f,g,h:word;
type word2 is array (0 to 79 ) of std_logic_vector(63 downto 0);
signal w,f0,f1,f2,f3,k: word2;
begin
--assigning block to words
--words from t=0:15
w(15)<=Message_block(63 downto 0);
w(14)<=Message_block(127 downto 64);
w(13) <=Message_block(191 downto 128);
w(12)<=Message_block(255 downto 192);
w(11) <=Message_block(319 downto 256);
w(10) <=Message_block(383 downto 320);
w(9) <=Message_block(447 downto 384);
w(8) <=Message_block(511 downto 448);
w(7) <=Message_block(575 downto 512);
w(6) <=Message_block(639 downto 576);
w(5) <=Message_block(703 downto 640);
w(4) <=Message_block(767 downto 704);
w(3)<=Message_block(831 downto 768);
w(2)<=Message_block(895 downto 832);
w(1)<=Message_block(959 downto 896);
w(0)<=Message_block(1023 downto 960);
a(-1) <= IV(511 downto 448); -- <= X"6a09e667f3bcc908";
b(-1) <= IV(447 downto 384); -- <= X"bb67ae8584caa73b";
c(-1) <= IV(383 downto 320); -- <= X"3c6ef372fe94f82b";
d(-1) <= IV(319 downto 256); -- <= X"a54ff53a5f1d36f1";
e(-1) <= IV(255 downto 192); -- <= X"510e527fade682d1";
f(-1) <= IV(191 downto 128); -- <= X"9b05688c2b3e6c1f";
g(-1) <= IV(127 downto 64) ; -- <= X"1f83d9abfb41bd6b";
h(-1) <= IV(63 downto 0); -- <= X"5be0cd19137e2179";
--words from t=16:79
wordGen : for t in 16 to (79) generate
WOW: WordT port map(w((t-2)) , w((t-7)) , w((t-15)) , w((t-16)),w(t));
end generate ; -- wordGen
Funcc: for i in 0 to 79 generate
Func: Func_round port map(a(i-1),b(i-1),c(i-1),e(i-1),f(i),g(i),f0(i),f1(i),f2(i),f3(i));
KROM: K_rom port map (i,k(i));
h(i) <= g(i-1);
g(i) <= f(i-1);
f(i) <= e(i-1);
-- e <= d + T1 ;
-- e <= d + h + f3 + f0 + k + w;
e (i) <= std_logic_vector(unsigned(d(i-1)) + Unsigned(h(i-1)) + unsigned(f3(i)) + unsigned(f0(i)) + unsigned(k(i)) + unsigned(w(i)));
d (i) <= c(i-1);
c (i) <= b(i-1);
b (i) <= a(i-1);
-- a <= T1 + T2 ;
-- a <= h + f3 + f0 + k + w + f2 + f1;
a(i) <= std_logic_vector(unsigned(h(i-1)) +unsigned(f3(i)) + unsigned(f0(i)) +unsigned(k(i)) + unsigned(w(i)) + unsigned(f2(i)) + unsigned(f1(i)));
end generate;
process (clk,rst)
variable H0 , H1 , H2 , H3 , H4 , H5 , H6 ,H7 : Std_logic_vector ( 63 downto 0);
begin
if (rst = '1') then
H0 := X"6a09e667f3bcc908";
H1 := X"bb67ae8584caa73b";
H2 := X"3c6ef372fe94f82b";
H3 := X"a54ff53a5f1d36f1";
H4 := X"510e527fade682d1";
H5 := X"9b05688c2b3e6c1f";
H6 := X"1f83d9abfb41bd6b";
H7 := X"5be0cd19137e2179";
elsif (init ='1') then
if ((clk = '1') and clk'event) then
H0 := std_logic_vector( unsigned(a(79)) + unsigned(a(-1)));
H1 := std_logic_vector( unsigned(b(79)) + unsigned(b(-1)));
H2 := std_logic_vector( unsigned(c(79)) + unsigned(c(-1)));
H3 := std_logic_vector( unsigned(d(79)) + unsigned(d(-1)));
H4 := std_logic_vector( unsigned(e(79)) + unsigned(e(-1)));
H5 := std_logic_vector( unsigned(f(79)) + unsigned(f(-1)));
H6 := std_logic_vector( unsigned(g(79)) + unsigned(g(-1)));
H7 := std_logic_vector( unsigned(h(79)) + unsigned(h(-1)));
-- h0 <= a + h0;
-- h1 <= b + h1;
-- h2 <= c + h2;
-- h3 <= d + h3;
-- h4 <= e + h4;
-- h5 <= f + h5;
-- h6 <= g + h6;
-- h7 <= h + h7;
end if;
end if;
Hashed_512<= H0&H1&H2&H3&H4&H5&H6&H7;
end process;
end rtl;
Why is init outside the clock condition? You're forcing the clock into logic, which is probably causing massive timing failures. Either put the init check inside the clocked branch or remove it altogether.
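A minimal sketch of what moving the init check inside the clock condition could look like. The entity name out_reg_en and its d/q ports are hypothetical and only stand in for the final hash register of the posted design; the reset value is also just a placeholder.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical example entity (not part of the posted design):
-- a 512-bit output register with asynchronous reset and synchronous enable.
entity out_reg_en is
  port (
    clk, rst, init : in  std_logic;
    d              : in  std_logic_vector(511 downto 0);
    q              : out std_logic_vector(511 downto 0)
  );
end out_reg_en;

architecture rtl of out_reg_en is
begin
  process (clk, rst)
  begin
    if rst = '1' then
      -- Asynchronous reset (placeholder value).
      q <= (others => '0');
    elsif rising_edge(clk) then
      -- init is tested only on the clock edge, so it acts as a
      -- clock enable instead of gating the clock.
      if init = '1' then
        q <= d;
      end if;
    end if;
  end process;
end rtl;
```

Written this way, the edge test is the outermost condition after the reset, which is the form synthesis tools normally map onto a flip-flop with a clock-enable input.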
You also have massive logic chains, as I don't see any pipelining between the components generated in the large generate loops! You need pipelining at ALL stages, not just the output stage. Your design has basically zero pipelining.
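To illustrate breaking up a logic chain at the smallest scale, here is a sketch of a four-operand addition split into two registered stages. The entity add4_pipelined and its port names are assumptions for illustration only; the same idea would have to be applied to the multi-operand sums in the posted round logic.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical example entity (not part of the posted design):
-- sums four 64-bit words with a register between the two adder levels.
entity add4_pipelined is
  port (
    clk        : in  std_logic;
    a, b, c, d : in  std_logic_vector(63 downto 0);
    sum        : out std_logic_vector(63 downto 0)
  );
end add4_pipelined;

architecture rtl of add4_pipelined is
  signal ab_reg, cd_reg : unsigned(63 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Stage 1: two independent additions, results registered.
      ab_reg <= unsigned(a) + unsigned(b);
      cd_reg <= unsigned(c) + unsigned(d);
      -- Stage 2: adds the values registered on the previous clock edge.
      sum <= std_logic_vector(ab_reg + cd_reg);
    end if;
  end process;
end rtl;
```

The result appears two clock cycles after the inputs, but the longest register-to-register path now contains a single 64-bit adder instead of a chain of them.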
Any pipeline latency appears in the longest feedback path, because the output of the 80 rounds is the init for the next chunk. Maybe pipelining would help with the immediate problem of synthesis times, but the resulting design isn't that good anyway.
The 6kW power estimate is interesting.
Please give some examples or notes on how I can pipeline those loops.
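One possible shape for this, as a sketch only: each iteration of the generate loop gets its own clocked process, so the output of every stage is registered before it feeds the next one. The entity staged_chain, the STAGES generic and the "add a constant" stage logic are placeholders, not the SHA-512 round function. In the real design, everything that crosses a stage boundary (the working variables a to h and any message words still needed by later rounds) would have to be registered together.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical example entity (not part of the posted design):
-- a chain of identical stages with a register after every stage.
entity staged_chain is
  generic (STAGES : positive := 4);
  port (
    clk  : in  std_logic;
    din  : in  std_logic_vector(63 downto 0);
    dout : out std_logic_vector(63 downto 0)
  );
end staged_chain;

architecture rtl of staged_chain is
  type word_array is array (0 to STAGES) of unsigned(63 downto 0);
  signal s : word_array := (others => (others => '0'));
begin
  -- Stage 0 is just the input.
  s(0) <= unsigned(din);

  stage_gen : for i in 1 to STAGES generate
    process (clk)
    begin
      if rising_edge(clk) then
        -- Placeholder for one round's combinational logic; the
        -- result is registered, so stage i+1 starts from a flip-flop.
        s(i) <= s(i-1) + to_unsigned(i, 64);
      end if;
    end process;
  end generate stage_gen;

  -- Output of the last stage, available STAGES clock cycles after din.
  dout <= std_logic_vector(s(STAGES));
end rtl;
```

With one register per stage the critical path is a single stage long, at the cost of STAGES cycles of latency. Because each chunk's result is the starting value for the next chunk, that latency still sits in the feedback path.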
You keep asking the same question over and over when the answer was already given.