Use of functions, procedures in VHDL

permute · Sep 1, 2010

roub: clocked processes each describe hardware. Functions can be used, but complex functions rarely infer an area/performance-efficient design. You seem to be intent on an example of x % y, where both x and y are user inputs. If this is written as a function, it might infer (for the 2b case):
output = x when ( y > x) else x - y when ( 2y > x) else x - 2y when (3y > x) else x - 3y.
(The number of terms grows exponentially with the bit width n.) For larger x, y, intermediate terms would be used:
z1 = x when ( 4y > x) else x-4y;
z2 = z1 when (2y > z1) else z1-2y;
z3 = z2 when (y > z2) else z2-y;
output = z3;

The above has no memory elements -- thus it will be flattened to:
output = (( x when ( 4y > x) else x-4y) when (2y > ( x when ( 4y > x) else x-4y)) else (( x when ( 4y > x) else x-4y)-2y)) when (y > ((x when ( 4y > x) else x-4y) when (2y > (x when ( 4y > x) else x-4y)) else ((x when ( 4y > x) else x-4y)-2y))) else ((( x when ( 4y > x) else x-4y) when (2y > ( x when ( 4y > x) else x-4y)) else (( x when ( 4y > x) else x-4y)-2y))-y);
(I'm pretty sure the above isn't exactly valid VHDL.) The point is that when a function containing loops is used in a process, the synthesis tool can end up trying to optimize a complex expression.

The synthesis tool would likely choose to infer 3 subtractors (as in the z1, z2, z3 representation), and use intermediate results.

For the 8b case, the synthesizer would likely infer 8 subtractors, with a longest path that must traverse all 8. For 32b, it would infer 32 subtractors, and the longest path would traverse all 32. This is a very long path, which limits the clock rate to fairly low values, and it uses a lot of area. In exchange, the latency (in clock cycles) is minimized. This implementation is chosen because the code is written to allow only 1 clock cycle for the computation of x % y, and x % y requires a complex circuit to complete in 1 cycle.
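For illustration, here is a minimal sketch of what such a function could look like, assuming numeric_std and equal-width unsigned operands with y non-zero. The package and function names (mod_pkg, mod_comb) are made up for this example. There are no registers inside the loop, so when it is called from a clocked process it unrolls into one compare/subtract stage per bit, all chained within a single cycle:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package mod_pkg is
  -- Hypothetical helper: x mod y for equal-width unsigned operands.
  -- Assumes y /= 0 (for y = 0 this sketch simply returns x).
  function mod_comb(x, y : unsigned) return unsigned;
end package mod_pkg;

package body mod_pkg is
  function mod_comb(x, y : unsigned) return unsigned is
    constant N : natural := x'length;
    variable z : unsigned(2*N-1 downto 0);  -- running remainder
    variable d : unsigned(2*N-1 downto 0);  -- y shifted left by k bits
  begin
    z := resize(x, 2*N);
    -- One compare/subtract stage per bit, largest multiple of y first.
    for k in N-1 downto 0 loop
      d := shift_left(resize(y, 2*N), k);
      if z >= d then
        z := z - d;
      end if;
    end loop;
    return z(N-1 downto 0);
  end function mod_comb;
end package body mod_pkg;
```

Calling it as r <= mod_comb(x, y); inside a clocked process asks for the whole chain of stages to settle within one clock period.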

If area is a concern, the above could be broken apart, possibly performing 1 subtraction stage per clock cycle. The throughput is then limited to one valid output every 32 cycles. The longest path uses only 1 subtract, so clock rate can be high. This method uses little area, but the throughput is limited. In this method, a state machine is used to determine when the output is valid, as well as select the input to the subtractor.
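A rough sketch of this serialized variant, assuming numeric_std and a non-zero y. The entity and port names (mod_serial, start, done, r) are invented for the example, and a down-counter plays the role of the state machine. It takes one load cycle plus N subtract cycles per result:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mod_serial is
  generic (N : positive := 32);
  port (
    clk   : in  std_logic;
    start : in  std_logic;               -- pulse to load x and y
    x, y  : in  unsigned(N-1 downto 0);  -- y is assumed non-zero
    done  : out std_logic;               -- high for one cycle when r is valid
    r     : out unsigned(N-1 downto 0)   -- x mod y
  );
end entity mod_serial;

architecture rtl of mod_serial is
  signal z, d : unsigned(2*N-1 downto 0) := (others => '0');
  signal cnt  : natural range 0 to N := 0;  -- remaining stages, 0 = idle
begin
  process (clk)
  begin
    if rising_edge(clk) then
      done <= '0';
      if start = '1' then
        z   <= resize(x, 2*N);
        d   <= shift_left(resize(y, 2*N), N-1);  -- largest multiple first
        cnt <= N;
      elsif cnt /= 0 then
        -- The single shared compare/subtract stage.
        if z >= d then
          z <= z - d;
        end if;
        d   <= shift_right(d, 1);
        cnt <= cnt - 1;
        if cnt = 1 then
          done <= '1';  -- last stage is being registered on this edge
        end if;
      end if;
    end if;
  end process;

  r <= z(N-1 downto 0);
end architecture rtl;
```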

If both clock rate and throughput are concerns, then the design can be pipelined. The area-efficient state-machine approach requires only 1 subtraction circuit. The fully pipelined version would use 32 subtraction circuits, but place a register on the output of every subtraction. It now takes a latency of 32 cycles before the first output is valid, after which a new output can be valid every cycle. Again, the longest path is only 1 subtract, so clock rate can be high. This method uses a lot of area, but provides a high throughput.
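The pipelined variant could be sketched roughly as follows, again assuming numeric_std and a non-zero y. The names (mod_pipe, z, yd) are made up, and a valid flag travelling along with the data is left out to keep it short. Each generated stage is one compare/subtract followed by a register, and y is delayed alongside so every stage sees the divisor that belongs to its x:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mod_pipe is
  generic (N : positive := 32);
  port (
    clk  : in  std_logic;
    x, y : in  unsigned(N-1 downto 0);  -- y is assumed non-zero
    r    : out unsigned(N-1 downto 0)   -- x mod y, N clock cycles later
  );
end entity mod_pipe;

architecture rtl of mod_pipe is
  type wide_array   is array (0 to N) of unsigned(2*N-1 downto 0);
  type narrow_array is array (0 to N) of unsigned(N-1 downto 0);
  signal z  : wide_array   := (others => (others => '0'));  -- remainders
  signal yd : narrow_array := (others => (others => '0'));  -- delayed y
begin
  z(0)  <= resize(x, 2*N);
  yd(0) <= y;

  stage : for i in 0 to N-1 generate
    process (clk)
      variable d : unsigned(2*N-1 downto 0);
    begin
      if rising_edge(clk) then
        -- Stage i handles the multiple y * 2**(N-1-i).
        d := shift_left(resize(yd(i), 2*N), N-1-i);
        if z(i) >= d then
          z(i+1) <= z(i) - d;
        else
          z(i+1) <= z(i);
        end if;
        yd(i+1) <= yd(i);  -- keep the divisor aligned with its data
      end if;
    end process;
  end generate stage;

  r <= z(N)(N-1 downto 0);
end architecture rtl;
```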

rourabpaul · Sep 1, 2010

TO FvM
I'm trying to write the whole main program (binary method, previously attached file) with different states, and I have included the states of the mod operation with the states of the main program under a single main process(clk).

FvM · Sep 1, 2010

I wonder if the intention is to demonstrate a syntactically and functionally correct VHDL implementation of modular exponentiation, or if additional requirements (e.g. speed, resource utilization, a particular bit-length) have to be met? In the latter case, it would be reasonable to identify suitable implementation strategies before starting to code.

Performing the "outer" binary method loop and a serialized modulo "inner" loop within a single process is of course one of several options.

It should be noted that modular exponentiation is the basic operation of most public key algorithms, so you can expect a large number of speed- and/or resource-optimized implementations, including VHDL versions.

For those who don't know the idea behind modular exponentiation, see:
Modular exponentiation - Wikipedia, the free encyclopedia

rourabpaul · Sep 7, 2010

TO fvm
Can you give me any hints about parallel modulus logic that I can code in VHDL to increase the speed of my circuit?

---------- Post added at 05:54 PM ---------- Previous post was at 05:38 PM ----------

In the previous binary method you can see that the 'mod' operation appears twice, used in two different ways. If I use a state machine, the program will be huge.
Is there any way in VHDL to declare sequential code (say modulus) once in the program body
and call it many times in the main program, like a function (though from what I learned previously, we could declare a sequential logic circuit in a function)?

FvM · Sep 7, 2010

As you know, modulus and divider operations are basically the same, except for using a different part of the result.

I'm using vendor libraries, e.g. the Altera Quartus Megafunction for parallel divide and modulus operation. As most vendor tools include similar libraries, you don't need to use your own code for it, although you can find free VHDL libraries and code generators on the internet.

You're right of course regarding the large resource requirement of parallel divider and modulus designs. But I assume that e.g. fast modular exponentiation code would use it.

Is there any way in VHDL to declare sequential code (say modulus) once in the program body
and call it many times in the main program, like a function?

You can use a single modulus calculation instance at multiple places in a process, but only once in a particular clock cycle. In the binary method, one modulus instance is only used in the last iteration, so you can rewrite the method such that both operations are carried out mutually exclusively, sharing one modulus instance. VHDL has no specific means for sharing resources this way; you have to manage it in your code. But it's basically possible.
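As a small sketch of the idea, assuming numeric_std and a tool that accepts "mod" with non-constant operands (not all do). The entity and signal names (shared_mod, sel, a, b, reduced) are hypothetical. A multiplexer picks the operand, so the design contains only one modulus operator that is used for a different purpose in different clock cycles:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity shared_mod is
  generic (N : positive := 16);
  port (
    clk     : in  std_logic;
    sel     : in  std_logic;                 -- '0': reduce a, '1': reduce b
    a, b    : in  unsigned(2*N-1 downto 0);  -- e.g. a square and a product
    m       : in  unsigned(N-1 downto 0);    -- modulus, assumed non-zero
    reduced : out unsigned(N-1 downto 0)
  );
end entity shared_mod;

architecture rtl of shared_mod is
  signal operand : unsigned(2*N-1 downto 0);
begin
  -- Multiplexer in front of the single modulus operator.
  operand <= a when sel = '0' else b;

  process (clk)
  begin
    if rising_edge(clk) then
      -- The only "mod" in the design; sel time-multiplexes its use.
      reduced <= resize(operand mod m, N);
    end if;
  end process;
end architecture rtl;
```

In a real design, sel would typically be driven by the state machine of the exponentiation loop rather than by a port.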

As I mentioned before, if you're interested in effective and fast coding of modular exponentiation, you should review how the crypto cores mentioned above manage it.

rourabpaul · Sep 9, 2010

Yes, I know all about the speed of crypto cores, but my problem is the modulus operation.
I know there is an operator 'mod' in VHDL, but I have tried it and it does not work, so I need code for the mod operation, and that is why I'm trying to write it.
But the modulus code I have is time-consuming and slow; for my purpose parallel mod code is more suitable.
If you have this parallel modulus/divider code, please post it.

FvM · Sep 9, 2010

I know there is an operator 'mod' in VHDL, but I have tried it and it does not work

It generally works for me, so personally I don't bother with low-level coding of parallel divider or modulus operations. I guess "not working" can be a problem of supplying the wrong data types to the library function. The divider/modulus inference of the synthesis tools is most likely limited to 32-bit word length; if you try it with larger signals, it will fail.

I have seen various divider codes on the internet, and I'm sure that a thorough search will reveal many of them. See the opencores.org project below as one example:
Hardware Division Units :: Overview :: OpenCores

rourabpaul · Sep 9, 2010

I have tried modulus in lots of ways, also with less than 32 bits,
and it's not working; it says that the modulus operand should be a power of two.
Have you tried it?
What was the port type?
Which libraries have you used?

FvM · Sep 9, 2010

With Altera Quartus, it's supported for signed, unsigned and integer in numeric_std and for integer in std_logic_arith.

rourabpaul · Sep 9, 2010

But what can I do for ISE 11?

TrickyDicky · Sep 9, 2010

What types are the numbers you are trying to do the mod function with? You cannot use "mod" on std_logic_vector.
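For example, with std_logic_vector ports the operands can be converted to unsigned from numeric_std before applying "mod". This is only a sketch with made-up names (mod_example, x, y, r), and whether a given synthesis tool infers a divider for non-constant operands is a separate question:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;  -- instead of std_logic_arith / std_logic_unsigned

entity mod_example is
  port (
    x, y : in  std_logic_vector(7 downto 0);  -- y is assumed non-zero
    r    : out std_logic_vector(7 downto 0)   -- x mod y
  );
end entity mod_example;

architecture rtl of mod_example is
begin
  -- "mod" is defined for unsigned, so convert in and out.
  r <= std_logic_vector(unsigned(x) mod unsigned(y));
end architecture rtl;
```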

FvM · Sep 9, 2010

Xilinx has a divider in coregen, but the documentation doesn't say anything about inference from arithmetic operators. Perhaps a Xilinx user knows? But you can always instantiate it in the regular way.

rourabpaul · Sep 9, 2010

I have used integer type numbers,
and the libraries I have used are
"library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;"

To FvM
I can't see your previous link.

FvM · Sep 9, 2010

I can't see your previous link.

May be. The project is "Hardware Division Units" from opencores.org Arithmetic core projects.
But Xilinx division coregen should do the same.
