[SOLVED] Square root output in two clock cycles

dipin · Oct 27, 2014

Hi,

Does anyone know a method to find a square root in 2 clock cycles?

With the Xilinx CORDIC IP core, I get the output with a latency of 2 clocks without pipelining, but in fully pipelined mode it takes 5 clocks.

I have checked the CORDIC algorithm, but it takes at least N/2 clock cycles.

I have written programs using the CORDIC and non-restoring algorithms. Both took N/2 clock cycles to produce the output (N = input width). How can I get the output within 2 clock cycles? If anyone knows how, please help. I got tired of searching the internet.

Thanks and regards

barry · Oct 27, 2014

Depending on the resolution you require, etc., you could use a lookup table approach. This would obviously require significant memory resources, but if you are willing to trade off real estate for speed...
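
To put a number on "significant memory resources", here is a quick back-of-the-envelope sketch in Python. It assumes a direct table with one entry per possible input value, each entry holding the integer part of the root; `sqrt_rom_bits` is a name made up for this illustration.

```python
def sqrt_rom_bits(input_bits: int) -> int:
    """Storage, in bits, of a direct square-root lookup table."""
    entries = 1 << input_bits              # one entry per possible input value
    result_bits = (input_bits + 1) // 2    # the integer root of an N-bit value fits in ceil(N/2) bits
    return entries * result_bits

for n in (8, 16, 32):
    print(f"{n}-bit input: {sqrt_rom_bits(n) / 8 / 1024:.1f} KiB")
```

A 16-bit input needs 64 KiB, which is feasible in block RAM on many devices, while a direct table for a full 32-bit input would need 8 GiB, so some form of range reduction is required at that width.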

Nicolás Celedón · Oct 27, 2014

Just to see if I could help in any other way: why does it need to be in 2 cycles?

FvM · Oct 27, 2014

You can always line up multiple iterations of the CORDIC algorithm in a single clock cycle, at the expense of logic resources and reduced maximum clock speed.
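
To make the unrolling idea concrete, here is a small Python model. It is a sketch, not HDL and not the code from the earlier thread. It uses the plain restoring shift-and-subtract square root rather than CORDIC or the non-restoring variant, and `isqrt_unrolled` and `iters_per_cycle` are names made up for illustration. The inner loop stands for iterations chained combinationally within one clock; the outer loop counts clocks.

```python
def isqrt_unrolled(x: int, width: int = 32, iters_per_cycle: int = 8):
    """Shift-and-subtract integer square root; returns (root, cycles)."""
    assert width % 2 == 0 and 0 <= x < (1 << width)
    root, rem = 0, 0
    total = width // 2                 # one iteration per result bit
    done, cycles = 0, 0
    while done < total:
        # Everything inside this for loop happens within one modelled clock.
        for _ in range(min(iters_per_cycle, total - done)):
            pair = (x >> (width - 2 - 2 * done)) & 0b11   # next two input bits, MSB first
            rem = (rem << 2) | pair
            trial = (root << 2) | 1                       # 4*root + 1, shifts only
            root <<= 1
            if rem >= trial:                              # subtract succeeds: result bit is 1
                rem -= trial
                root |= 1
            done += 1
        cycles += 1
    return root, cycles

print(isqrt_unrolled(86485))                              # 8 iterations per clock
print(isqrt_unrolled(86485, iters_per_cycle=3))           # 3 iterations per clock
```

For a 32-bit input there are 16 iterations in total, so 8 per clock gives 2 cycles and 3 per clock gives 6. No multiplier is needed, but every extra iteration per clock lengthens the combinational path.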

dipin · Oct 28, 2014

Hi,

This is the link I referred to:
https://www.convict.lu/Jeunes/Math/square_root_CORDIC.htm

Is this really the CORDIC method? The CORDIC method does not use a multiplier, but this one needs a multiplier. On the internet I couldn't find any integer examples of the CORDIC method except this one.

Please give your suggestions.

Regards

FvM · Oct 28, 2014

You didn't mention why you want to use the CORDIC method.

dipin · Oct 28, 2014

FvM said:
You didn't mention why you want to use the CORDIC method.

Hi,
Thanks for the reply, FvM.
I need to reduce the number of clock cycles it takes. Non-restoring division takes N/2 clock cycles (code posted in a previous thread).
Moreover, the Xilinx CORDIC core produces its output in 2 clock cycles (without pipelining), so I thought of trying the CORDIC method to reduce the number of clock cycles.
For a 32-bit input I need to get the output in 6 clock cycles or fewer.

Thanks and regards

axcdd · Oct 28, 2014

What's the FMAX you want to achieve? 32 bits is quite large.
Do the 32 bits come from 16-bit X^2 + Y^2?

dipin · Oct 28, 2014

Hi,

axcdd said:
What's the FMAX you want to achieve?

The first thing I need to do is reduce the number of clock cycles.

axcdd said:
32 bits is quite large.
Do the 32 bits come from 16-bit X^2 + Y^2?

Really sorry, I didn't get this.
Thanks & regards

axcdd · Oct 28, 2014

The question was: why do you need a 32-bit square root? Is it because the data comes from a 32-bit ADC?
You wrote that you tried the CORDIC algorithm (which requires X, Y coordinates).

dipin · Oct 28, 2014

Hi,

axcdd said:
The question was: why do you need a 32-bit square root? Is it because the data comes from a 32-bit ADC?
You wrote that you tried the CORDIC algorithm (which requires X, Y coordinates).

Because I need to use this in a Verilog program whose output is 32 bits.

Also, I am using this link as a reference:
https://www.convict.lu/Jeunes/Math/square_root_CORDIC.htm
Thanks

axcdd · Oct 28, 2014

So if fmax is not critical, you can create a ROM with a 16-bit address whose output is the SQRT of the address. As its input, use the 16 MSBs of your input word that aren't zeros. Then rotate (shift) the output word left to get the real result.
Pseudo code here:

Code:

  -- Pseudo code: SQRT_ROM is assumed to return the ROM word resized to OUTPUT's width.
  -- shift_left (ieee.numeric_std) is used because the scaling is a multiply by 4, 16, 64, 256.
  if rising_edge(CLK_i) then
    if    INPUT(31 downto 16) = x"0000" then OUTPUT <= SQRT_ROM(INPUT(15 downto 0));
    elsif INPUT(31 downto 20) = x"000"  then OUTPUT <= shift_left(SQRT_ROM(INPUT(19 downto 4)), 2);
    elsif INPUT(31 downto 24) = x"00"   then OUTPUT <= shift_left(SQRT_ROM(INPUT(23 downto 8)), 4);
    elsif INPUT(31 downto 28) = x"0"    then OUTPUT <= shift_left(SQRT_ROM(INPUT(27 downto 12)), 6);
    else                                     OUTPUT <= shift_left(SQRT_ROM(INPUT(31 downto 16)), 8);
    end if;
  end if;

The truncation error of this method should be less than 1% (around 0.5%).
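
The error figure is easy to check with a software model before committing to hardware. The Python sketch below mirrors the if/elsif chain. It assumes a 64K-entry ROM whose entries hold the root rounded down (`math.isqrt`); `rom_sqrt32` and `max_relative_error` are names made up for this illustration, and the error is sampled at random rather than checked exhaustively.

```python
import math
import random

# Assumed ROM contents: entry a holds floor(sqrt(a)), which fits in 8 bits.
SQRT_ROM = [math.isqrt(a) for a in range(1 << 16)]

def rom_sqrt32(x: int) -> int:
    """Approximate square root of a 32-bit x using a 16-bit window into the ROM."""
    assert 0 <= x < (1 << 32)
    for shift in (0, 4, 8, 12, 16):
        if x >> (16 + shift) == 0:            # all bits above the 16-bit window are zero
            # Dropping `shift` input bits scales the root by 2**(shift/2).
            return SQRT_ROM[x >> shift] << (shift // 2)
    raise AssertionError("unreachable for 32-bit inputs")

def max_relative_error(samples: int = 100_000, seed: int = 1) -> float:
    """Worst sampled relative error for inputs that use the shifted branches."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        x = rng.randrange(1 << 16, 1 << 32)
        exact = math.sqrt(x)
        worst = max(worst, abs(rom_sqrt32(x) - exact) / exact)
    return worst

print(rom_sqrt32(4), rom_sqrt32(86485), max_relative_error())
```

In this rounded-down model the error for inputs of 2^16 and above stays below about 1.6%, with the worst cases just below 65^2 * 4^k (for example, 67599 returns 256 while the true root is about 260). A ROM that stores rounded values should come out closer to the estimate above, but that is worth checking for the exact table you build.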

dipin · Oct 28, 2014

HI,

Thanks for the reply.
Can you please comment a little more on the above code? I am not able to follow it completely.
Thanks

axcdd · Oct 28, 2014

1. Create a ROM with a 16-bit address that outputs the SQRT of its address.
Example: address "0101_0100_0111_0101" (21621 dec) -> the value stored at that address is "1001_0011" (147 dec).

2. The input signal is 32 bits wide but the ROM address is only 16 bits, so the input word has to be shrunk. From basic math you know that SQRT(4x) = 2 * SQRT(x).
Multiplying by two is equal to a single shift left.
Example: input word "0101_0100_0111_0101_01" (86485 dec), SQRT = 294.0833... -> so you take "0101_0100_0111_0101" as the ROM address (147 dec as output) and shift it left once to get the output (294 dec).

3. The if conditions check for zeros because, with a low input signal such as 4 ("0000_0000_0000_0000_0100"), feeding the upper bits "0000_0000_0000_0000" to the ROM would give an output of 0 instead of 2.

4. You can generate the SQRT ROM values with a function like:

Code:

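  -- Needs: use ieee.numeric_std.all; use ieee.math_real.all;
  -- Assumed declarations (placeholders, pick your own width):
  --   constant ROM_DATA_WIDTH : natural := 8;
  --   type unsigned_array is array (natural range <>) of unsigned(ROM_DATA_WIDTH-1 downto 0);
  function SQRT2 (Number_of_samples : integer) return unsigned_array is
    variable result_v : unsigned_array(0 to Number_of_samples-1) := (others => (others => '0'));
  begin
    for i in result_v'range loop
      -- floor() keeps the largest entries in range: integer() alone rounds to nearest,
      -- so sqrt(65535) would become 256 and no longer fit in 8 bits.
      result_v(i) := to_unsigned(integer(floor(SQRT(real(i)))), ROM_DATA_WIDTH);
    end loop;
    return result_v;
  end function SQRT2;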

This uses the math_real library (VHDL); I don't know the Verilog libraries and coding standards.

Hope that helped.
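
For a Verilog flow, one common alternative to computing the table in HDL is to generate it offline and load it with `$readmemh`. The Python sketch below is one way to do that under the same assumptions as steps 1 to 4 (16-bit address, root rounded down); `sqrt_rom_hex_lines` and the file name `sqrt_rom.mem` are made up for this illustration.

```python
import math

def sqrt_rom_hex_lines(addr_bits: int = 16) -> list:
    """One hex word per line; entry a holds floor(sqrt(a))."""
    data_bits = (addr_bits + 1) // 2          # width of the stored root
    digits = (data_bits + 3) // 4             # hex digits per ROM word
    return [f"{math.isqrt(a):0{digits}x}" for a in range(1 << addr_bits)]

lines = sqrt_rom_hex_lines()

# Worked example from step 1: address 21621 should hold 147 (hex 93).
print(len(lines), lines[21621])

# Worked example from step 2: dropping two input bits and shifting the root once.
word = 86485
print(int(lines[word >> 2], 16) << 1, math.isqrt(word))

# To write the file (hypothetical name), uncomment:
# with open("sqrt_rom.mem", "w") as f:
#     f.write("\n".join(lines) + "\n")
```

On the Verilog side the file would then be loaded in an initial block with something like `$readmemh("sqrt_rom.mem", rom);`, where `rom` is whatever memory array your design declares.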

Nicolás Celedón · Oct 29, 2014

The solution proposed by axcdd is nice, but you are going to need a large ROM.

I have never tried it, but maybe you could also split the function sqrt(x) into segments using linear approximations, so that you only need to store the "m" and "b" parameters of "y = mx + b" in ROM. Instead of the full SQRT function, you would have a few linear functions. As the index you could maybe use the most significant byte. You would still need to check the error of this method, because I've never tried it with a SQRT function. The idea is also that you use less ROM than axcdd's method; otherwise I guess it is not worth it. Try it in MATLAB.

Does it make sense?
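
As a starting point for that error check, here is a rough Python model of the "m" and "b" idea. It assumes a 16-bit input indexed by its most significant byte (256 segments), chords through the segment end points, and floating-point coefficients instead of the fixed-point values real hardware would store. All names are made up for this illustration.

```python
import math

SEG_BITS = 8                          # index = most significant byte of a 16-bit input
SEG_WIDTH = 1 << (16 - SEG_BITS)      # 256 input values per segment

def build_segments():
    """Return one (m, b) pair per segment so that sqrt(x) ~ m*x + b inside it."""
    table = []
    for k in range(1 << SEG_BITS):
        x0, x1 = k * SEG_WIDTH, (k + 1) * SEG_WIDTH
        m = (math.sqrt(x1) - math.sqrt(x0)) / SEG_WIDTH   # slope of the chord
        b = math.sqrt(x0) - m * x0                        # intercept of the chord
        table.append((m, b))
    return table

TABLE = build_segments()

def pwl_sqrt(x: int) -> float:
    """Piecewise linear square root: one table read, one multiply, one add."""
    m, b = TABLE[x >> (16 - SEG_BITS)]
    return m * x + b

def max_abs_error(lo: int, hi: int) -> float:
    return max(abs(pwl_sqrt(x) - math.sqrt(x)) for x in range(lo, hi))

print(max_abs_error(0, 256), max_abs_error(256, 1 << 16))
```

The table has 256 coefficient pairs instead of 65536 entries, but the first segment is poor because sqrt is steep near zero: the error there reaches 4 at x = 64, while all other segments stay below 0.5. Normalising the input first, as in axcdd's shifting scheme, or using finer segments near zero would be the obvious things to try.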

FvM · Oct 29, 2014

Table interpolation (piecewise linear approximation) makes sense for most math functions. For example, I'm representing a half quarter of the atan function with a 256-point table, achieving 16-bit accuracy.
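
A quick way to sanity-check a figure like this is a few lines of Python. The sketch below reads "half quarter" as one octant, atan(x) for x from 0 to 1, which is an assumption. It also stores unquantised floating-point table values and adds one extra end point so the last interval can be interpolated; the names are made up for illustration.

```python
import math

N = 256
# Table of atan at N equally spaced points, plus the end point at x = 1.
TABLE = [math.atan(i / N) for i in range(N + 1)]

def atan_interp(x: float) -> float:
    """Linear interpolation between adjacent table entries, for 0 <= x <= 1."""
    pos = x * N
    i = min(int(pos), N - 1)          # table index (upper input bits in hardware)
    frac = pos - i                    # position inside the interval (lower input bits)
    return TABLE[i] + (TABLE[i + 1] - TABLE[i]) * frac

# Compare against math.atan on a 16-bit input grid.
worst = max(abs(atan_interp(k / 65536) - math.atan(k / 65536)) for k in range(65537))
lsb = (math.pi / 4) / 2**16           # one 16-bit step of the output range
print(worst, lsb)
```

Under these assumptions the interpolation error comes out well below one 16-bit step of the output range, so the table size is not the limiting factor; quantisation of the stored values would be the next thing to look at.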

Welcome to EDAboard.com
