how to set precision in floating point arithmetic

arishsu · Aug 23, 2014

Can we control the precision of floating point arithmetic operations? For example, in an FP multiplier, some applications need a precision of 2 bits, others need 8-bit precision, and in others precision is not important. I would like the circuit to reconfigure itself according to the required precision.

TrickyDicky · Aug 23, 2014

Single-precision floating point is a fixed 32 bits, with 1 sign bit, an 8-bit exponent and a 23-bit mantissa. Are you sure you are not talking about fixed point?

arishsu · Aug 23, 2014

TrickyDicky said:
Single-precision floating point is a fixed 32 bits, with 1 sign bit, an 8-bit exponent and a 23-bit mantissa. Are you sure you are not talking about fixed point?

No, not fixed point. Suppose I have two numbers to multiply, 1.21354586 and 1.45266654, and I am using an FP multiplier. The output of the FP multiplier will be a 32-bit binary word: 1 sign bit, an 8-bit exponent and a 23-bit mantissa.
Here, I need to round the result to 2-digit precision, i.e. 1.76. Is that possible? I mean, is that possible during the normalizing and rounding operations of the FP multiplier?

andre_luis · Aug 23, 2014

Although you did not specify exactly which application requires this, there are usually indirect ways to gain some control over the resolution of the calculation through custom configuration.

In some compilers, for instance, you can define the size of a floating point number in terms of the decimal digits that represent it, which can reduce the core processing. In simulators, for non-critical designs, this procedure reduces the overall processing time.
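As a software analogue of choosing precision in decimal digits, here is a minimal sketch using Python's standard `decimal` module, where a context fixes the number of significant digits used for an operation. It only illustrates the idea and is not the compiler or simulator option referred to above; the helper name `multiply_with_digits` is made up for this example.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN


def multiply_with_digits(a: str, b: str, digits: int) -> Decimal:
    """Multiply two decimal strings, rounding the product to `digits` significant digits.

    `multiply_with_digits` is a hypothetical helper, not part of any tool
    mentioned in this thread.
    """
    ctx = Context(prec=digits, rounding=ROUND_HALF_EVEN)
    # The operands are stored exactly; only the result is rounded to ctx.prec digits.
    return ctx.multiply(Decimal(a), Decimal(b))


if __name__ == "__main__":
    # Three significant digits reproduce the 1.76 from the question.
    print(multiply_with_digits("1.21354586", "1.45266654", 3))  # prints 1.76
```

Note that this controls the precision of the result only; whether it saves any processing time depends entirely on the implementation underneath.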

FvM · Aug 23, 2014

Floating point IP from FPGA and tool vendors may support custom floating point formats besides IEEE standard single and double precision. These are however fixed parameters of the core generator.

Generally, I don't see a problem with manipulating the result according to your requirements after the calculation, or with writing your own floating point core.
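To illustrate manipulating the result after the calculation, here is a rough Python model (not HDL) that takes an IEEE 754 single-precision value and rounds its 23-bit stored mantissa to a chosen number of bits. The function name `round_float32_mantissa` and the simple round-half-up scheme are assumptions made for this sketch, and it only handles finite, normalized values.

```python
import struct


def round_float32_mantissa(x: float, keep_bits: int) -> float:
    """Round x, viewed as an IEEE 754 single, to `keep_bits` stored mantissa bits.

    Hypothetical helper for illustration. Assumes x is finite and normalized;
    infinities, NaNs and denormals are not treated specially.
    """
    if not 0 <= keep_bits <= 23:
        raise ValueError("keep_bits must be between 0 and 23")
    # Reinterpret the single-precision encoding as a 32-bit unsigned integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    drop = 23 - keep_bits
    if drop > 0:
        # Add half of the last kept bit, then clear the dropped bits.
        # A carry out of the mantissa increments the exponent field,
        # which is exactly the renormalization step that is needed.
        bits += 1 << (drop - 1)
        bits &= ~((1 << drop) - 1)
        bits &= 0xFFFFFFFF
    return struct.unpack(">f", struct.pack(">I", bits))[0]


if __name__ == "__main__":
    product = 1.21354586 * 1.45266654
    print(round_float32_mantissa(product, 2))  # prints 1.75
    print(round_float32_mantissa(product, 7))  # prints 1.765625
```

With only 2 mantissa bits the product from the question becomes 1.75, not 1.76. Resolving two decimal places takes about 7 fraction bits, since log2(100) is roughly 6.64, so a "2-digit" requirement and a "2-bit" requirement are quite different.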

arishsu · Aug 24, 2014

andre_teprom said:
In some compilers, for instance, you can define the size of a floating point number in terms of the decimal digits that represent it, which can reduce the core processing. In simulators, for non-critical designs, this procedure reduces the overall processing time.

Yes, that's what I am looking for: something that reduces core processing or processing time by reducing the number of bits of precision. Can you please provide some more details about that?

FvM · Aug 24, 2014

From my own experience, I can tell that the Altera floating point IP only supports user-specified formats between single and double precision, but no reduced sizes. So if I want something of this kind in Altera, I have to design it myself.

andre_luis · Aug 24, 2014

arishsu said:
Yes, that's what I am looking for: something that reduces core processing or processing time by reducing the number of bits of precision. Can you please provide some more details about that?

Unless you specify precisely which tool you're working with, we cannot do much more.

arishsu · Aug 25, 2014

FvM said:
From my own experience, I can tell that the Altera floating point IP only supports user-specified formats between single and double precision, but no reduced sizes. So if I want something of this kind in Altera, I have to design it myself.

andre_teprom said:
Unless you specify precisely which tool you're working with, we cannot do much more.

I am using Xilinx ISE and Verilog. I am actually trying to write my own code so that I can make it reconfigurable according to the precision requirement.

ads-ee · Aug 26, 2014

Use parameters for the width of the exponent and mantissa. I would probably just plagiarize the IEEE 754 standard and allow a flexible width instead of the standard's explicit bit widths.

If you desire something simpler than the IEEE standard, just use an exponent based on 2^-N to 2^N and a fractional mantissa. Also, don't complicate things by adding in the "invisible" 1 that resides in an imaginary bit to the left of the MSB of the mantissa that 754 uses.

As this parameterizable format is custom you'll have to create custom parameterizable code for +, -, *, and /. Seems like a lot of work, but the benefit is being able to optimize the amount of logic required for algorithms that don't need a huge amount of precision in their calculations but require large dynamic range. Seems like a useful piece of IP.

Regards
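A behavioural sketch of the parameterized format suggested above may help before committing to Verilog. It is written in Python purely as a reference model: the mantissa is a `man_w`-bit fraction with an explicit leading 1 (no hidden bit), the exponent is a signed integer limited to `exp_w` bits, and all names (`Num`, `encode`, `decode`, `multiply`) are hypothetical. Rounding in `multiply` is round-half-up, and special values (infinity, NaN, denormals) are left out.

```python
import math
from typing import NamedTuple


class Num(NamedTuple):
    sign: int  # 0 = positive, 1 = negative
    exp: int   # signed exponent
    man: int   # man_w-bit fraction; MSB is 1 for any non-zero value


def encode(x: float, man_w: int) -> Num:
    """Convert a Python float to the custom format: value = 0.man * 2**exp."""
    if x == 0:
        return Num(0, 0, 0)
    frac, exp = math.frexp(abs(x))        # abs(x) = frac * 2**exp, 0.5 <= frac < 1
    man = round(frac * (1 << man_w))
    if man == (1 << man_w):               # rounding carried out of the mantissa
        man >>= 1
        exp += 1
    return Num(1 if x < 0 else 0, exp, man)


def decode(n: Num, man_w: int) -> float:
    """Convert the custom format back to a Python float."""
    value = math.ldexp(n.man, n.exp - man_w)
    return -value if n.sign else value


def multiply(a: Num, b: Num, man_w: int, exp_w: int) -> Num:
    """Multiply two values, rounding the product to man_w mantissa bits."""
    if a.man == 0 or b.man == 0:
        return Num(0, 0, 0)
    prod = a.man * b.man                  # 2*man_w bits, fraction in [0.25, 1)
    exp = a.exp + b.exp
    if prod < (1 << (2 * man_w - 1)):     # MSB clear: normalize with one left shift
        prod <<= 1
        exp -= 1
    man = (prod + (1 << (man_w - 1))) >> man_w   # add half an LSB, then truncate
    if man == (1 << man_w):               # rounding overflowed: renormalize
        man >>= 1
        exp += 1
    if not -(1 << (exp_w - 1)) <= exp < (1 << (exp_w - 1)):
        raise OverflowError("exponent does not fit in exp_w bits")
    return Num(a.sign ^ b.sign, exp, man)


if __name__ == "__main__":
    for man_w in (4, 8, 16):
        a, b = encode(1.21354586, man_w), encode(1.45266654, man_w)
        print(man_w, decode(multiply(a, b, man_w, exp_w=6), man_w))
```

The multiplier array grows roughly with the square of `man_w`, which is where the logic saving for low-precision algorithms would come from. A model like this can also serve as a source of test vectors for the HDL version.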
