
how to set precision in floating point arithmetic


arishsu

Can we control precision in floating point arithmetic operations? For example, in an FP multiplier, for some applications I need a precision of 2 bits, in some others precision is not important, and in some others I need an 8-bit precision. The idea is that the circuit can reconfigure itself according to the given precision.
 

Floating point is a fixed 32-bit format with 1 sign bit, an 8-bit exponent and a 23-bit mantissa. Are you sure you are not talking about fixed point?
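(For reference, a minimal Verilog sketch of the IEEE 754 single precision layout mentioned above; the module name is illustrative, not from any vendor library.)

Code:
// Unpacking an IEEE 754 single precision word.
// Field widths are those of the standard (1 + 8 + 23 = 32 bits).
module fp32_unpack (
    input  wire [31:0] fp_word,    // packed single precision value
    output wire        sign,
    output wire [7:0]  exponent,   // biased by 127
    output wire [22:0] mantissa    // fraction; the hidden leading 1 is not stored
);
    assign sign     = fp_word[31];
    assign exponent = fp_word[30:23];
    assign mantissa = fp_word[22:0];
endmodule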
 
Floating point is a fixed 32-bit format with 1 sign bit, an 8-bit exponent and a 23-bit mantissa. Are you sure you are not talking about fixed point?
No, not fixed point. Suppose I have two numbers to multiply, 1.21354586 and 1.45266654, and I am using an FP multiplier. The output of the FP multiplier will be in 32-bit binary form: 1 sign bit, 8-bit exponent and 23-bit mantissa.
Here, I need to round the result to two decimal places, i.e. 1.76. Is that possible? I mean, is that possible during the normalizing and rounding stages of the FP multiplier?
 

Although you did not specify exactly the application for which this is required, there are usually indirect ways to gain some control over the resolution of the calculation through custom configuration.

In some compilers, for instance, you can define the size of the floating point number in terms of the decimal digits that represent it, which can reduce the core processing. In simulators - for non-critical designs - this approach reduces the overall processing time.
 

Floating point IP from FPGA and tool vendors may support custom floating point formats besides the IEEE standard single and double precision. These are, however, fixed parameters of the core generator.

Generally I don't see a problem with manipulating the result according to your requirements after the calculation, or with writing your own floating point core.
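(One way to "manipulate the result after calculation", as suggested above, is to round the mantissa of the finished single precision result down to a reduced number of fraction bits. A rough Verilog sketch, with a hypothetical module name and parameter; exponent overflow on rounding is deliberately ignored.)

Code:
// Reduce the precision of an IEEE 754 single precision value after the
// fact by rounding its 23-bit mantissa to KEEP_BITS fraction bits
// (round-to-nearest-up on the first discarded bit).
// Assumes 1 <= KEEP_BITS <= 22.
module fp32_reduce_precision #(
    parameter KEEP_BITS = 8            // how many mantissa bits to keep
) (
    input  wire [31:0] fp_in,
    output wire [31:0] fp_out
);
    wire        sign     = fp_in[31];
    wire [7:0]  exponent = fp_in[30:23];
    wire [22:0] mantissa = fp_in[22:0];

    // The bit just below the last kept bit decides the rounding direction.
    wire round_up = mantissa[22 - KEEP_BITS];

    // Rounding increment at the LSB position of the kept field.
    wire [23:0] increment = round_up ? (24'd1 << (23 - KEEP_BITS)) : 24'd0;

    // Keep the top KEEP_BITS bits, zero the rest, add the increment.
    wire [23:0] rounded =
        {1'b0, mantissa[22 -: KEEP_BITS], {(23-KEEP_BITS){1'b0}}} + increment;

    // A carry out of the mantissa would require an exponent increment;
    // that corner case is left out of this sketch.
    assign fp_out = {sign, exponent, rounded[22:0]};
endmodule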
 

In some compilers, for instance, you can define the size of the floating point number in terms of the decimal digits that represent it, which can reduce the core processing. In simulators - for non-critical designs - this approach reduces the overall processing time.
Yes, that's what I am looking for: something like reducing the core processing or the processing time by reducing the number of bits of precision. Can you please provide some more details about that?
 

From my own experience, I can tell that the Altera floating point IP only supports user-specified formats between single and double precision, but no reduced sizes. So if I want something of the kind in question in Altera, I have to design it myself.
 

Yes, that's what I am looking for: something like reducing the core processing or the processing time by reducing the number of bits of precision. Can you please provide some more details about that?

As long as you do not specify precisely the tool you are working with, we cannot do much more.
 

From my own experience, I can tell that the Altera floating point IP only supports user-specified formats between single and double precision, but no reduced sizes. So if I want something of the kind in question in Altera, I have to design it myself.

As long as you do not specify precisely the tool you are working with, we cannot do much more.

I am using Xilinx ISE and Verilog. I am actually trying to write my own code so that I can make it reconfigurable according to the precision requirement.
 

Use parameters for the widths of the exponent and mantissa. I would probably just plagiarize the IEEE 754 standard and allow flexible widths instead of the standard's explicit bit widths.

If you want something simpler than the IEEE standard, just use an exponent range of 2^-N to 2^N and a fractional mantissa. Also, don't complicate things by adding in the "invisible" 1 that resides in an imaginary bit to the left of the MSB of the mantissa, as 754 does.

As this parameterizable format is custom, you'll have to create custom parameterizable code for +, -, *, and /. It seems like a lot of work, but the benefit is being able to optimize the amount of logic required for algorithms that don't need a huge amount of precision in their calculations but do require a large dynamic range. Seems like a useful piece of IP.

Regards
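(A rough Verilog sketch of the kind of parameterizable format described above: a signed exponent, a fractional mantissa with no hidden 1, and a multiplier that truncates the product back to MANT_W bits. The format and module name are my own illustration, not a tested core; normalization, rounding and special values such as zero or overflow are left out.)

Code:
// Parameterizable, non-IEEE floating point multiplier sketch.
// Assumed format: {sign, signed exponent, fractional mantissa, no hidden 1},
// value = (-1)^sign * 0.mantissa * 2^exponent.
module simple_fp_mul #(
    parameter EXP_W  = 5,   // exponent width (two's complement)
    parameter MANT_W = 8    // mantissa (fraction) width, i.e. the precision
) (
    input  wire [EXP_W+MANT_W:0] a,   // {sign, exponent, mantissa}
    input  wire [EXP_W+MANT_W:0] b,
    output wire [EXP_W+MANT_W:0] y
);
    wire                     sign_a = a[EXP_W+MANT_W];
    wire signed [EXP_W-1:0]  exp_a  = a[EXP_W+MANT_W-1 -: EXP_W];
    wire        [MANT_W-1:0] man_a  = a[MANT_W-1:0];

    wire                     sign_b = b[EXP_W+MANT_W];
    wire signed [EXP_W-1:0]  exp_b  = b[EXP_W+MANT_W-1 -: EXP_W];
    wire        [MANT_W-1:0] man_b  = b[MANT_W-1:0];

    // Multiply the fractions and keep the top MANT_W bits (truncation).
    wire [2*MANT_W-1:0]      man_full = man_a * man_b;
    wire [MANT_W-1:0]        man_y    = man_full[2*MANT_W-1 -: MANT_W];

    // Exponents add; sign is the XOR of the input signs.
    // Exponent overflow is ignored in this sketch.
    wire signed [EXP_W-1:0]  exp_y    = exp_a + exp_b;
    wire                     sign_y   = sign_a ^ sign_b;

    assign y = {sign_y, exp_y, man_y};
endmodule

(With different parameter values, for example MANT_W = 2 or MANT_W = 8, the same source gives multipliers of different precision, which is roughly the reconfigurability asked about in the first post.)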
 

