Continue to Site

Welcome to EDAboard.com

Welcome to our site! EDAboard.com is an international Electronics Discussion Forum focused on EDA software, circuits, schematics, books, theory, papers, asic, pld, 8051, DSP, Network, RF, Analog Design, PCB, Service Manuals... and a whole lot more! To participate you need to register. Registration is free. Click here to register now.

fastest multiplication alghorithms

Status
Not open for further replies.

Zerox100

Full Member level 6
Joined
Mar 1, 2003
Messages
328
Helped
21
Reputation
42
Reaction score
10
Trophy points
1,298
Activity points
2,604
Hi,

What do you think is the fastest multiplication alghorithms in the world?
 

Hi,

I vote for a big lookup table.

Klaus
 

Hi,

I vote for a big lookup table.

Klaus

Hello,

i agree with @KlausST. If you are looking for theoretical information see links to these algorithms:

https://en.wikipedia.org/wiki/Karatsuba_algorithm

https://en.wikipedia.org/wiki/Toom%E2%80%93Cook_multiplication

https://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm

https://en.wikipedia.org/wiki/F%C3%BCrer%27s_algorithm

If you wanna fast multiplication operation using FPGA just use "hardware multipliers" or "DSP blocks" from FPGA fabric.

Regards
 
Last edited:
Hello,

i agree with @KlausST. If you are looking for theoretical information see links to these algorithms:

https://en.wikipedia.org/wiki/Karatsuba_algorithm

https://en.wikipedia.org/wiki/Toom%E2%80%93Cook_multiplication

https://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm

https://en.wikipedia.org/wiki/F%C3%BCrer%27s_algorithm

If you wanna fast multiplication operation using FPGA just use "hardware multipliers" or "DSP blocks" from FPGA fabric.

Regards

I am looking for fast ASIC implementation. Booth looks good.


But I want a 4 step pipelined multiplier.

- - - Updated - - -

Really I want a fast pipelined implementation of 24 bit multiplier for 32 bit floating point multiplication. I prefer four or six stage pipelined multiplier.
 


Hi,

Do you recognize that you jump from one requirenent to the other.






I recommend to take some time to find a clear requirement.

Klaus

I am so sorry. You are right.

But i was thinking about difference ways to multiply two 32bit floating point. And its the effect of different solutions.

Actually in 32bit floating point multiplication, we should multiply two mantissa and add two exponent. so the bottleneck of speed is multiplication of two mantissa. Initially i was looking for a fast multiplication algorithm. but later i thought about a pipelined multiplication algorithm to increase clock ratio in some sequential independent multiplication.

What is you idea?
 

Hi,

I think there is a big difference whether you do 8x8 bit integer multiplication or 32bit x 32bit floating point multiplication.
One is the lower end, the other is the higher end ( or at least good middle range)

the one has a dynamic of 1:256, the other about 1:144700000000000000000000000000000000000000000000000000000000000000000000000
(dont know whether there is a zero too much or too less)
So it´s something totally different.

*********************************

If I understand you right, then you don´t need "fast" code, but code with high "throughput".
(pipelined code is not fast (input to output) but it has high throughput (maybe one calculation per clock cycle))


Klaus
 

Hi,

I think there is a big difference whether you do 8x8 bit integer multiplication or 32bit x 32bit floating point multiplication.
One is the lower end, the other is the higher end ( or at least good middle range)

the one has a dynamic of 1:256, the other about 1:144700000000000000000000000000000000000000000000000000000000000000000000000
(dont know whether there is a zero too much or too less)
So it´s something totally different.

*********************************

If I understand you right, then you don´t need "fast" code, but code with high "throughput".
(pipelined code is not fast (input to output) but it has high throughput (maybe one calculation per clock cycle))


Klaus


Hi
Thanks for your attention

You are right. I was initially looking for fast multiplier. But Now i am looking for fast throughput pipelined multiplier.
 

I vote for a big lookup table...

For 8 bit unsigned int, you will need a table 256X256 in size. The products shall need 16 bit storage.

For the slowest method (school book way) it is just 8 shifts and 8 additions.
 

For the slowest method (school book way) it is just 8 shifts and 8 additions.
...and every stage of that shift and add can be pipelined to run at very high clock frequencies. Depending on how much pipelining is done you will have increased latency for the first output but you will get all successive outputs on every clock cycle.

Also unlike the RAM this is scalable in an FPGA, i.e. a 32x32 multiply no problem just more pipeline latency. a look up table for 2^32x2^32 bits won't fit in any FPGA.

- - - Updated - - -

One of the methods to reduce the latency of a multiplier is to perform radix lookups for multiple bits of the calculation. This is primarily how most of the hard IP multipliers are designed in processors along with pipelining the multiplier.

That is how the original Pentium multiplication error was introduced due to a miscalculated Radix-4 entry.
 
That is how the original Pentium multiplication error was introduced...

The bug was actually associated with the floating point division processor (NDP) - divisions take considerably longer time and they use different lookup tables (if my memory is right). There was no bug in the multiplication lookup table.

Wikipedia has a non-technical article: https://en.wikipedia.org/wiki/Pentium_FDIV_bug
 
  • Like
Reactions: ads-ee

    ads-ee

    Points: 2
    Helpful Answer Positive Rating
Status
Not open for further replies.

Similar threads

Part and Inventory Search

Welcome to EDABoard.com

Sponsor

Back
Top