Welcome to EDAboard.com

Welcome to our site! EDAboard.com is an international Electronic Discussion Forum focused on EDA software, circuits, schematics, books, theory, papers, asic, pld, 8051, DSP, Network, RF, Analog Design, PCB, Service Manuals... and a whole lot more! To participate you need to register. Registration is free. Click here to register now.

Register Log in

fastest multiplication alghorithms

Zerox100

Full Member level 5
Joined
Mar 1, 2003
Messages
317
Helped
21
Reputation
42
Reaction score
10
Trophy points
1,298
Activity points
2,488
Hi,

What do you think is the fastest multiplication alghorithms in the world?
 

KlausST

Super Moderator
Staff member
Joined
Apr 17, 2014
Messages
17,453
Helped
3,941
Reputation
7,880
Reaction score
3,812
Trophy points
113
Activity points
115,718
Hi,

I vote for a big lookup table.

Klaus
 

FlyingDutch

Full Member level 4
Joined
Dec 16, 2017
Messages
198
Helped
28
Reputation
56
Reaction score
31
Trophy points
28
Location
Bydgoszcz - Poland
Activity points
2,064
Hi,

I vote for a big lookup table.

Klaus
Hello,

i agree with @KlausST. If you are looking for theoretical information see links to these algorithms:

https://en.wikipedia.org/wiki/Karatsuba_algorithm

https://en.wikipedia.org/wiki/Toom%E2%80%93Cook_multiplication

https://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm

https://en.wikipedia.org/wiki/F%C3%BCrer%27s_algorithm

If you wanna fast multiplication operation using FPGA just use "hardware multipliers" or "DSP blocks" from FPGA fabric.

Regards
 
Last edited:

Zerox100

Full Member level 5
Joined
Mar 1, 2003
Messages
317
Helped
21
Reputation
42
Reaction score
10
Trophy points
1,298
Activity points
2,488
Hello,

i agree with @KlausST. If you are looking for theoretical information see links to these algorithms:

https://en.wikipedia.org/wiki/Karatsuba_algorithm

https://en.wikipedia.org/wiki/Toom%E2%80%93Cook_multiplication

https://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm

https://en.wikipedia.org/wiki/F%C3%BCrer%27s_algorithm

If you wanna fast multiplication operation using FPGA just use "hardware multipliers" or "DSP blocks" from FPGA fabric.

Regards
I am looking for fast ASIC implementation. Booth looks good.


But I want a 4 step pipelined multiplier.

- - - Updated - - -

Really I want a fast pipelined implementation of 24 bit multiplier for 32 bit floating point multiplication. I prefer four or six stage pipelined multiplier.
 

KlausST

Super Moderator
Staff member
Joined
Apr 17, 2014
Messages
17,453
Helped
3,941
Reputation
7,880
Reaction score
3,812
Trophy points
113
Activity points
115,718

Zerox100

Full Member level 5
Joined
Mar 1, 2003
Messages
317
Helped
21
Reputation
42
Reaction score
10
Trophy points
1,298
Activity points
2,488
Hi,

Do you recognize that you jump from one requirenent to the other.






I recommend to take some time to find a clear requirement.

Klaus
I am so sorry. You are right.

But i was thinking about difference ways to multiply two 32bit floating point. And its the effect of different solutions.

Actually in 32bit floating point multiplication, we should multiply two mantissa and add two exponent. so the bottleneck of speed is multiplication of two mantissa. Initially i was looking for a fast multiplication algorithm. but later i thought about a pipelined multiplication algorithm to increase clock ratio in some sequential independent multiplication.

What is you idea?
 

KlausST

Super Moderator
Staff member
Joined
Apr 17, 2014
Messages
17,453
Helped
3,941
Reputation
7,880
Reaction score
3,812
Trophy points
113
Activity points
115,718
Hi,

I think there is a big difference whether you do 8x8 bit integer multiplication or 32bit x 32bit floating point multiplication.
One is the lower end, the other is the higher end ( or at least good middle range)

the one has a dynamic of 1:256, the other about 1:144700000000000000000000000000000000000000000000000000000000000000000000000
(dont know whether there is a zero too much or too less)
So it´s something totally different.

*********************************

If I understand you right, then you don´t need "fast" code, but code with high "throughput".
(pipelined code is not fast (input to output) but it has high throughput (maybe one calculation per clock cycle))


Klaus
 

Zerox100

Full Member level 5
Joined
Mar 1, 2003
Messages
317
Helped
21
Reputation
42
Reaction score
10
Trophy points
1,298
Activity points
2,488
Hi,

I think there is a big difference whether you do 8x8 bit integer multiplication or 32bit x 32bit floating point multiplication.
One is the lower end, the other is the higher end ( or at least good middle range)

the one has a dynamic of 1:256, the other about 1:144700000000000000000000000000000000000000000000000000000000000000000000000
(dont know whether there is a zero too much or too less)
So it´s something totally different.

*********************************

If I understand you right, then you don´t need "fast" code, but code with high "throughput".
(pipelined code is not fast (input to output) but it has high throughput (maybe one calculation per clock cycle))


Klaus

Hi
Thanks for your attention

You are right. I was initially looking for fast multiplier. But Now i am looking for fast throughput pipelined multiplier.
 

c_mitra

Advanced Member level 5
Joined
Nov 13, 2012
Messages
3,395
Helped
854
Reputation
1,708
Reaction score
807
Trophy points
1,393
Activity points
26,011
I vote for a big lookup table...
For 8 bit unsigned int, you will need a table 256X256 in size. The products shall need 16 bit storage.

For the slowest method (school book way) it is just 8 shifts and 8 additions.
 

ads-ee

Super Moderator
Staff member
Joined
Sep 10, 2013
Messages
7,562
Helped
1,768
Reputation
3,542
Reaction score
1,710
Trophy points
113
Location
USA
Activity points
55,755
For the slowest method (school book way) it is just 8 shifts and 8 additions.
...and every stage of that shift and add can be pipelined to run at very high clock frequencies. Depending on how much pipelining is done you will have increased latency for the first output but you will get all successive outputs on every clock cycle.

Also unlike the RAM this is scalable in an FPGA, i.e. a 32x32 multiply no problem just more pipeline latency. a look up table for 2^32x2^32 bits won't fit in any FPGA.

- - - Updated - - -

One of the methods to reduce the latency of a multiplier is to perform radix lookups for multiple bits of the calculation. This is primarily how most of the hard IP multipliers are designed in processors along with pipelining the multiplier.

That is how the original Pentium multiplication error was introduced due to a miscalculated Radix-4 entry.
 

c_mitra

Advanced Member level 5
Joined
Nov 13, 2012
Messages
3,395
Helped
854
Reputation
1,708
Reaction score
807
Trophy points
1,393
Activity points
26,011
That is how the original Pentium multiplication error was introduced...
The bug was actually associated with the floating point division processor (NDP) - divisions take considerably longer time and they use different lookup tables (if my memory is right). There was no bug in the multiplication lookup table.

Wikipedia has a non-technical article: https://en.wikipedia.org/wiki/Pentium_FDIV_bug
 
  • Like
Reactions: ads-ee

    ads-ee

    points: 2
    Helpful Answer Positive Rating
Toggle Sidebar

Part and Inventory Search

Welcome to EDABoard.com

Sponsor

Top