Fastest multiplication algorithms

Zerox100 · Dec 2, 2019

Hi,

What do you think is the fastest multiplication algorithm in the world?

FlyingDutch · Dec 2, 2019

Hello,

What data type do you mean?

Regards

Zerox100 · Dec 2, 2019

unsigned 8 bit

KlausST · Dec 2, 2019

Hi,

I vote for a big lookup table.

Klaus

FlyingDutch · Dec 2, 2019

Hello,

I agree with @KlausST. If you are looking for theoretical background, see the articles on these algorithms:

https://en.wikipedia.org/wiki/Karatsuba_algorithm

https://en.wikipedia.org/wiki/Toom%E2%80%93Cook_multiplication

https://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm

https://en.wikipedia.org/wiki/F%C3%BCrer%27s_algorithm

If you want fast multiplication on an FPGA, just use the "hardware multipliers" or "DSP blocks" in the FPGA fabric.

Regards

Zerox100 · Dec 2, 2019

I am looking for a fast ASIC implementation. Booth looks good.

But I want a 4 step pipelined multiplier.

- - - Updated - - -

What I really want is a fast pipelined implementation of a 24 bit multiplier for 32 bit floating point multiplication. I would prefer a four or six stage pipelined multiplier.

KlausST · Dec 2, 2019

Hi,

Do you realize that you are jumping from one requirement to another?

unsigned 8 bit

24 bit multiplier

32 bit floating point

I want a 4 step pipelined multiplier.

four or six stage pipelined multiplier.

I recommend taking some time to settle on a clear requirement.

Klaus

Zerox100 · Dec 3, 2019

I am so sorry. You are right.

But I was thinking about different ways to multiply two 32-bit floating-point numbers, and about the effect of the different solutions.

In 32-bit floating-point multiplication we multiply the two mantissas and add the two exponents, so the speed bottleneck is the mantissa multiplication. Initially I was looking for a fast multiplication algorithm, but later I thought about a pipelined multiplier, to increase the clock rate when running a sequence of independent multiplications.

What is your idea?
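
To make the mantissa/exponent split concrete, here is a minimal Python sketch of a binary32 multiply. It is an illustration only: the names `f32_fields` and `f32_mul` are made up for this example, it assumes normal (non-zero, finite) inputs whose product neither overflows nor underflows, and it truncates the result instead of applying IEEE 754 rounding.

```python
import struct

def f32_fields(x):
    """Split a value into binary32 sign, biased exponent and 23-bit fraction."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def f32_mul(a, b):
    """Multiply two normal binary32 values (assumption: no zero/inf/NaN,
    no overflow/underflow; result is truncated, not rounded)."""
    sa, ea, fa = f32_fields(a)
    sb, eb, fb = f32_fields(b)
    ma = fa | (1 << 23)           # restore the hidden leading 1: 24-bit mantissa
    mb = fb | (1 << 23)
    prod = ma * mb                # 24 x 24 -> 48-bit product, the expensive part
    exp = ea + eb - 127           # add the exponents, remove one copy of the bias
    if prod & (1 << 47):          # mantissa product in [2, 4): renormalise
        prod >>= 1
        exp += 1
    frac = (prod >> 23) & 0x7FFFFF    # keep the top 24 bits, drop the hidden 1
    bits = ((sa ^ sb) << 31) | (exp << 23) | frac
    return struct.unpack(">f", struct.pack(">I", bits))[0]
```

For example, `f32_mul(1.5, 2.5)` gives `3.75`. The only wide operation is the 24 x 24 bit integer product; the exponent path is a small adder, which is why the mantissa multiplier sets the speed.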

promach · Dec 3, 2019

See my multiplication implementation in Verilog:

https://github.com/promach/multiply

KlausST · Dec 3, 2019

Hi,

I think there is a big difference between 8 x 8 bit integer multiplication and 32 bit x 32 bit floating-point multiplication.
One is the low end, the other is the high end (or at least a good middle range).

the one has a dynamic of 1:256, the other about 1:144700000000000000000000000000000000000000000000000000000000000000000000000
(I don't know whether there is a zero too many or too few.)
So it's something totally different.
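
The two dynamic ranges can be checked with a few lines of Python. This is a sketch under one stated assumption: the floating-point range is taken as the ratio of the largest finite binary32 value to the smallest normal one (counting subnormals would make it wider). The variable names are illustrative.

```python
# Unsigned 8-bit integer: 256 distinct values (0..255).
int8_range = 2 ** 8

# IEEE 754 binary32: largest finite value and smallest positive normal value.
f32_max = (2 - 2.0 ** -23) * 2.0 ** 127
f32_min_normal = 2.0 ** -126

# Ratio of the two, just under 2**254.
f32_range = f32_max / f32_min_normal
```

Under that assumption `f32_range` is a little below 2^254, i.e. roughly 2.9 x 10^76.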

*********************************

If I understand you correctly, you don't need "fast" code, but code with high "throughput".
(Pipelined code is not fast from input to output, but it has high throughput, maybe one calculation per clock cycle.)

Klaus

Zerox100 · Dec 4, 2019

Hi,
Thanks for your attention.

You are right. I was initially looking for a fast multiplier, but now I am looking for a high-throughput pipelined multiplier.

c_mitra · Dec 4, 2019

I vote for a big lookup table...

For 8-bit unsigned integers, you will need a table of 256 x 256 entries. Each product needs 16 bits of storage.
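
A software model of such a table, as a minimal sketch (the names `MUL_TABLE` and `mul8_lut` are hypothetical, and the flattened index layout is just one possible choice):

```python
from array import array

# 256 x 256 products stored as unsigned 16-bit values, flattened so that
# the index is the two 8-bit operands concatenated: (a << 8) | b.
MUL_TABLE = array("H", (a * b for a in range(256) for b in range(256)))

def mul8_lut(a, b):
    """Multiply two unsigned 8-bit values with a single table read."""
    return MUL_TABLE[(a << 8) | b]
```

That is 65,536 entries of 2 bytes each, so 128 KiB of storage for an 8 x 8 multiply.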

For the slowest method (the schoolbook way), it is just 8 shifts and 8 additions.
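
The schoolbook method for unsigned 8-bit operands can be sketched like this (the function name `mul8_shift_add` is illustrative):

```python
def mul8_shift_add(a, b):
    """Schoolbook multiply: one shifted copy of a per set bit of b."""
    acc = 0
    for i in range(8):            # one step per multiplier bit
        if (b >> i) & 1:          # bit i of b selects the partial product
            acc += a << i         # a shifted left by i, added to the sum
    return acc
```

In hardware each loop iteration corresponds to one adder row; in software it is 8 sequential steps.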

ads-ee · Dec 4, 2019

c_mitra said:
For the slowest method (the schoolbook way), it is just 8 shifts and 8 additions.

...and every stage of that shift-and-add can be pipelined to run at very high clock frequencies. The more pipelining you add, the longer the latency until the first output, but after that you get a new output on every clock cycle.

Also, unlike the RAM lookup table, this scales in an FPGA: a 32x32 multiply is no problem, it just has more pipeline latency. A lookup table with 2^32 x 2^32 entries won't fit in any FPGA.
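
The latency/throughput trade-off can be seen in a toy cycle-by-cycle model. This is a Python sketch, not HDL: it assumes one shift-and-add step per pipeline stage and one new operand pair per clock, and the names `stage` and `run_pipeline` are made up for the example.

```python
def stage(i, item):
    """Stage i adds the partial product for bit i of b (None = empty slot)."""
    if item is None:
        return None
    a, b, acc = item
    if (b >> i) & 1:
        acc += a << i
    return (a, b, acc)

def run_pipeline(pairs, stages=8):
    """Feed one (a, b) pair per clock; return a list of (cycle, product)."""
    regs = [None] * stages                  # register after each stage
    pending = [(a, b, 0) for a, b in pairs]
    results = []
    cycle = 0
    while pending or any(r is not None for r in regs):
        cycle += 1
        feed = pending.pop(0) if pending else None
        # On each clock edge, stage i consumes what stage i-1 held last cycle.
        prev = [feed] + regs[:-1]
        regs = [stage(i, prev[i]) for i in range(stages)]
        if regs[-1] is not None:
            results.append((cycle, regs[-1][2]))
    return results
```

Feeding three pairs, `run_pipeline([(3, 5), (255, 255), (17, 0)])`, returns results on cycles 8, 9 and 10: eight cycles of latency for the first product, then one product per clock.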

- - - Updated - - -

One way to reduce the latency of a multiplier is to use a higher radix, i.e. to perform lookups that handle several bits of the calculation per step. This, along with pipelining, is how most hard IP multipliers in processors are designed.

That is how the original Pentium multiplication error was introduced, through a miscalculated radix-4 entry.
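
As an illustration of handling several multiplier bits per step, here is a sketch of radix-4 Booth recoding for unsigned 8-bit operands, since Booth was mentioned earlier in the thread. It is one common form of the radix idea, not necessarily what any particular processor implements; the names `BOOTH_DIGIT` and `mul8_booth_radix4` are hypothetical.

```python
# Radix-4 Booth digit for each overlapping 3-bit window b[2i+1], b[2i], b[2i-1].
BOOTH_DIGIT = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
               0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}

def mul8_booth_radix4(a, b):
    """Multiply unsigned 8-bit a and b using 5 partial products instead of 8."""
    acc = 0
    window = b << 1                 # append the implicit b[-1] = 0 below the LSB
    for i in range(5):              # 4 digits cover 8 bits, +1 since b is unsigned
        triplet = (window >> (2 * i)) & 0b111
        # Each digit is in {-2, -1, 0, 1, 2}: a shift and/or negate of a.
        acc += (BOOTH_DIGIT[triplet] * a) << (2 * i)
    return acc
```

Each step retires two multiplier bits with a small table lookup, so the number of partial products to add drops from 8 to 5 for this operand width.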

c_mitra · Dec 6, 2019

That is how the original Pentium multiplication error was introduced...

The bug was actually in the floating-point division unit (NDP). Divisions take considerably longer and use different lookup tables (if my memory is right). There was no bug in the multiplication lookup table.

Wikipedia has a non-technical article: https://en.wikipedia.org/wiki/Pentium_FDIV_bug
